The Diagnosis of Writing in a Second or Foreign Language (ISBN 1138201359, 9781138201354)

The Diagnosis of Writing in a Second or Foreign Language is a comprehensive survey of diagnostic assessment of second/foreign language (SFL) writing.


English | Pages: 346 [347] | Year: 2023


Table of contents:
Cover
Half Title
Series Information
Title Page
Copyright Page
Table of Contents
Foreword
1 Introduction to Diagnosing SFL Writing
Diagnostic SFL Assessment
Writing and Reading in L1 and SFL
Overview of the Book
Diagnostic Cycle
Specific Themes Covered Across Several Chapters
Main Points of the Chapters
2 The Development of Writing Ability
How Writing Ability Develops – an Introduction and Overview
Cognitive Views On the Development of Writing
Cognitive Stages View of Development
Writing in L1 and SFL
Diagnostic Potential
Communicative and Linguistic Stages View of Writing Development
Diagnostic Potential
Illustrative Examples of Research
Relationship Between Writing and Other Skills
Socially-oriented Theories of Writing Development
Sociocultural Theory View of Development
How Does this Approach Understand Writing and Its Development?
Writing in L1 and SFL
Diagnostic Potential
Illustrative Example of Research
Development of Expertise
How Does this Approach Understand Writing and Its Development?
Writing in L1 and SFL
Diagnostic Potential
Illustrative Examples of Research
Other Socially-Oriented Theories of Development
Complex Dynamic Systems View of Writing Development
How Does this Approach Understand Writing and Its Development?
Diagnostic Potential
Illustrative Examples of Research
Commonalities in the Development of Writing
Implications for Diagnosing SFL Writing
3 The Cognitive Basis of Writing Ability With a Special Reference to SFL Writing
Introduction
The Relationship Between Writing and Other Language Domains From a Cognitive Perspective
Cognitive Models of Writing in L1
Hayes-Flower Model (1980) With Its Updates By Hayes
Bereiter and Scardamalia’s Model (1987)
Kellogg’s Model (1996)
L1 Writing Models: Implications for SFL Writing and Diagnosis
Memory and Writing
Long-term Memory and Writing
The Resources of Working Memory in Writing
The Division of Labour Between the Working Memory Components
Cognition and SFL Writing – Specific Attention to Formulation Processes
Cognitive Models of SFL Writing
Lexical Retrieval in SFL Writing
Graphic Transcription and Spelling in SFL
Main Implications for Diagnosing SFL Writing
4 How Writing Ability Can Be Diagnosed
Introduction
Key Characteristics Involved in Diagnosing Writing
Contexts and Agents
Diagnosing Development, Processes, and Products
Constructs and Instruments Used for Diagnosis
Inter- and Intrapersonal Factors to Be Acknowledged
The CEFR and Its Relevance to Diagnosing Writing
Illustrative Examples of Diagnostic Tasks, Tests and Instruments Employed for Diagnosing Writing
GraphoLearn Learning Environment: Word Forming Task
Description of the Instrument
What Do We Know About the Instrument?
Roxify
Description of the Instrument
What Do We Know About the Instrument?
DIALANG Writing Tasks
Description of the Instrument
What Do We Know About the Instrument?
The Diagnostic English Language Needs Assessment (DELNA)
Description of the Instrument
What Do We Know About the Instrument?
VERA8: Writing Tasks Used in German Secondary Schools to Diagnose Learners’ Writing Abilities
Description of the Instrument
What Do We Know About the Instrument?
Empirically-derived Descriptor-Based Diagnostic (EDD) Checklist
Description of the Instrument
What Do We Know About the Instrument?
The European Language Portfolio
Description of the Instrument (Parts)
What Do We Know About the Instrument?
Illustrative Examples of Diagnostic Tasks, Tests and Instruments: Dynamic Assessment Instruments and Approaches
The Questions Test
Description of the Instrument
What Do We Know About the Instrument?
A Human-Mediated Dynamic Assessment of L2 Writing
Description of the Approach
What Do We Know About the Approach?
Summarizing the Discussed Instruments and Approaches
Main Implications for Diagnosing SFL Writing
5 Characteristics of Tasks Designed to Diagnose Writing
Introduction
Variables for Designing Direct Writing Tasks
Comparing Indirect and Direct Tasks
Implications of (in-)direct Approaches for Diagnostic Assessment
Task Demands and Task Complexity
Task Demands and Pedagogical Tasks
Task Difficulty in Direct Writing Tests
Indirect Tasks and Task Complexity
Implications of Task Demands and Complexity for Designing Diagnostic Tasks
Task Design to Capture the Development of Writing
Models of Writing Development in SFL
SLA Research From a Developmental Perspective
Approaches to Capturing Development
Implications of Developmental Perspectives for Diagnostic Task Design
Level-specific and Multi-Level Approaches to Task Design
Diagnostic Tasks Used in Large-Scale Vs. Classroom Assessment Contexts
Diagnostic Writing Tasks in Large-Scale Assessment
Diagnostic Writing Tasks in the Classroom
Comparing Computer-Based Vs. Paper-Pencil Delivery Modes
Cognitive Aspects
Comparability Studies
Score Comparisons
Rater Effects
Controlling Computer / Keyboarding / Word Processing Skills
Implications of the Computer-Medium On Diagnosis
Main Implications for Diagnostic Task Design
6 Diagnosing the Writing Process
Introduction
Characteristics of the Writing Task That Impact Processes
Methods for Diagnosing Writing Processes
The Writing Process
Planning
Research On Planning in SFL Writing
Text Generation
Research On Text Generation
Reviewing and Revising Across Drafts
A Model for the Review Process
Defining the Task of Revision
Evaluation
Modifying the Text
Research On Revision in L1 and SFL
Main Implications for Diagnosing SFL Writing
7 Analyzing Writing Products By Rating Them
Properties of Diagnostic Scales
Holistic and Analytic Approaches
CSWE Certificates in Spoken and Written English – a Curricular Approach
Levels
Assessment Criteria
Task-specific Assessment Criteria
Design Principles of Diagnostic Scales
VERA8 – Taking the CEFR as Basis for Rating Scale Development
DELNA – Analysis of Discourse Features
Challenges for Human Raters
Main Implications for Diagnosis
Implications for Diagnostic Rating Scale Design
Implications for Diagnostic Rater Training
Differences Large-Scale Vs Classroom-Based Diagnosis
Limitations of Conventional Rating Scales for Diagnosis
Conclusions
8 Automated Analysis of Writing
Introduction
Need for Automated Analysis of Writing
Automated Scoring Vs Automated Evaluation
Automated Scoring Systems
Examples of Automated Essay Scoring Systems
Examples of Automated Writing Evaluation Systems
Automated Writing Evaluation Systems: Implications for Diagnosis
Automated Text Analysis Tools for Research Purposes
What Happens in Automated Writing Analysis?
Constructs Measured By Automated Writing Assessment Systems
What Tasks Are Suitable for Automated Writing Evaluation?
Automated Feedback
Usefulness of Automated Writing Evaluation
Future Developments in Automated Writing Evaluation Systems
9 The Role of Feedback in Diagnosing SFL Writing Ability
Introduction
Conceptualizations of Feedback
Hattie and Timperley’s Model of Feedback
What Is Known About Diagnostic Feedback On Writing
Agents
Delivery
Focus
Timing
Requested Responses
Summing Up: Effective Diagnostic Feedback On Writing
Automated Writing Evaluation and Technologically Supported Delivery
Sociocultural Angle to Feedback
Mediation and Feedback
Feedback as Mediation
Feedback as Mediation and Diagnosis
Evaluating Feedback in Diagnostic Instruments
VERA8
DIALANG
Roxify Online
Main Implications for Diagnosing SFL Writing
10 Conclusions and Ways Forward
Introduction
Themes Bridging the Chapters
Implications of Learners’ L1, SFL Proficiency and Prior Writing Experience for Diagnosing SFL Writing
Individualization of the Diagnosis of SFL Writing
Diagnosis of SFL Writing in the Classroom
Stage 1: Planning
Stage 2: Operationalization: Select Or Develop Instruments
Stage 3: Assessing and Analyzing
Stage 4: Feedback
Stage 5: Actions
Stage 6: Evaluating Achievements of Goals
Collaboration Between Teachers and Researchers
Advancing Our Understanding of SFL Writing Development and How It Can Be Diagnosed
Granularity of Diagnosing
A Moment in Time Vs Longitudinal Diagnosis
Nature of Diagnostic Measures: Direct Vs Indirect
Ways Forward in Diagnosing SFL Writing
Conclusions
References
Index

Citation preview

THE DIAGNOSIS OF WRITING IN A SECOND OR FOREIGN LANGUAGE

The Diagnosis of Writing in a Second or Foreign Language is a comprehensive survey of diagnostic assessment of second/foreign language (SFL) writing. In this innovative book, a compelling case is made for SFL writing as an individual, contextual, and multidimensional ability, combining several theoretically informed approaches upon which to base diagnosis. Using the diagnostic cycle as the overarching framework, the book starts with the planning phase, covers the design, development, and delivery of diagnostic assessment, and ends with feedback and feed-forward aspects that feed diagnostic information into the teaching and learning process. It covers means to diagnose both the writing processes and products, including the design and development of diagnostic tasks and rating scales, as well as automated approaches to assessment. Also included is a range of existing instruments and approaches to diagnosing SFL writing. Addressing large-scale as well as classroom contexts, this volume is useful for researchers, teachers, and educational policy-makers in language learning.

Ari Huhta is Professor of Language Assessment at the Centre for Applied Language Studies, University of Jyväskylä, Finland.

Claudia Harsch is Professor of Research into Language Learning, Teaching and Assessment at the University of Bremen, and the director of the Languages Centre of the Universities in the Land Bremen, Germany.

Dmitri Leontjev is a senior researcher at the Centre for Applied Language Studies of the University of Jyväskylä, Finland.

Lea Nieminen is a research coordinator at the Centre for Applied Language Studies, University of Jyväskylä, Finland.

New Perspectives on Language Assessment Series
Series Editors: Antony J. Kunnan, University of Macau; and James E. Purpura, Teachers College, Columbia University.

Headed by two of its leading scholars, this exciting new series captures the burgeoning field of language assessment by offering comprehensive and state-of-the-art coverage of its contemporary questions, pressing issues, and technical advances. It is the only active series of its kind on the market, and includes volumes on basic and advanced topics in language assessment, public policy and language assessment, and the interfaces of language assessment with other disciplines in applied linguistics. Each text presents key theoretical approaches and research findings, along with concrete practical implications and suggestions for readers conducting their own research or developmental studies.

The Diagnosis of Reading in a Second or Foreign Language
By J. Charles Alderson, Eeva-Leena Haapakangas, Ari Huhta, Lea Nieminen, and Riikka Ullakonoja

Talking About Language Assessment: The LAQ Interviews
Edited by Antony John Kunnan

Evaluating Language Assessments
By Antony John Kunnan

The Diagnosis of Writing in a Second or Foreign Language
By Ari Huhta, Claudia Harsch, Dmitri Leontjev, and Lea Nieminen

THE DIAGNOSIS OF WRITING IN A SECOND OR FOREIGN LANGUAGE

Ari Huhta, Claudia Harsch, Dmitri Leontjev, and Lea Nieminen

First published 2024 by Routledge, 605 Third Avenue, New York, NY 10158, and by Routledge, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2024 Taylor & Francis

The right of Ari Huhta, Claudia Harsch, Dmitri Leontjev and Lea Nieminen to be identified as authors of this work has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-138-20135-4 (hbk)
ISBN: 978-1-138-20137-8 (pbk)
ISBN: 978-1-315-51197-9 (ebk)
DOI: 10.4324/9781315511979

Typeset in Bembo by Newgen Publishing UK


FOREWORD

This book is the result of a joint effort by our team of authors who met both face-to-face and online to plan and discuss the various chapters of the book throughout the writing process, from first drafts to revising based on the feedback from external reviewers and series editors. However, our work would not have been possible without the advice and support of a number of other people.

First and foremost, we would like to thank the series editors Antony Kunnan and James (Jim) Purpura for their detailed and perceptive feedback on the texts at various stages of the writing process. Two anonymous external reviewers also provided us with useful advice that helped us to significantly increase the coherence of the volume. In addition, we benefited greatly from discussions with colleagues and friends during numerous conferences, seminars, and other meetings over the years.

Some of our colleagues at the University of Jyväskylä gave us invaluable assistance in the preparation of the manuscript. Our thanks go to Laura-Maija Suur-Askola and Ghulam Abbas Khushik for checking and complementing the references and to Sinikka Lampinen and Ari Maijanen for helping us with the figures used in the book. We are also grateful to the editorial staff at Taylor & Francis, who helped us throughout the process and particularly in the production stage of the book.

Ari Huhta, Claudia Harsch, Dmitri Leontjev, and Lea Nieminen

1 INTRODUCTION TO DIAGNOSING SFL WRITING

Diagnostic SFL assessment

An earlier volume in this series, The Diagnosis of Reading in a Second or Foreign Language, presented the case for a need to carry out research on diagnosing second or foreign language (SFL) proficiency and its development. The authors of the book (Alderson, Haapakangas, Huhta, Nieminen, & Ullakonoja, 2015) argued that "in the field of second or foreign language (SFL) education, there is very little discussion of diagnosis, of what it is, who does it, how it is done and with what results" (p. 1). They also noted that there have been very few tests that claim to be diagnostic.

What Alderson et al. wrote a few years ago is still largely true, although diagnostic testing and assessment are clearly becoming more popular among researchers representing different orientations in applied linguistics. The latest developments include further theorizing of what diagnosing SFL skills is about (Alderson, Brunfaut, & Harding, 2015; Huhta, forthcoming; Huhta, in print; Lee, 2015), elaboration of cognitive diagnostic models (e.g., Yi, 2012), investigating the effectiveness of diagnostic feedback (e.g., Dunlop, 2017; Hoang & Kunnan, 2016), accounts of the development and validation of new diagnostic tests (e.g., Mizumoto et al., 2019) and rating scales (e.g., Isaacs et al., 2018), as well as the application of corpus linguistics and automated analysis of learner performances for diagnostic purposes (e.g., Chapelle et al., 2015; Xi, 2017; Yan et al., 2020). Clearly, diagnostic SFL assessment is an increasingly active area of research and development.

However, while some of the past and current work on diagnosing SFL has targeted writing, more comprehensive accounts of the field and of recent developments in diagnosing SFL writing are still lacking. The present book aims to fill this gap.


Similar to Alderson, Haapakangas et al. (2015), our definition of diagnosis is quite extensive. We are interested in both the strengths and weaknesses of the language learners, but the weaknesses are likely to be more important to understand, as they will enable us to assist struggling learners to improve. It is important to be aware of the fact that learners who have weaknesses in their performance do not form a uniform group; rather, the sources for their problems can have very different origins. They range, for example, from learning disabilities to unsuitable teaching methods or materials, to ineffective learning strategies, and to a lack of motivation.

Some notes about the terminology we use in the book are in order. We will use the terms diagnosis and diagnostic assessment interchangeably to refer to the whole diagnostic cycle (see Figure 1.1), although they could be used to refer only to the stage in which learners are actually assessed in some way. Where we refer to only one of the stages in the diagnostic cycle (see the "Diagnostic cycle" section later in this chapter), we will indicate so. Furthermore, we will use the terms writing skill or skills and writing ability interchangeably.

Writing and reading in L1 and SFL

What is writing and how is it similar to or different from the other language skills, particularly reading? Writing is similar to reading in that they are both skills that have to be learned rather than acquired. Whereas oral skills in one's first language (L1) are acquired at a very early age in everyday interaction with parents, siblings, and other people in the immediate environment, literacy skills of reading and writing are learned somewhat later, typically in some formal educational setting such as a school. In fact, a major purpose of early formal education is to teach children to read and write because they are considered key skills for learning new knowledge and for participating fully in society (see UNESCO, 2006, 2008; OECD, 2018).

Both reading and writing are based on print, either on paper or nowadays also on screen, and they both require the language user to understand how the printed characters (e.g., letters, logograms) relate to the sounds of the language in question. Therefore, reading and writing must share a lot of skills to be mastered. Learning to read and learning to write are also very likely to have many commonalities.

Writing is, however, different from reading in that it is easier to observe. Whereas the product of reading usually remains internal to the reader or, at best, comprises a set of responses to questions on the text(s) that have been read, writing leaves a more visible and more direct product – a concrete text on paper or screen that can be analyzed and assessed in as much detail and as many times as necessary. The writing process, too, is somewhat easier to observe than the reading process. Admittedly, we cannot know (unless we ask, for example) what happens in a writer's head any more than we can know how a reader
processes a text. However, if we make the effort, we can observe what writers do before they actually start writing – for example, whether they jot down a list of points or draw a mind map or engage in some other form of planning what to write. We can also see what kind of changes writers might make to the text, whether they delete or replace some expressions or whether they reorganize their text in some way. It should be noted that we can observe such aspects of the process of how writers direct their attention and use tools such as dictionaries while reading instructions or revising text. This is possible to some extent by observing how readers/writers go about their task, for example, when they turn pages or return to revise an earlier part of the text they wrote. If we want to investigate writers' processes in more detail and more systematically to understand and diagnose their strategy use, we would then need to resort to asking them questions about their actions or asking them to report what they do when they read or write. Less intrusive ways to gain insights into the reading/writing processes would be using eye-tracking devices and systems that capture keystrokes or handwriting.

Interestingly, Alderson, Haapakangas et al. (2015) wrote that

[o]ne can argue that the diagnosis of problems in SFL writing or speaking is not as difficult, and indeed is both more widespread in classroom practice and second language acquisition research. This is partly because the learners' problems in these two so-called productive skills are more obvious: they can be seen in the students' writing and heard in their oral use of the language. (p. 2–3)

We leave it to the readers of this book to judge to what extent the claims made by Alderson et al. regarding the comparative difficulties of diagnosing SFL writing ability versus SFL reading ability are warranted. It is clear, however, that studies of SFL writing, which are relevant for diagnosis, are indeed considerably more common in research on language teaching and second language acquisition than investigations of SFL reading. For example, studies of feedback on SFL writing in classroom contexts are many times more frequent than corresponding research focusing on feedback on reading. Therefore, there is a considerably larger research base to draw on for SFL writing than for reading.

As is the case with reading, much of the theoretical work on writing has been carried out in the L1 context. Not surprisingly, therefore, there is no coherent and comprehensive theory of writing in a second or foreign language (SFL) (Polio & Williams, 2009, p. 486). As far as dedicated SFL writing theories are concerned, they tend to focus on specific aspects of writing such as writing processes (e.g., Sasaki, 2000, 2002), writer's knowledge and awareness of the discourse community
(e.g., Matsuda, 1997) or how L1 and SFL knowledge are combined in SFL writing (e.g., Wang & Wen, 2002). Hence, in our discussion of different theories in Chapters 2 and 3, the emphasis will be on general, largely L1-based theories of writing and its development. In addition, we will review the approach to describing SFL proficiency, including writing, presented in the nowadays very influential Common European Framework of Reference (CEFR; Council of Europe, 2001, 2020). The CEFR is not actually a theoretical model of language ability, but its set of proficiency scales illustrating writing activities can be seen as one approach to defining SFL writing.

Overview of the book

We conclude this introductory chapter with a general overview of the book. More specifically, we explain how the chapters are organized around the concept of a diagnostic cycle and how the chapters relate to specific themes covered in the book. Finally, we provide brief summaries of each chapter.

Diagnostic cycle

In line with the earlier volume on diagnosing SFL reading (Alderson, Haapakangas et al., 2015), we subscribe to the notion that useful diagnosis consists of several stages, which can be conceptualized in a cyclical way, a concept that we henceforth call the "diagnostic cycle". This cycle is the framework that links the chapters of the book. We will thus ground the topic of each chapter in one or more stages of the cycle, which makes it easier to see how the theoretical and empirical work reported in the individual chapters contributes to our understanding of the diagnosis of SFL writing (see also Table 1.1 in this chapter).

We perceive the diagnostic cycle as a process with five recurring stages: (1) defining what is to be assessed, (2) operationalizing it, (3) conducting diagnostic assessment, (4) designing feedback, and (5) implementing appropriate action (see Figure 1.1).

The cycle begins with defining what is to be assessed, which in an educational context is often based on the curriculum or course goals, possibly combined with the learners' needs as identified by the teacher. One part of this stage involves forming a sufficient understanding of the constructs of interest, that is, constructs that relate to the goals or needs of the learners. An example of such a construct is the SFL writing ability or specific aspects of it. A number of scholars argue that useful diagnosis requires a sufficiently detailed and accurate understanding of the constructs to be diagnosed, and how they are learned and acquired (e.g., Alderson, 2005, 2007b; Alderson, Haapakangas et al., 2015; Jang & Wagner, 2014; Jang et al., 2015; Lee, 2015). This understanding is a prerequisite so as to maximize the positive impact of diagnosis and is the reason why we will devote considerable attention in the book to theories and models of writing and its development.

FIGURE 1.1 The Diagnostic Cycle.

In the second part of the cycle, we need to operationalize our analyses of the constructs and learner needs and goals. We need instruments and approaches that allow us to identify strengths and weaknesses in the skill and its development and, ideally, to understand why particular learners have specific types of skill profiles. In most classroom contexts, teachers have to design their own assessment procedures or use whatever formative assessment instruments are provided in the textbooks they use, for example. In rare cases, they may have access to professionally designed instruments developed specifically for SFL diagnosis (see Huhta, forthcoming, for a more detailed discussion). Certain chapters of the book focus on the most commonly used instruments for diagnosing SFL writing such as tasks, rating scales and automated text analysis tools.

The third stage of diagnosis, after defining the foci of assessment and selecting appropriate procedures, is the actual diagnostic assessment. The assessment can be carried out by the teacher or the learners themselves by engaging in self- or peer-assessment. Depending on what was decided in the planning stage, diagnosis can focus on either the products or processes of language use. As we will discuss later, diagnostic assessments tend to focus on the products of writing – the texts produced by the learners. However, we will elucidate how the writing process, too, can be diagnosed even in the classroom context. Furthermore, diagnostic assessment can be a one-shot event or a multi-step process; to capture learner development, the latter is preferable.


The learners' performances are subsequently interpreted by the teacher or the learners themselves with regard to how the learners are doing in relation to the goal. Meaningful interpretation is a necessary prerequisite for the fourth stage in the cycle, which concerns generating useful feedback for both the teacher and the learner. We devote a full chapter in the book to this crucial (feedback) step in diagnosis (see Chapter 9).

The fifth and final stage in the diagnostic cycle refers to the action that should follow feedback. Without action to address the identified weaknesses, diagnostic assessment is incomplete. Action can come in two forms: First, it can involve modifications to teaching deemed necessary by the teachers based on their interpretation of the information obtained through diagnostic assessment, that is, based on the feedback that the teachers generate for themselves from the information. Second, and perhaps more importantly, action should lead to a modification of what the learner does. Changes in teaching alone may not result in any improvement if the learner is not engaged in meaningful activities that have the potential to remedy the situation. For independent learners, in particular, there may be no modified teaching available; whatever diagnostic insights they are able to draw from their language use or any external diagnostic tools (e.g., DIALANG; Alderson, 2005), they have to turn into remediating action themselves (see Huhta, forthcoming, for a discussion of diagnosis by independent SFL learners).

Importantly, however, the diagnostic cycle should not end as long as learning and teaching are taking place. The cycle repeats, but because new insights into learners' struggles may have emerged during the previous stages, the goals, needs, constructs, and instruments may need to be adjusted or re-interpreted. That is, the focus of diagnosis can shift as the diagnostic process moves on. We will revisit the diagnostic cycle in Chapter 10 of the book, where we will describe in more detail the steps in diagnostic assessment in the classroom context.
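To make the cyclical nature of these stages more concrete, the short sketch below models repeated passes through the five stages as a simple loop. It is our own illustration rather than anything proposed in the book: the stage names follow Figure 1.1, whereas the function names and data structures are invented placeholders.

```python
# A minimal sketch of the diagnostic cycle as an iterative process.
# Illustrative only: the five stages follow Figure 1.1; the functions and
# data below are hypothetical placeholders, not an implementation from the book.

def run_diagnostic_cycle(goals, rounds=2):
    constructs = define_constructs(goals)              # Stage 1: define what is to be assessed
    for _ in range(rounds):
        instruments = operationalize(constructs)       # Stage 2: select or develop instruments
        performances = assess(instruments)             # Stage 3: conduct the diagnostic assessment
        feedback = generate_feedback(performances)     # Stage 4: turn interpretations into feedback
        act_on(feedback)                               # Stage 5: modify teaching and learning
        # The cycle repeats: insights gained here may shift the focus of diagnosis.
        constructs = refine(constructs, feedback)

# Placeholder stand-ins so the sketch runs end to end.
def define_constructs(goals):        return list(goals)
def operationalize(constructs):      return [f"task targeting {c}" for c in constructs]
def assess(instruments):             return {task: "learner performance" for task in instruments}
def generate_feedback(performances): return {task: "strengths and weaknesses" for task in performances}
def act_on(feedback):                print("adjusting teaching/learning based on:", feedback)
def refine(constructs, feedback):    return constructs  # goals and constructs may be re-interpreted

run_diagnostic_cycle(["organizing an argumentative text", "spelling of common words"])
```

The loop is, of course, only a schematic reminder that diagnosis is iterative; in practice each stage is a pedagogical activity carried out by teachers and learners, not a function call.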

Specific themes covered across several chapters

We will discuss a number of themes relevant to the diagnosis of SFL writing across the different chapters of the book. These include the conceptualization of writing as both a product and a process, and as both a cognitive and sociocultural activity; agency in diagnosis (teacher, learner, other); context (classroom, research, large-scale assessment); use of direct vs indirect measures of ability for diagnosis; and the use of general frameworks such as the CEFR (Council of Europe, 2001, 2020) for diagnostic purposes. Important as they are, none of them can serve as an overarching theme for the book; rather, they are discussed in those chapters for which they are most relevant. The overarching framework that binds the chapters across the book is, therefore, the diagnostic cycle that we
introduced above. Table 1.1 gives an overview of how the chapters, the themes, and the stages of the diagnostic cycle relate to each other.

TABLE 1.1 Themes Covered Across the Chapters and Stages of the Diagnostic Cycle

WHAT is diagnosed
Stage in the diagnostic cycle: Stage 1 (defining constructs, goals, needs)
Themes: Writing as cognitively-oriented and socially situated activity; writing as product and process; CEFR in diagnosis
Chapters: Mainly Ch.2 (development), Ch.3 (cognition); also Ch.4 (how to diagnose), Ch.5 (tasks), Ch.6 (process), Ch.7 (rating scales), Ch.8 (automated assessment)

WHO diagnoses and uses diagnostic information
Stage in the diagnostic cycle: All stages, but particularly Stage 3 (assessment), 4 (feedback) & 5 (action)
Themes: Agency (teacher, learner, researcher); context (classroom assessment, large-scale assessment)
Chapters: Ch.4 (how to diagnose); Ch.8 (automated assessment); Ch.9 (feedback); also Ch.6 (process); Ch.7 (rating scales)

HOW diagnostic information is obtained and used
Stage in the diagnostic cycle: Stage 2 (operationalizing), Stage 3 (assessment), Stage 4 (feedback), Stage 5 (action)
Themes: Direct/indirect assessment; context; agency; CEFR
Chapters: Ch.4 (how to diagnose); Ch.5 (tasks); Ch.7 (rating scales); Ch.8 (automated assessment); Ch.9 (feedback); Ch.10 (conclusions)

Table 1.1 addresses the three basic questions about diagnostic assessment of SFL ability: what, who, and how. The first of these, what, concerns deciding and defining the constructs, goals, and needs for diagnostic assessment, clearly a stage one activity in the diagnostic cycle. This question is most directly addressed in Chapter 2, which describes the major cognitive and socially-oriented theories of L1 and SFL writing development, and in Chapter 3, which focuses on selected cognitive models aiming to explain what happens in writing. However, such an important construct-related theme as the dual nature of writing as a process and product is also covered elsewhere in the book, most prominently in Chapter 6, which is dedicated to the diagnosis of the writing process, but also in the more product-oriented methodological chapters: Chapter 4, which presents and analyzes concrete examples of instruments for diagnosing SFL writing, as well as Chapter 5 on diagnostic writing tasks, Chapter 7 on diagnostic rating scales, and Chapter 8 on automated diagnostic tools. Most of these chapters also take the application of the CEFR for diagnosing SFL writing into account.

The second important question about diagnostic assessment, who, relates directly to the agency theme. Who is responsible for diagnosing – who gathers diagnostic information and who uses it? As such, this cannot be pinned down to any single stage in the diagnostic cycle; it rather relates to all stages, even if it might be more prominent in the more concrete phases of conducting assessment in practice, providing feedback, and acting on feedback than in the more conceptually-oriented planning stages. Most research on diagnostic and other learning-oriented types of assessment appears to concern teacher-led diagnosis, but in classroom contexts learners often have a significant role and agency through self- or peer-assessment. Agency in diagnostic SFL writing assessment is explicitly discussed in Chapter 4, where examples of diagnostic assessments are given, and in Chapter 9 on diagnostic feedback. Chapter 6 on the writing process and Chapter 8 on automated writing assessment also touch on agency, albeit from somewhat different perspectives.

Context is another important theme discussed in the book. On the whole, the book concentrates on the classroom as the most typical context for diagnostic assessment. Nevertheless, at least two other contexts emerge in some of the discussions: large-scale diagnostic assessment and diagnostic research. Chapter 4 presents two examples of large-scale diagnosis: the international DIALANG system and the German VERA8 approach. Considerable research was carried out when developing these systems; therefore, they also illustrate research into SFL diagnosis. Chapter 6, too, touches on diagnostic research by discussing some instruments that are more amenable to research purposes than to practical diagnosis in educational contexts.

The third major diagnostic question, how, concerns the ways in which diagnostic assessment is operationalized and carried out, and how the information collected through assessment is used as feedback to inform action. Thus, the focus is on all the stages in the diagnostic cycle that come after the initial planning stage. A major theme in the operationalization and assessment stages is the directness of diagnostic procedures: Do the learners have to actually write something, such as complete texts, which allows for a more direct evaluation of their writing ability, or are their writing skills tapped by using indirect tasks that focus on some specific sub-aspects of writing? Directness is discussed in Chapter 4, which presents a wide range of diagnostic procedures that vary in their directness, and particularly in Chapter 5 on the design of diagnostic writing tasks. In addition, Chapter 10 revisits the key points in using direct and indirect tasks to diagnose SFL writing. Two further chapters are relevant for the assessment of direct writing tasks, namely Chapter 7, which discusses the use of rating scales to obtain diagnostic information, and Chapter 8, which focuses on automated evaluation tools that can be used to analyze written texts.


The use of diagnostic information gathered through various assessment procedures is discussed in Chapter 9, which describes how the information gained during diagnosis can be turned into feedback. Using diagnostic information and feedback for planning action and for evaluating the impact of the action is discussed in the final Chapter 10, as part of the more general analysis of the stages of the diagnostic cycle in classroom contexts. The themes of agency and context are also relevant when the how of diagnostic assessment is considered, for example, in Chapters 4, 8 and 9. Finally, the CEFR is made use of in several diagnostic instruments, rating scales and feedback schemes, as reported in Chapters 4, 7 and 9.

Main points of the chapters

We conclude this introduction with brief summaries of the chapters of the book.

Chapter 2 contains a fairly extensive review of various theories and models of writing ability, particularly from the point of view of the development of writing ability in both L1 and SFL. Broadly speaking, the theoretical approaches can be divided into cognitively-oriented and socially situated theories, although some of them share elements of both. The cognitively-oriented models (e.g., Bereiter & Scardamalia, 1987) depict either general cognitive development or stages of communicative and linguistic proficiency (e.g., the CEFR). The socially situated theories (see, e.g., Lantolf & Thorne, 2007), for their part, describe how writing develops in learners' interactions with their environment. The sociocultural theory (e.g., Vygotsky, 1978) and models of how (writing) expertise develops (e.g., Alexander, 2003) illustrate socially situated approaches. We also describe the Dynamic Systems Theory (Larsen-Freeman, 1997) as a relatively recent approach in second language acquisition research that seeks to understand SFL development. We analyze the contribution that these theories might make to the diagnosis of SFL writing and present examples of research that illustrate such contributions.

Chapter 3 gives a more detailed account of three influential cognitive models of L1 writing, and two models of SFL writing proposed by Börner (1989) and Zimmerman (2000), in order to better understand the cognitive demands that writers face when they are writing in their L1 and in a SFL. All these theories place importance on the role of memory – either long-term memory or working memory – and therefore, issues concerning the central position of memory are described in more detail. Another approach to understanding the cognitive demands of writing that we take in this chapter is to compare and contrast the so-called lower- and upper-level cognitive processes. The main conclusion resulting from this comparison is that the lower-level processes appear to have a key role in SFL writing. This is followed by a discussion of the specific cognitive requirements in SFL writing in contrast to writing in an
L1. Finally, the chapter considers possible reasons for weaknesses in writing and the challenges of distinguishing between learning difficulties, low levels of language proficiency, and poor writing skills in SFL writers. Interestingly, the most influential L1 writing models discussed in this chapter describe general cognitive processes involved in the product of writing (i.e., the text). It seems, therefore, that an analysis of learners' general cognitive skills has considerable potential for diagnosing their SFL writing processes (see, e.g., Purpura, 2014). A major challenge for the diagnosis of SFL writing arises from the fact that a diagnoser has to consider that the underlying reason for learners' struggles in writing may be a learning disorder, a factor related to their lack of SFL proficiency or a lack of literacy (traditionally defined) in their L1, or any combination of these.

In Chapter 4, we discuss factors that need to be taken into account to diagnose SFL writing. These include the contexts and agents of diagnostic assessment, the nature of the constructs in question, the approaches and instruments that are used for diagnosis, and relevant intra- and interpersonal factors. The potential contribution of the CEFR to the diagnosis of SFL writing is also revisited. A major part of this chapter consists of the presentation and analysis of selected writing instruments and procedures that claim to be diagnostic or that have diagnostic relevance even if they might not be labeled as diagnostic by their designers and users. In the order of presentation, these are GraphoLearn, Roxify, DIALANG, DELNA, VERA8, the EDD checklist, the European Language Portfolio, and two examples of Dynamic Assessment approaches, one computerized and one human-mediated. We finish by drawing several generalizations arising from our background literature review and our analysis of the diagnostic assessment instruments and approaches. We particularly argue for the necessity of teacher training to improve teachers' diagnostic competence and for the need for longitudinal research targeting the diagnosis of SFL writing ability.

Chapter 5 discusses task characteristics and their implications for eliciting diagnostic information, focusing on the written products. Here, we report research from the fields of language testing, language pedagogy, and second language acquisition to explore implications of insights into task properties on the diagnosis of writing. This chapter is dedicated to task design principles, arguing that diagnostically useful task design should consider different types of task characteristics. Language is an obvious aspect to consider: A diagnostic task must elicit detailed information from learners at different levels of proficiency with regard to their linguistic and strategic competences. Other aspects that may need to be taken into account, depending on what exactly one wants to diagnose, include cognitive features related to text production (e.g., planning or revising), the discursive and social dimensions of writing (e.g., awareness of the genre or the discourse community) and a writer's personal characteristics
(e.g., motivation, anxiety). We also discuss task specificity and openness, and their implications for diagnosing writing. Finally, we compare computer-based delivery with paper-pencil tests and explore the implications of the delivery mode on task design. We conclude the chapter by outlining several recent research directions that can potentially inform diagnostic assessment of SFL writing and, related to it, expand our understanding of the construct of SFL writing.

Chapter 6 is dedicated to diagnosing the process of writing. Two different meanings of 'process' are covered. The first is process writing, which typically refers to a process in which a writer works on a text for some time by producing different versions of it based on feedback received from teachers or peers. That is, the first version of the text is not the final draft; rather, the text is elaborated on and improved over time until a satisfactory product is achieved. The role of teacher-, peer-, and self-oriented approaches to assessment is considered when the writing process is approached from this perspective. The second meaning of process refers to the actual production of text during the act of writing. Text production is typically studied by observing the writers while they are generating text, by eliciting writers' concurrent or retrospective accounts of their process, or by recording their actions by using keystroke logging software or eye-tracking equipment. Information gathered by means of technology-mediated online platforms can also provide data that can be utilized in a somewhat similar way to keystroke logs, that is, information about the time and timing of actions, revisions, and the use of different sources of information while writing. Since the most widely used models of writing (e.g., Hayes & Flower; Kellogg; Bereiter & Scardamalia) focus on the writing processes, the implications of these models for developing the diagnosis of SFL writing processes are analyzed.

In Chapter 7, we return to the product of writing and approach it from the perspective of ratings carried out by human raters, such as teachers or professional assessors. The focus is on the rating scales and other such coding schemes that can yield diagnostically useful information about learners' strengths and weaknesses in their SFL writing. We present and analyze several performance scales available in the literature on writing assessment that have been or could potentially be used for diagnostic purposes. Examples of such scales include those based on the CEFR that were designed and used for large-scale diagnostic assessment in Germany, and scales specifically designed for diagnostic purposes in Australia. The chapter also discusses what rating scales designed for diagnostic purposes might look like, what challenges human raters face when striving for diagnostic ratings (such as the halo effect), and the raters' ability to make relevant and sufficiently detailed observations about the written products. The chapter is rounded off with an outlook on possible ways to address such challenges.
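As a concrete illustration of the kind of process data mentioned in the Chapter 6 summary above, the sketch below shows how a timestamped keystroke log could be turned into simple pause and revision measures. The sample data and the two-second pause threshold are our own assumptions for the sake of the example; they are not taken from the book or from any particular logging tool.

```python
# Illustrative only: deriving simple process measures from a keystroke log.
# The sample log and the pause threshold are invented for this sketch.

keystrokes = [  # (time in seconds, key pressed)
    (0.0, "T"), (0.2, "h"), (0.4, "e"), (0.6, " "),
    (3.1, "d"), (3.3, "o"), (3.5, "g"), (3.7, " "),
    (9.0, "BACKSPACE"), (9.2, "BACKSPACE"),
    (11.5, "c"), (11.7, "a"), (11.9, "t"),
]

PAUSE_THRESHOLD = 2.0  # seconds; long pauses are often read as possible signs of planning or revision

def process_measures(log, threshold=PAUSE_THRESHOLD):
    """Return long pauses (start time, length, next key) and the number of deletions."""
    pauses = []
    for (t1, _), (t2, key) in zip(log, log[1:]):
        gap = t2 - t1
        if gap >= threshold:
            pauses.append((t1, gap, key))
    deletions = sum(1 for _, key in log if key == "BACKSPACE")
    return pauses, deletions

pauses, deletions = process_measures(keystrokes)
print(f"{len(pauses)} long pauses, {deletions} deletions")
for start, length, key in pauses:
    print(f"  pause of {length:.1f}s starting at {start:.1f}s, resumed with {key!r}")
```

Real keystroke-logging programs typically record much richer information than this, but even measures as crude as these show how the timing of writing, and not only the finished text, can be made visible for diagnosis.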


Chapter 8 reviews recent advances and systems used for the automated analysis of texts written by both L1 and SFL writers. The focus is on the writing products due to the nature of the applications and research in this area, but obviously some aspects of the writing process can also be analyzed automatically, as is elaborated also in Chapter 6. The automated scoring of SFL learners' writing takes into account a wide range of linguistic and textual features of the texts, many of which have diagnostic potential. This chapter describes what is involved in the automated analysis of writing, how such analyses are used for assessment purposes, and how they can contribute to the diagnosis of SFL writing ability. We present a number of computer programs that are implemented both in high-stakes language examinations and in tools that are intended to support learning by providing feedback and guidance to learners. An important part of the chapter is a discussion of the constructs that automated assessment programs measure – and what might yet remain untapped by them. We also consider the types of feedback that these tools deliver, and conclude the chapter by suggesting directions for the future development of automated evaluation and how that might expand the diagnosis of SFL writing ability.

Feedback is a key stage of the diagnostic cycle. Chapter 9 provides an overview of research on feedback on learners' writing, both in relation to writing products and processes. Most feedback on writing is provided by other people, usually by teachers, on the writing product, or on the writing process across several drafts. However, recent advances in automatically generated feedback have made computer-generated feedback a viable option. The chapter covers questions such as the focus, timing and manner of feedback, and what is known about its effects on SFL learners' writing. Special attention is given to the discussion of what makes feedback diagnostically useful. This discussion takes into account the agents giving feedback, the delivery and wording of feedback, and the feedback's focus, timing, and uptake by learners. Our discussion of the nature of writing feedback makes use of the feedback framework proposed by Hattie and Timperley (2007) on the levels and types of feedback, which the previous volume on diagnosing SFL reading also referred to in its treatment of feedback. We also draw upon the sociocultural perspective on feedback. We then evaluate feedback in several diagnostic instruments presented in Chapter 4. We conclude the chapter by outlining several desirable characteristics of useful diagnostic feedback.

The tenth and final chapter briefly summarizes the previous sections of the book and discusses the most important themes that span other chapters but could not be discussed in more detail there. One of these is whether and how the learners' level of SFL proficiency and their previous experience in writing in their L1 and/or SFL could be taken into account in diagnosing SFL writing.
Other themes include how diagnosis could be individualized and what the special characteristics of diagnosis in classroom contexts are. We also revisit the diagnostic cycle introduced earlier in the book and discuss how it can be applied in classroom contexts. Finally, we review some promising areas of development and research in SFL diagnosis and thereby chart a tentative research agenda for the area.
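To give a flavour of the "wide range of linguistic and textual features" that the automated analysis systems surveyed in Chapter 8 build on, the sketch below computes a few very crude indices from a short learner text. It is our own illustration under simple assumptions; operational automated writing evaluation systems rely on far richer, linguistically informed feature sets and are not implemented this way.

```python
# Illustrative only: a handful of crude text features of the kind automated
# writing evaluation systems build on (real systems use far richer features).
import re

def simple_features(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    types = set(tokens)
    return {
        "tokens": len(tokens),                                  # text length in words
        "sentences": len(sentences),
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(types) / max(len(tokens), 1),   # rough lexical diversity
    }

sample = "I like write essays. Essays is hard but I like it. I write many essay in school."
print(simple_features(sample))
```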

2 THE DEVELOPMENT OF WRITING ABILITY

How writing ability develops –​an introduction and overview

In this chapter, we give an overview of several influential theories and models that explain how writing in L1 and SFL develop. Understanding development is a prerequisite for successful diagnosis; therefore, the chapter addresses the starting point of the diagnostic cycle, namely the phase that concerns defining the constructs of interest. Learning to read and write typically happens simultaneously. Often, learning these skills takes place at school, being the key purpose of early formal education. Unlike oral skills in one’s L1, literacy skills are not acquired automatically as part of everyday communication during the child’s first few years. Even if many children can read and even write before going to school, literacy usually develops after they have learned to speak (deaf people, for example, are obviously exceptions to this). It should be noted that in a literate environment, learning to write is preceded by a pre-​w riting stage during which children become aware of elements of writing before they learn to read and write. For example, they learn what written texts look like. Children, then, can produce something that looks like writing themselves. They also recognize the differences between spoken and written language and start to understand that there are different genres, first in oral language but then in written language. All this lays the foundation to later literacy skills (see Tolchinsky, 2016). The initial stage of learning to become literate involves the development of coding/​encoding (from spoken to written) and decoding (from written to spoken) skills, that is, learning how the written and spoken expressions match. DOI: 10.4324/9781315511979-2

The development of writing ability  15

For non-​a lphabetic writing systems such as logographic or syllabic systems (for example, Chinese and Japanese), the relationships between the units in the spoken and written language are more complex and, therefore, the development of literacy skills takes different pathways and usually more time. While coding/​ encoding skills, but also typing or writing by hand, are essential parts of writing, they are clearly not enough. Literacy skills involve using a range of skills and types of knowledge similar for reading and writing, such as understanding the meaning of the words and structures of the language as well as some background knowledge (e.g., about the topic, culture, and the world). At the early stages of becoming literate, reading probably develops before writing: we first need to be able to read something before we can write with meaning. However, in practice, the two skills develop simultaneously, particularly in a school context. The development of writing and reading depends very much on the learners’ age, needs, interests, language, and education. Children in a school context often start by writing single words and sentences in exercise books and by writing greetings and brief stories, for example. Much of what young learners actually write depends on the school curriculum and the teacher. In higher grades, the learners’ repertoire widens to encompass typical writing tasks such as compositions, essays, reports, and answers to test questions. In their free time, young learners’ writing nowadays includes contributing to written exchanges in different social media. If learners continue in higher education, they need to engage in longer, more abstract, and more complex academic writing tasks such as master’s theses. However, many people across the world do not learn to read and write in school as children but acquire those skills only in adulthood. The communication needs and interests of such adults differ from those of children; and, therefore, the first writing tasks of these adults, such as filling out forms, writing memos and reports, and replying to e-​m ails, may relate to their work. All in all, the development of writing can take place in many different ways. Typically, we learn to write at school in our childhood. Some, however, learn to write only as adults, as was just mentioned. Others, both adults and children, learn their literacy skills in a language different from their first language. For example, in several African countries, the language of formal education is English or French; thus, many children learn to read and write in English or French rather than in their own language. Migrants who learn to read and write in a language that is different from the language(s) that they can speak may never become literate in the languages of their country of origin. Finally, we must acknowledge that writing comprises many aspects, which can develop at different paces. While the technical aspects of writing are typically learnt relatively rapidly, creating texts to meet the specific requirements of a particular occupational, professional, or educational context can require a very long time

16  The development of writing ability

and a considerable amount of practice. It is these different but complementary aspects of writing that the models and theories that we will focus on next aim at elucidating. Theories of SFL writing development can be classified into two broad paradigms, according to how knowledge is perceived to be gained and development is conceptualized, that is, what Sfard (1989, in Scarino, 2009) called the acquisition metaphor and the participation metaphor. In the first paradigm, knowledge is seen as rather static, its development happening as a process of adding new knowledge to previously stored knowledge in the human mind. The individual and the social are thus separated. The individual’s autonomy in learning develops as this individual acquires means for controlling the environment through their cognitive development. This paradigm conceptualizes development as a mostly linear process, often with identifiable developmental stages common for all learners, with the presupposition that to reach the next stage one has to have mastered the previous stage, and that learners cannot skip a stage in their development. Consequently, diagnosis in this paradigm aims at identifying learners’ developmental stages and providing feedback that directs them to the following stage. The second broad paradigm conceptualizes learning as participation, taking place in interaction and co-​construction with others and the environment. Knowledge and skills are perceived as changeable. This modifiable nature of an individual’s knowledge, or, indeed, the understanding of the social as the source for human development, implies that development is created in social interaction and does not necessarily follow a fixed path. This paradigm usually suggests a development from simple (disjointed and experiential) concepts to complex conceptual systems, and from needing support to becoming more self-​reliant. The latter aspect, that is, learner autonomy, is conceptualized as development from being fully directed by others to gradually becoming less dependent on others to being fully able to monitor and direct one’s own learning. From the point of view of the diagnosis of writing, theories in this paradigm can shed light on why learners are writing in a particular way, but also how the development of their writing from simple concepts to complex conceptual systems, and towards more autonomous behaviour can be shaped. The two paradigms also underlie different theories of writing development, albeit not in a clear-​cut way. Many theories refer to both paradigms to varying degrees. A detailed discussion of different theories of SFL writing development has been provided elsewhere (e.g., Camp, 2012; Indrisano & Squire, 2000; Manchón, 2012). We next evaluate how these theories can inform the diagnosis of SFL writing. As it should not be assumed that any one theory is capable of accounting for all the factors involved in SFL writing (Norris & Manchón, 2012), we consider the development of writing from relevant theoretical viewpoints to discuss the implications for diagnosing SFL writing development.

The development of writing ability  17

Cognitive views on the development of writing Cognitive stages view of development

Development has been conceptualized as distinct stages in developmental psychology and education, as for example Perry (1970) and Piaget (1959, 1967) proposed (cited in Camp, 2012). Such theories of cognitive growth propose characteristic stages through which children move, postulating common features typical for each stage. Developmental stage theories have influenced both the characterization of less vs. more mature writers and the design of writing courses. Because of their effect on how writing is taught, it is possible that progress in writing in educational contexts follows the postulated stages of development (see Christie, 2012, for a detailed analysis of language and literacy development in educational contexts). Writing in L1 and SFL

One influential model of writing development was proposed by Bereiter (1980), who described writing development in L1 children and in adolescent SFL learners in five stages. According to this model, developing writers gradually integrate and automatize the necessary linguistic, intellectual, social, and cognitive skills. Once certain features are automatized, capacities are freed to integrate further features. The first stage, Associative Writing, is characterized by writing down one’s thoughts without planning or adhering to formal rules. Next, rules of style and mechanics are integrated in the Performative Writing stage. Here, genre, style and orthographic conventions are acquired and become automatized. In the third, Communicative Writing, stage, the focus shifts to social cognition and the addressee as writers become aware of the targeted readership. When critical and evaluative abilities are integrated, writers enter the stage of Unified Writing, which is followed by the last stage called Epistemic Writing, where reflective writing skills become relevant. Bereiter clarifies that learners can develop these aspects and stages in different orders, depending on their knowledge, personality and writing instruction. Moreover, earlier stages can influence later development, for example, associative writing can be used as a technique in all later stages. Equally, even if not all features of one stage become automated, writers can nevertheless move to the next stage, as long as mental capacities are free to integrate new features. We further discuss the cognitive processes underlying Bereiter’s (1980) and Bereiter and Scardamalia’s (1987) models in Chapter 3. When comparing children developing writing in their L1 with SFL learners, children are genre-​aware in their L1. For example, they know fairy tales and

18  The development of writing ability

how to tell stories, and they use patterns in their writing to which they have been orally exposed (Tolchinsky, 2016). They have enough linguistic ‘building blocks’ in their L1 to express those patterns. In contrast, adolescent or adult SFL writers are usually literate in their L1 and have automatized the different features proposed, for example, by Bereiter, but they may not yet have acquired the necessary linguistic means in the SFL. Furthermore, their ability to transfer their L1 writing knowledge appropriately to the SFL may vary depending on the genre and writing conventions in the different languages and cultures involved. Diagnostic potential

Stage models have implications for diagnostic assessment. Camp (2012, p. 969) cited Haswell (1991, p. 337), who defined diagnosis as “knowledge (gnosis) that looks through (dia) present performance to the underlying developmental forces that will bring about a different future performance”. In particular, the different features characteristic of the stages allow teachers to focus, both in instruction and diagnosis, on the most relevant aspects. This, in turn, facilitates targeted feedback with regard to aspects that need improvement. Thus, both teachers and learners get information about the current abilities and the next steps, as the stages depict a trajectory along which instruction and learning can be ordered. Camp (2012) reviewed literature on assessment dating mainly from the 1980s, where scholars tried to establish lists of characteristics depicting development and transition from one stage to the next, while others focused on applying developmental stages to teaching and instruction. Camp concluded that a developmentally informed conceptualization of writing was difficult to assess since it required context-​sensitive criteria capable of differentiating between cognitive maturity, adherence to genre conventions, and rhetorical effectiveness (see also Freedman & Pringle, 1980). Research on the development of writing has moved on to alternative conceptualizations of growth since the 1980s, which we will review below. Nevertheless, particularly in contexts where instruction is informed by developmental stages, diagnosis can benefit from focusing on those characteristic features that are used in instruction. Communicative and linguistic stages view of writing development

Approaches that define development in terms of communicative and linguistic features are probably more familiar to the language teaching and assessment community than the cognitively-​oriented theories of development. Descriptions of stages or levels of writing have been around for quite a long time, particularly in the form of rating scales. Among the best-​k nown general definitions of levels of writing that can be interpreted as descriptions of stages

The development of writing ability  19

of writing development are those published by the American Council on the Teaching of Foreign Languages (ACTFL) and the Council of Europe. The ACTFL Proficiency Guidelines were collaboratively developed between different organizations in the USA in the early 1980s that “brought to academic foreign language professionals a framework for understanding and measuring oral language ability” (Liskin-​Gasparro, 2003, p. 483). The original ACTFL (1982) framework divided proficiency into Novice, Intermediate, Advanced, and Superior levels which were further subdivided into a total of nine levels that were later increased to eleven, including a Distinguished level at the top (ACTFL, 2012). Interestingly, the 1982 version included not only generic descriptions that were meant to apply to any foreign language but also language-​specific definitions for French, German and Spanish. However, the most recent (2012) version contains only the generic descriptions. From the diagnostic assessment point of view, the language-​specific versions may have been more useful since they contained specific references to the linguistic features that were considered to characterize different levels in those languages. The ACTFL Proficiency Guidelines have been highly influential in foreign language education in the USA (Liskin-​Gasparro, 2003), but their use in other countries and continents has been limited. Therefore, we focus here on the other proficiency framework that is more extensive and that has been more widely used globally, namely the Common European Framework of Reference for Languages: learning, teaching, assessment (CEFR; Council of Europe, 2001, 2020). The CEFR is an extensive document that describes a wide range of competences, contexts, tasks, activities, texts, functions and other categories that characterize language use for communication (Council of Europe, 2001). The Framework aims to provide a common metalanguage to describe and discuss language use, learning, teaching, and assessment in SFL contexts. The CEFR defines six main levels of proficiency: A1, A2, B1, B2, C1, and C2. Table 2.1 lists the key characteristics of these levels for writing. The CEFR provides both holistic descriptions, such as the Overall Written Production scale, and analytical scales that illustrate specific aspects and contexts of writing. These include, for example, Creative Writing, Reports and Essays, Correspondence, Note-​ Taking, and Processing Text. The Correspondence scale (see Table 2.2) is typical of many analytical CEFR scales in that one or more of the main levels is split into two sub-​levels, and in that the top level C2 is not defined. The new Companion Volume to the CEFR (Council of Europe, 2020) complements the original scale system, adding scales for aspects such as mediation and plurilingual and pluricultural competences, and adding descriptors of new aspects to existing scales. For example, the original Correspondence scale was limited to personal correspondence, whereas the new scale also includes

20  The development of writing ability TABLE 2.1 Summary of CEFR Proficiency Levels for Written Production and Written

Interaction C2

Clear, smoothly flowing, complex texts, appropriate effective style, logical structure Present a case, critical appreciation of proposals or literary works

C1

Clear, well-​structured texts on complex subjects, expanding, supporting with subsidiary points, reasons, appropriate conclusion Relate to addressee flexibly, effectively Comment on correspondents’ news/​v iews

B2

Clear, detailed texts, variety of subjects, related to fields of interest Synthesizing/​evaluating information/​a rguments from different sources Developing argument Express news/​v iews effectively, relate to others

B1

Straightforward connected texts, familiar subjects of interest, shorter discrete elements linked into linear sequence Standard conventionalized formats Summarize accumulated factual information Write personal letters

A2

Series of simple phrases and sentences linked with simple connectors Short, simple formulaic notes, immediate needs

A1

Simple isolated phrases and sentences Ask for or pass on personal details in written form

Source: Council of Europe, 2001, pp. 26–​27, 83.

TABLE 2.2  A n Example of an Analytic Writing Scale in the CEFR: Scale for

Correspondence CORRESPONDENCE C2

As C1

C1

Can express him/​herself with clarity and precision in personal correspondence, using language flexibly and effectively, including emotional, allusive and joking usage.

B2

Can write letters conveying degrees of emotion and highlighting the personal significance of events and experiences and commenting on the correspondent’s news and views.

B1

Can write personal letters giving news and expressing thoughts about abstract or cultural topics such as music, films. Can write personal letters describing experiences, feelings and events in some detail.

A2

Can write very simple personal letters expressing thanks and apology.

A1

Can write a short simple postcard.

Source: Council of Europe, 2001, p. 83.

The development of writing ability  21 TABLE 2.3  Extract from the Augmented Correspondence Scale in the Companion

Volume to the CEFR Illustrating the New B1 Level Description B1

Can write personal letters giving news and expressing thoughts about abstract or cultural topics such as music, films. Can write letters expressing different opinions and giving detailed accounts of personal feelings, and experiences. Can reply to an advertisement in writing and ask for further information on items which interest them. Can write basic formal e-​m ails/​letters (e.g., to make a complaint and request action). Can write personal letters describing experiences, feelings and events in some detail. Can write basic e-​m ails/​letters of a factual nature (e.g., to request information or to ask for and give confirmation). Can write a basic letter of application with limited supporting details.

Source: Council of Europe 2020, p. 83.

formal correspondence (see Table 2.3 and Council of Europe 2020, pp. 82‒83). Furthermore, the revised scales contain more subdivided levels in the A2-​B2 range and introduce a pre-​A1 level that was not part of the original CEFR scales. Learners and teachers can use the fine-​g rained analytical CEFR scales for diagnosing, as suggested on page 38 of the CEFR, learners’ current stage, their profile of strengths and weaknesses, as well as for defining the next learning steps and goals. The CEFR can also be used to describe expectations of where learners should be, what proficiency levels they should attain at certain points in their education. Here, the CEFR has relevance for the development of curricula and educational standards. The attainment of such standards is usually evaluated via large-​scale tests or via classroom assessment, and the CEFR can provide the “bridge” to link curriculum and teaching goals with learning objectives and teaching/​ learning outcomes. Furthermore, it can connect educational monitoring and evaluation with classroom-​based teacher assessment and learners’ self-​assessment. The CEFR has particular relevance for the constructive alignment of external goals and standards, classroom-​based goals, learner-​d riven objectives, and the assessment, and diagnosis, of these objectives from different angles. Despite its widespread use in Europe and beyond, the CEFR has been criticized for a number of limitations. One fundamental problem with such descriptive scale systems is that they convey little information about the differences in the quantity and quality of performance at each proficiency level. In addition, they fail to address the differences in the amount of time and effort needed to progress from one level to the next. To illustrate, the lowest level in

22  The development of writing ability

the original CEFR scale (A1) defines a very limited set of language functions and contexts that are doable with a limited set of linguistic elements. To move up from A1 to A2 also requires a fairly limited amount of time and effort. In contrast, each successive level is more extensive in terms of both the quality and quantity of language proficiency, which is not captured by a scale such as that described in Tables 2.1 or 2.2. An ice cream cone would be a more appropriate way of illustrating how language proficiency develops (see, e.g., Alderson, 2007a; see Figure 2.3 that illustrates how a cone model might look like). At the higher end of the scale, it may also take longer to advance from one level to the next because of the expansion of quality and quantity but this is difficult to estimate precisely. Importantly, the CEFR does not provide answers to questions such as how to get from a particular level to the next. As is implied above, scale systems such as those in the CEFR simplify reality and can impose imprecise standards “in view of the complexity of languages and human behavior”; furthermore, these standards are based on professional consensus rather than empirical evidence (Cumming, 2009, p. 92). It is indeed the lack of empirical basis on learner language and its development in the CEFR level descriptions that have been considered its major weakness (see e.g., Harsch, 2014). The CEFR is not alone in this, however, as also the ACTFL scale has been criticized for lack of correspondence with findings of SFL writing development (Henry, 1996; Valdés et al., 1992). Furthermore, Hulstijn (2011) argues that the tasks and activities referred to at B2-​C2 on the CEFR demand intellectual and academic skills only attainable through higher levels of education. These criticisms for lack of empirical basis have resulted in research that seeks to shed light on the linguistic characteristics of the CEFR levels, as we will describe later in this chapter. To be fair, the authors of the CEFR were aware of the limitations of the descriptors of proficiency in the Framework. These included the fact that the CEFR is not based on theories of language learning (Council of Europe, 2001, p. 21) and that the scaling of the descriptors was based on teachers’ judgments (Council of Europe, 2001, p. 30) rather than on an investigation of learner language. Interestingly, the descriptors in certain writing scales could not be empirically scaled due to disagreement in the teacher judgments; rather, these descriptors were recombined from other scales (Council of Europe, 2001, p. 61). Since the CEFR and other such scale systems describe levels of functional proficiency, it is easy to interpret them as being definitions of how language skills develop stage by stage. It is also easy to jump to the conclusion that they must describe smooth, linear development (see Tables 2.1 and 2.2. However, it need not necessarily be assumed that development is a linear process, even if proficiency scales might suggest this. Here it is useful to distinguish between micro-​and macro-​level development and to consider the amount of language

The development of writing ability  23

use one is engaged in and whether the nature of the particular aspect of language or writing makes it possible to acquire it relatively fast or whether mastery typically takes a considerable amount of time and effort. When we view language more holistically at the macro level and consider a skill such as writing as a whole, development can easily be seen to happen in a linear fashion (see also Figure 2.3). Progress may be slow or rapid but as long as the learner is using the language somewhat regularly, for example, in school or at work, some development is likely to occur for most individuals. Development may be difficult to capture, however, if we assess language holistically, at the macro level, with reference to the broad CEFR levels. To detect development from one CEFR level to the next may require that the learner spends months or even years actively studying and using the language. Obviously, if the learner uses the language infrequently or stops using it altogether, development can halt completely, and proficiency can decline. Figure 2.1 illustrates how CEFR-​like scales can be used to track overall progress in language learning given enough time, several years in this example. The figure also shows how both linear and nonlinear development can be visualized at a very broad level with such scales. The three lines in Figure 2.1 can refer to two situations. First, they can indicate how SFL writing in a particular language, say Spanish, develops over a 5-​year period in three different learners. All three progress from (pre-​) A1 towards higher levels during their first two

FIGURE 2.1 An

Example of How Learners’ Linear and Nonlinear Development in Overall SFL Writing Skill Can Be Illustrated With Reference to the CEFR Levels.

24  The development of writing ability

years of study and use of Spanish. After that, however, their developmental paths diverge. One learner continues to use the language regularly and achieves a high level of proficiency after five years of study, as shown by the steadily rising line in Figure 2.1. The other two learners stop using Spanish after the second year and, therefore, their level of writing drops in the third year. One of them, however, resumes their use of the language which results in that learner regaining the previous level. In contrast, the other learner never returns to using the language again and gradually begins to lose their writing skills in Spanish, perhaps almost entirely, as has been demonstrated by research on language attrition (see, e.g., Ecke, 2004). An alternative way to look at Figure 2.1 is to view it as depicting what happens over the years to different languages that a single individual starts to learn in Year 1. The learner starts to study three languages, one of which they need more regularly than the others and which consequently continues to develop steadily. The fate of the other two languages is different as the learner no longer needs them for work, for example, and concomitantly, their ability in them decreases. The authors of this book have personally experienced something like what Figure 2.1 depicts. For English, which we have needed in our work, the development has been rather linear, even if the timescale may have been longer than the one displayed in the figure. However, for languages such as French or Swedish as a foreign language, the path has been rather similar to one or the other of the other two lines. Development at the micro level can differ considerably from macro-​level progress. Even if writing overall shows steady, linear development, specific aspects of writing can ‘behave’ in any manner. Some features can develop very rapidly, and learners achieve full command rather quickly, for instance, when a learner who knows the Latin alphabet acquires mastery of the Cyrillic alphabet or vice versa. A typical aspect of language that develops linearly, if the learner keeps using the language, is vocabulary size, which has been found to correlate with writing skill (e.g., Milton et al., 2010). However, features of language that have to do with (grammatical) rules, norms and conventions, can exhibit nonlinear development. For example, for English, the development of writing with reference to the use of articles, past tense–​ed, and plural–​s has been found to be highly individual, often nonlinear, and to continue even on very high levels of proficiency (e.g., Murakami, 2016; see also our discussion of the Dynamic Systems Theory in this chapter). However, the interaction between, or the combined effect of these various specific aspects of writing may result in the development of writing appearing to be linear, if writing is assessed holistically. The often very different developmental trajectories of particular components or aspects of writing cannot be detected unless we focus on them –​as we should if our aim is to diagnose such development.

The development of writing ability  25

Diagnostic potential

The preceding discussion of micro and macro-​level development found in the descriptions of the stages of writing development suggests that the potential for diagnosis of such stages needs to be considered at least at two levels. The overall levels of writing proficiency described, for example, in the ACTFL and CEFR frameworks can provide us with a very coarse diagnosis only, which may, however, be a useful first step. A holistic assessment of SFL writing with reference to the CEFR levels can yield a cross-​sectional macro-​level view of the state of the learners’ writing in comparison with their other SFL skills, with their writing in another SFL language or with their peers’ SFL writing ability. This is what diagnostic systems like DIALANG can do (see Alderson, 2005). As was described earlier in this chapter, it is also possible to track learners’ language development over time with the help of holistic writing scales but this requires that the assessments be done infrequently enough so that reasonable expectations of growth across broad proficiency levels, such as those in the ACTFL or CEFR, can be properly tracked. A more insightful approach to diagnosis is probably found at the micro level of analyzing linguistic and communicative SFL development. The CEFR contains several writing scales each focusing on specific types and contexts of writing. These scales yield more fine-​g rained information about the strengths and weaknesses in learners’ SFL writing skills. Writing personal correspondence or writing reports represent rather different types of texts with their different conventions and expectations of what is appropriate. Focusing on such narrowly defined writing tasks or genres has the potential for generating richer diagnostic information than assessing learners’ writing skills holistically. However, even the more analytical writing scales in the CEFR focus on functional proficiency, that is, they describe what different communicative tasks and activities in particular settings learners can accomplish and how well they can perform these. The scales do not describe what linguistic means learners use when they perform such activities (see e.g., Alderson, 2007a). A detailed understanding of such linguistic characteristics typical for the different proficiency levels, however, would enable us to design more level-​ appropriate curricula, textbooks and other teaching materials, as well as more precise diagnostic assessment procedures (see Hulstijn et al., 2010). Some research has explored the links between linguistic characteristics and the CEFR levels (Bartning et al., 2010; Hulstijn et al., 2010; Khushik & Huhta, 2020; Roberts et al., 2011). The aims of these studies have included the analysis of linguistic features of learner performance against the different CEFR levels, the examination of differences in linguistic profiles across different target languages and different L1 communities, and the investigation of which linguistic features could be used for diagnosis. Several language-​specific projects investigating

26  The development of writing ability

the linguistic characteristics of the CEFR levels have also emerged. These include the English Profile program for English (Green, 2012) and Profile Deutsch for German (Glaboniat et al., 2013). While collaboration between language learning and testing researchers had been called for earlier (e.g., Bachman & Cohen, 1998), it seems that the emergence of the CEFR has stimulated this research. The linguistic characteristics of the CEFR levels differ depending on the SFL in question: the grammatical structures that distinguish A2 from B1 in English as SFL are likely to differ from the structures that distinguish the same levels in Spanish as SFL. Furthermore, learners’ first language can affect the linguistic features of the same SFL even if the level remains the same: B1 in German may look somewhat different if it is based on analyzing Finnish-​speaking learners’ German vs. Dutch-​speaking learners’ German. These differences imply, for diagnostic assessment that refers to stages of linguistic development, that both the SFL and the learners’ L1 need to be considered in diagnostic assessment. Illustrative examples of research

We next report on selected studies that addressed the linguistic characteristics of the CEFR levels. We focus on the development of syntax, particularly syntactic complexity, which was at one time linked with ‘writing maturity’ and cognitive development, at least in L1 English (Hunt, 1965). Particularly a rise in the mean length of the clause and the T-​unit (defined as one main clause and all the subordinate clauses related to it) was considered a sign of increasing writing ability (see Camp’s review, 2012). Even if it is nowadays recognized that syntactic complexity may not be a sign of writing maturity and that syntactic growth needs to be related to the discourse, genre and context in which it takes place (Haswell, 2000), syntax is still an important aspect of language and has been investigated extensively. Syntactic complexity (SC) is generally considered to be a multidimensional aspect of grammatical knowledge (e.g., Ortega, 2003), that is, it comprises several features which do not necessarily develop at the same rate or whose relative prominence may change as writing improves. Different operationalizations of SC make it difficult to form a clear picture of its development, even in such a widely used SFL as English. According to available research, mean sentence length (number of words) appears to be the aspect of syntactic complexity that most systematically distinguishes CEFR writing levels, at least in FL English. Hawkins and Filipović (2012) found it to separate all CEFR levels from A2 to C2 in the English Profile data, whereas Alexopoulou et al. (2017) discovered the same for the A1/​A 2 to B2 range, but not for C1-​C2, and Khushik and Huhta (2020, 2022) for the A1 to B1 range. Other distinguishing features are the length of clauses and T-​units,

The development of writing ability  27

particularly in the A1 to B1/​B2 range (Alexopoulou et al., 2017; Gyllstad et al., 2014; Khushik & Huhta, 2020, 2022; Verspoor et al., 2012). Sentence level complexity (e.g., clauses per sentence or T-​unit) has also been found by some of the above-​mentioned studies to distinguish at least from A1 to B1. Recent research on SC has increasingly focused on phrasal level phenomena such as noun, verb or adverbial phrase density which can be measured, for example, as the number of times such phrases occur per 1,000 words in learners’ texts (Kyle, 2016; McNamara et al., 2014). Indices such as the number of words before the verb in the main clause of a sentence, called left embeddedness, and the number of modifiers per noun phrase are further examples of phrasal level phenomena that are of interest to researchers. For example, Green (2012) found noun phrase density and mean number of modifiers per noun to distinguish at the higher CEFR levels (B2 to C2), while Khushik and Huhta (2020, 2022) found several phrasal level indices to distinguish also at the lower CEFR levels. In general, SC was found to increase with writing proficiency in the studies just reviewed. Research on the relationship between SC and writing quality indicates that learners’ first language can affect the results. This has been found at least for English as FL. An extensive study of 28 SC indices by Khushik and Huhta (2020) compared two L1 language groups –​Sindhi and Finnish –​whose EFL writing had been rated on the same CEFR level (A1, A2 or B1). The study revealed that the majority of the indices differed between the L1 groups placed at the same levels. The results suggest that the linguistic basis of the CEFR levels varies depending on learners’ L1. Findings by Lu and Ai (2015) concur in that the authors compared EFL writing across several L1 groups and discovered that some groups differed in terms of SC at the CEFR levels B2 and C1. Furthermore, Banerjee et al. (2007), using the IELTS examination score bands rather than CEFR levels, found Spanish and Chinese EFL learners’ SC to differ when the learners’ overall writing level was the same. Khushik and Huhta’s (2020) study also found that writing at the lowest CEFR level was syntactically more similar across the two L1 groups than at the higher levels A2 and B1, which suggests that beginning SFL learners’ limited linguistic repertoire may force them to use a small number of the simplest syntactic constructions in SFL regardless of their L1. More research is, however, needed to shed light on this. In addition to syntactic complexity, the relationships between the CEFR levels and other linguistic features, particularly vocabulary, have been investigated. Studies by Milton and colleagues (Meara & Milton, 2003; Milton, 2009, 2010, 2013; Milton & Alexiou, 2009;) investigated vocabulary size in relation to CEFR levels for different foreign languages. Meara and Milton (2003) argue that learners need to know about 1,500–​2,500 lemmas to be placed at A2 in English as FL and about 3,250–​3,750 lemmas for B2, for example (a

28  The development of writing ability

lemma includes the base and inflected forms of a word such as “work”, “works”, “worked”, and “working”, but not worker as it represents a different part of speech). Milton and Alexiou’s (2009) study suggests that different foreign languages –​English, French and Greek in their case –​may require somewhat different vocabulary sizes to perform the communicative tasks associated with particular CEFR levels. Unlike for vocabulary size, it is difficult to estimate to what extent learners’ language structures might vary across languages when their proficiency level is the same. The reason for this is that at least the CEFR-​related studies have each focused on somewhat different grammatical features, partly because of the different structural properties of the languages in question but also because of the varied interests of the researchers. For example, Khushik and Huhta (2020) focused on syntactic complexity in English, whereas the research reported by Bartning et al. (2010) covered several languages and grammatical features. Thus, for example, Martin et al. (2010) examined morphological features typical of Finnish (e.g., local cases), whereas Carlsen’s (2010) investigation of L2 Norwegian focused on discourse connectives, and Forsberg and Bartning’s (2010) study covered subjunctive and other morpho-​syntactic features typical of French. When it comes to the L1 effect on the vocabulary size and CEFR levels, research by Milton and Alexiou (2009) provided mixed results. The vocabulary of Greek and Hungarian FL English learners was comparable at B1 and B2 but the Greeks appeared to reach C1 with a significantly smaller vocabulary size than the Hungarians. However, for the Greek and Hungarian learners of FL French no L1 differences were found across the whole A1–​C2 range. Besides vocabulary size, the relationship between word derivation and CEFR levels has received some attention. Mäntylä and Huhta (2013) and Leontjev et al. (2016) investigated Finnish and Estonian EFL learners and compared their performance on measures of English word derivation with their English writing ability based on two argumentative tasks and rated against the CEFR levels (see Huhta et al., 2014 for the rating procedure). Their findings suggest that word derivation scores increase rather steadily with learners’ CEFR writing levels but that the increase depends on the aspect of word derivation and that some aspects may develop more rapidly only above A2 or B1. To summarize, research on the linguistic basis of the CEFR levels has begun to yield results that have implications for diagnostic assessment. First, they show that different aspects of grammar and vocabulary knowledge do not progress at the same pace nor distinguish CEFR levels in the same way, even if many of them show linear development across the levels. Second, the findings can vary depending on the target language and, at least for syntactic complexity, also on learners’ L1.

The development of writing ability  29

Relationship between writing and other skills

One aspect of diagnosing SFL skills is to understand the components that a major language skill can be divided into and whether some of these components relate to strong vs. weak performance in that skill (Alderson, Haapakangas et al., 2015). We next provide a brief account of what research has to say about the associations between SFL writing and various other skills and summarize below relevant previous research, including lexical and grammatical knowledge as well as certain other linguistic and cognitive skills (more on how writing relates to reading and speaking on the cognitive level in Chapter 3). As was the case with SFL reading, research on the relative contribution of lexical vs. grammatical knowledge in SFL writing is somewhat contradictory. Baba’s (2009) review of research concluded that lexical knowledge played an important part in several different types of studies. For example, learners typically report they need vocabulary when writing, and in protocol studies –​ a research method that elicits verbal reports from participants –​learners have been found to pay a lot of attention to words and to have word-​related problems during writing. In addition, learners’ vocabulary size and depth (variety, richness, e.g., Koda, 1993; Milton, 2010, 2013; Zhou & Dai, 2016) and their ability to derive words (Leontjev et al., 2016) have been regularly found to correlate with SFL writing. However, the relationship between vocabulary and writing may be modified by such factors as the learners’ L1 and level of proficiency (S. Jarvis, 2002). As far as the type of writing task is concerned, the role of vocabulary may differ in independent vs. source-​based writing tasks, because in source-​based writing the source texts provide the learners with some of the lexis that they need. The availability of the words in the source text may diminish differences between learners’ vocabulary use in their writing. This may have contributed to the finding by Kyle (2016) that indices of lexical sophistication explained only about 8% of the scores given to TOEFL integrated (i.e., source-​based) essays whereas the variance accounted for in the independent TOEFL essay scores was about 35%. A similar lack of correlation between lexical diversity and writing quality was found by Baba (2009), whose Japanese-​speaking informants wrote a summary, another type of source-​ based writing task in English. However, it is a further indication of the complexity of the writing construct that Baba discovered several other dimensions of vocabulary knowledge such as size, depth, or definition ability to be associated with the quality of learners’ summaries. The takeaway for diagnostic SFL assessment from such studies is that the degree of independence from vs. dependence on source materials of the writing tasks needs to be taken into account in the diagnosis. Compared to studies on the relationship between lexical knowledge and SFL writing, research focusing on the role of grammatical knowledge is less

30  The development of writing ability

common. One such investigation is a large-​scale longitudinal study of teen-​ aged Dutch speakers of English by Schoonen et al. (2011) from grade 8 (age 13–​ 14) to grade 10 (age 15–​16), which explored the relative contribution of several types of skills and knowledge to writing in FL English. While they, too, found the knowledge of English vocabulary to correlate with writing in English, its independent contribution to English writing vanished when all the predictors were analyzed jointly. In contrast, learners’ English grammatical knowledge made an independent and substantial contribution to their FL writing even when all the predictors were jointly analyzed. Other SFL skills such as reading and listening are also often found to correlate with SFL writing. For example, Hirose and Sasaki (1994) found that English proficiency operationalized as vocabulary, grammar and listening explained about a third of the variance in the FL English writing of their Japanese informants. Schoonen et al. (2011) found that English spelling ability, measured separately from writing, was a significant contributor to the English writing quality of their Dutch-​speaking learners, although not quite as substantial as English grammatical knowledge. Whether SFL proficiency –​excluding SFL writing –​or first language proficiency plays a more important role in SFL writing ability is not clear as different studies come to different conclusions. Sasaki and Hirose (1996) found L1 Japanese writing to explain only 1.5% of learners’ FL English writing in contrast to about 33% accounted for by their English skills other than writing. In the Schoonen et al. (2011) study, however, the learners’ L1 Dutch writing skill was a substantially stronger predictor of their FL English writing than their English grammar and spelling skills. Results from the DIALUKI study in Finland (see Alderson, Haapakangas et al., 2015), more specifically from the longitudinal part of the study, found that learners’ FL English skills measured in grade 4 (age 10) correlated more strongly with their FL English writing in grade 9 (age 15) than their L1 Finnish skills (Huhta & Leontjev, forthcoming). A possible explanation for these differences may, again, be the different L1 backgrounds of the learners. Dutch and English are rather closely related Indo-​European languages whereas both Japanese and Finnish are unrelated to English. A further contributing factor here may be learners’ shared metacognitive knowledge between their L1 and SFL. As Schoonen et al. (2011, p. 66) hypothesized, the learners’ knowledge of text features, text structure, and writing processes was probably quite similar between Dutch and English, which could explain why the learners’ L1 Dutch writing ability was a strong predictor of their FL English writing. The implication of these findings for diagnostic assessment is that it is indeed important to consider learners’ L1 and also the linguistic and cultural distance between the languages when diagnosing their SFL writing. Finally, information about learners’ basic cognitive skills such as phonological processing ability, speed of lexical access, and working memory capacity can

The development of writing ability  31

be used to predict and explain strengths and weaknesses in SFL literacy skills (see Chapter 6 in Alderson, Haapakangas et al., 2015, and Chapter 3 in this book). Here we just briefly summarize the main findings of some recent studies. Schoonen et al.’s (2011) study included measures of learners’ speed of lexical retrieval and sentence construction in both L1 and SFL. They found that the speed of lexical retrieval in English made an independent contribution to the prediction of FL English writing, particularly towards the end of the 3-​year longitudinal study. In the longitudinal part of the DIALUKI study, learners’ cognitive skills, their performance on measures of the speed of lexical access and working memory collected in grade 4 (age 10) significantly predicted the learners’ writing quality in FL English even after five years in grade 9 (Huhta & Leontjev, forthcoming). Tests of basic cognitive skills appear, thus, to play a role in diagnosing SFL writing; their contribution may be clearest in the identification of weaknesses in writing that are related to learning difficulties in particular (see Chapter 3). Socially-​oriented theories of writing development

A different conceptualization of development emerges from theories related to learning as a social activity. In this paradigm, development is conceptualized as a pathway that is negotiated with and directed by communities of practice and the individuals in these communities where learning takes place (see Camp, 2012). In this paradigm, learning takes place as interactive, social, co-​ constructed activity and the developmental pathways can differ from learner to learner. Often theories and conceptualizations in this paradigm are based on or at least influenced by L.S. Vygotsky’s Sociocultural Theory (e.g., 1978; 1987). Sociocultural theory view of development

In Sociocultural Theory development cannot be understood separately from the social environment. Using Lantolf and Thorne’s (2007, p. 197) words: Developmental processes take place through participation in cultural, linguistic, and historically formed settings such as family life and peer group interaction, and in institutional contexts like schooling, organized sports activities, and workplaces, to name only a few. Notable is the conceptualization of the relationship between development and learning. Stage-​l ike theories (e.g., Bereiter, 1980), discussed earlier in this chapter, imply that learning can take place only when learners are developmentally ready for it. For Vygotsky, however, it is obuchenie, a dialectical unity of teaching and learning, that guides development. In the process or activity of obuchenie,

32  The development of writing ability

learners’ performance is co-​constructed with more knowledgeable others (e.g., Lantolf & Poehner, 2014; Poehner et al., 2018). Obuchenie is also intertwined with development, which “opens up opportunities for future teaching-​learning, which in turn leads to further development” (Lantolf & Poehner, 2014, p. 57). The notion of obuchenie will be important for our discussion of how Sociocultural Theory can inform diagnosis of SFL writing later in this section. This understanding of development stems from the central premise of Sociocultural Theory that individuals’ relationship with the environment is not direct, but is always mediated by means of symbolic tools, language being one of them. Using these symbolic tools, but also physical tools, we are able to regulate our functioning in the world and change it, and these tools are modified and are passed over to the generations to follow (e.g., Lantolf, 2000). This shift toward mediated development calls for focusing on how learners perform with assistance. The range of abilities that individuals exhibit with assistance from others is known as Zone of Proximal Development (ZPD; e.g., Vygotsky, 1978) and is one of the essential notions in Sociocultural Theory. This makes studying how education shapes human development one of the foci of sociocultural research. While Sociocultural Theory posits that development is not linear, it also predicts that pedagogical practices strongly shape learners’ developmental trajectories. It should be noted that Sociocultural Theory does not undermine the role of cognitive factors in human development, but it underscores the importance of the social and the cultural in it (Lantolf & Thorne, 2007). How does this approach understand writing and its development?

Writing understood through the lens of Sociocultural Theory is not an individual activity, but one to which others contribute either directly when collaborating on specific writing tasks or indirectly through previous experiences and interactions. That is, even a solitary writer is not alone, metaphorically speaking. Furthermore, each learner’s writing ability is developing following an individual rather than a common trajectory, which is socially mediated and in which meaning is co-​constructed, with teachers and peers in the context of the classroom being, in a way, co-​authors in learners’ writing (e.g., Prior, 2006). What shapes writing development becomes a focus of both L1 and SFL sociocultural writing research. The sociocultural perspective on development also addresses the problem with transfer delineated by Slomp (2012, p. 82; see the “Cognitive views on the development of writing” section in this chapter). Namely, knowledge, if considered static, should be transferred to different contexts mechanistically, which poses a problem for understanding reasons for negative transfer, for

The development of writing ability  33

example. From the sociocultural perspective, transfer refers to reconstructing internalized knowledge to accommodate it to new situations. Indeed, Wertsch (1998) discussed internalization as being able to reconstruct prior knowledge. From the point of view of diagnosis, the focus then should be on how this knowledge is repurposed to be used in the new context(s). Writing in L1 and SFL

Sociocultural research of writing takes many forms, from anthropological studies of literacy practices, to cross-​cultural psychological studies of literacy, to tracing writing practices across communities of practice (Prior, 2006, p. 54). In this section, however, we will limit the discussion of writing development to what, in our view, can inform and expand the diagnosis of writing in the classroom –​how mediation from others shapes and promotes an individual’s writing. As research reviews (e.g., Sala-​Bubaré & Castelló, 2017) suggest, in both L1 and SFL writing research in the sociocultural paradigm, writing processes rather than the products are the focus of attention. That is, the development of writing is about qualitative changes in the writing process stimulated by mediation and negotiation. Mediation is not limited to that from teachers but includes co-​construction with other learners, often in dyadic interactions (e.g., Wigglesworth & Storch, 2012). It is, perhaps, some of the actors who shape writing development that are different in L1 and SFL writing. In L1 writing, these include parents, who direct children’s early writing practices, which brings into focus, for example, fathers’ and mothers’ perceived roles and unique family guidance styles in the development of children’s literacy (e.g., Aram, 2010). These actors are different in SFL writing, so it is more feasible to concentrate on them in SFL diagnosis. From the sociocultural point of view, learners’ writing develops as learners internalize ways, norms, and values of expressing themselves in writing from others, namely teachers in the classroom. Through this internalization, learners gradually seize control over their performance. Therefore, assistance from teachers becomes of paramount importance for SFL writing development, as this assistance guides development. Diagnosis can inform this guidance by revealing what kind of assistance learners may benefit most from. There have also been broader sociocultural conceptualizations of writing and its development. Applebee (2000), for example, proposed to conceptualize writing as participation in social practice, in which individuals develop their own voice in a variety of contexts in order to use it to contribute to and change these contexts. We will discuss one such broader perspective in the “Development of expertise” section in this chapter.

34  The development of writing ability

Diagnostic potential

In this section, several points emerging from the earlier discussion are useful to reiterate. To start with, since development is a mediated process taking place between learner and environment /​others, it is important to find out what the learner is able to do with assistance from others, not just the learner’s unassisted performance (Vygotsky, 1987), as what learners can do with the help of others today they are able to do independently in the future. That is, diagnosing learners’ Zones of Proximal Development is as important as diagnosing their actual (unassisted) abilities. This conceptualization calls for assessment procedures that embed ways for directing learners to perform beyond their unassisted level of abilities. In other words, finding out how to push learners’ development forward becomes an objective of diagnosis. These objectives have been pursued in dynamic assessment (DA), the purpose of which is to identify learner abilities in the process of development by allowing the diagnoser also referred to as the mediator, to intervene whenever the learner encounters problems (e.g., Poehner, 2008). In other words, DA builds on the notion of obuchenie, merging teaching, learning, and assessment into one development-​oriented process. The forms of assistance in DA, regardless of whether they are standardized (the so-​called interventionist DA) or emerge in interaction with the learner (interactionist DA), are often provided in such a way that they gradually become more explicit and detailed. This is inspired by the seminal paper by Aljaafreh and Lantolf (1994). The contribution of this study was that the tutor in it did not rely on a single assistance strategy but systematically alternated between different forms of assistance with the aim of identifying minimum assistance needed by the learner to self-​correct the errors. On the implicit part of the assistance scale, the presence of the tutor creates a ‘collaborative frame’; on the explicit part of it, the tutor, rather than the learner, is responsible for the performance. Two further principles on which dynamic assessment is built are reciprocity and transcendence, both derived from the work of Feuerstein (e.g., Feuerstein et al., 2010). Reciprocity is learner responsiveness to mediational moves (e.g., Lidz, 1991), a basic principle of mediation during dynamic assessment (Poehner & Leontjev, 2020). Transcendence in DA refers to both the learner and the mediator building on the insights obtained in previous situations and interactions, for example, other tasks or previous assessment sessions and applying them to new encounters. Incidentially, the concept of transcendence aligns with the notion of action in the final stage in the diagnostic cycle we discuss in more detail in Chapter 4. Illustrative example of research

Here, we describe a DA study by Shrestha and Coffin (2012), who examined two undergraduate students’ academic writing development, one having English as their L1 and the other as an additional language. There were two sessions, each involving the learners’ writing a text independently as the first step. The mediation was provided in an asynchronous manner via e-mails and wikis and followed the interactionist approach to DA. That is, the mediation emerged in interaction with the learners. The mediation targeted such broad criteria as the use of source materials, academic style, quality of presentation, and grammatical accuracy. The authors studied the mediation that was provided to the learners and the learners’ responsiveness to it. They found that during the second DA session, whose goal was to trace transcendence, the learners were more self-regulated in their writing, as evidenced by changes in the frequency and explicitness of mediation and by an increase in the learners’ monitoring of their own writing. The authors illustrated how the DA interactions between the learners and the mediator allowed for identifying areas where the learners struggled and for providing the right amount and quality of assistance to promote the development of the learners’ writing in those areas where weaknesses were identified.

Development of expertise

Development of expertise is an example of a broader conceptualization of development than that perceived as happening in expert-novice interaction. Notably, this perspective has its origin in cognitive psychology, in research on what differentiates a novice from an expert (Camp, 2012; Perkins & Salomon, 1989). One of the researchers who has forcefully argued for the importance of social factors in the development of expertise is Alexander (e.g., 2003), whose Model of Domain Learning (MDL) we outline in the following. The MDL depicts development in the academic domain and conceptualizes development as an interplay of knowledge, strategic processing, and interest, taking place in three broad stages: acclimation, competence, and proficiency (Alexander, 2003):

• Acclimation stage: learners’ knowledge is fragmentary, strategies are limited to surface-level ones, such as skipping unknown words, and interest is induced by the immediate context rather than coming from the learners themselves.
• Competence stage: a qualitative and quantitative increase in domain knowledge, which is also now more cohesive, the use of both general and deep-processing strategies, such as comparing several sources, and an increase in personal interest, while situational interest still plays an important role.
• Proficiency stage: broad and deep domain knowledge with which experts themselves contribute to the domain, transforming it; almost exclusively deep processing; mostly personal interest.

The differences between these three stages in the MDL imply that instruction should vary depending on the stage that learners are diagnosed to be at.


How does this approach understand writing and its development?

One often-cited model of writing development which incorporates the MDL as its developmental dimension is Beaufort’s (2007) model. The model includes four interrelated and interacting knowledge types (subject-matter knowledge, genre knowledge, rhetorical knowledge, and writing process knowledge), with the fifth, discourse community knowledge, superordinate to the other four. To elaborate, subject-matter knowledge involves both the knowledge of topics and concepts and the ability to use, apply, and transform this knowledge in the discourse community. Genre knowledge refers to knowledge of the features of particular genres, highlighting linguistic or structural features of language used in specific contexts. The discourse community is important here, as the features of one genre can vary from one discourse community to another (see Beaufort, 2012). Rhetorical knowledge in Beaufort’s model refers to knowing the rhetorical peculiarities of writing with the aim of rendering a specific purpose to a specific audience. Writing process knowledge refers to metacognitive knowledge of the processes involved in discipline-specific writing. Finally, the overarching domain of discourse community knowledge involves knowledge of the values, goals, and meta-discourses specific to particular disciplines.

Beaufort (2007, p. 22) argued that writing expertise is about “becoming engaged in a particular community of writers who dialogue across texts, argue, and build on each other’s work”. Discourse community knowledge is, therefore, vital for the development of all the other types of knowledge. By putting discourse community knowledge at the heart of the model, Beaufort underscores the social nature of writing.

The development in the domains of knowledge in Beaufort’s model, as we mentioned earlier, is conceptualized with reference to Alexander’s (2003) MDL stages. We should note that Beaufort (2007, p. 95) acknowledged the vastness of this developmental scale, noting that the case study participant, by the end of their college years, was probably still far from becoming an insider in the discourse community of historians (one of their fields), still being in the acclimation stage. The same was true for their development in the rest of the knowledge domains.

Overall, the development of writing expertise implies transfer of knowledge from one context to a different one, that is, application of knowledge in a different discourse community. For this to happen, learners’ awareness should be directed towards noticing the similarities (and differences) between what they have learned before and what the new context requires (Beaufort, 2007; Perkins & Salomon, 1989; Smit, 2004). This implies that different discourse communities and their expectations, rules, and values should be made visible to learners in writing courses (Beaufort, 2007; Smit, 2004). Particularly at the beginning stage of expertise development, learners need guidance within the different domains of knowledge. This guidance should be provided within the learner’s ZPD, as Beaufort (2007) suggested.


Novice writers also require guidance in being strategic in their writing. Surface-level, or “weak”, strategies (Smit, 2004) allow for extending previous knowledge to novel contexts, as they are applicable to a number of contexts, whereas deep-processing, or “strong”, strategies, in Smit’s words, are usually context-specific. As Smit (2004) noted, as learners’ expertise grows, they eventually find surface-level strategies of little use, requiring instead strategies that build on knowledge of the characteristics of writing in a specific discourse community. This chimes with the qualitative changes in strategy use reflected in Alexander’s MDL and implies that instructing learners in the use of strategies should depend on where learners are in their development, as teaching novice writers deep-processing strategies will probably be unproductive.

Finally, we would like to mention the potential role of intrapersonal factors such as interest, and exemplify it with reference to the interest aspect of the MDL and to Beaufort’s model. It appears that while acknowledging this aspect of development, Beaufort (2007) placed much less emphasis on intrapersonal factors than on the knowledge domains. The interest aspect of Alexander’s (2003) model, however, plays an important role, suggesting that in the acclimation stage, where Beaufort’s participant stayed during their college years, motivation from others can push learner development forward. As Slomp (2012, p. 84) suggested, attention to intrapersonal factors could have yielded further insights into Beaufort’s participant’s writing.

Writing in L1 and SFL

While the development of writing expertise is meaningfully discussed with reference to becoming an expert writer in one specific discourse community, this does not mean that an expert writer in one discourse community is an expert in, or can transfer their expertise to, another discourse community. Equally, being an expert in a community in one language, say the L1, does not automatically mean that this expertise can readily be transferred to an SFL. With regard to the development of SFL writing expertise, language proficiency becomes the most relevant factor, unlike for the development of writing expertise in L1. Weigle (2005), for example, based on a synthesis of previous research, identified a threshold in learners’ SFL proficiency below which they are unable to transfer their L1 writing practices to their SFL writing. That is, before learners are able to effectively take such domains as genre into consideration in their writing, their proficiency needs to have reached this threshold. This coincides with similar findings in the realm of SFL reading (see Alderson, Haapakangas et al., 2015, pp. 70–71).

Diagnosis of SFL writing expertise development informed by Alexander’s stages is even more difficult than that of L1 writing expertise development, as reaching the SFL proficiency threshold increases the time required by SFL writers to become experts in a discourse community. Beaufort (2007) reported that the learner whose development the author traced was still at the acclimation stage at the end of two years of college. It is logical to assume that this process takes even longer when developing writing expertise in an SFL discourse community. Thus, developmental continua containing finer-grained stages have been suggested. Carter (1990), for instance, proposed four stages that Weigle (2005, p. 138) summarized as follows:

• beginners, who use global strategies, not specifically linked to any domain;
• advanced beginners, who use some strategies linked to writing;
• competent writers, who use strategies applicable to different writing domains;
• experts in a specific discourse community.

A factor that seems to be important in the development of writing expertise in SFL, and also in L1, is metacognitive knowledge (see, e.g., Weigle, 2005; Beaufort, 2007, for SFL and L1 respectively). Weigle (2005), building on Roca de Larios et al. (2002), underscored the importance of teaching SFL learners metacognitive strategies such as setting goals, evaluating, and monitoring, particularly since the use of these strategies is not constrained by such factors as SFL proficiency. In terms of the strategy types in Alexander’s model, Weigle (2005) refers to surface-level strategies, which she regards as useful to teach in the early stages of the development of expertise.

Diagnostic potential

Since different stages of development focus on different aspects, diagnostic assessment can focus on those aspects most relevant for a certain stage. For example, assessment at the early stage of the development of writing expertise can focus on learners’ use of surface-level strategies and transfer of general concepts and skills, or on the transfer of learners’ L1 writing skills to novel SFL writing contexts, as suggested by Beaufort (2007), although the author did not elaborate on what exactly this assessment could look like. We suggest that such diagnostic assessment can be informed by the sociocultural conceptualization of transfer as reconstruction of knowledge (see the “Sociocultural theory view of development” section in this chapter). The concept of adaptive transfer proposed by Depalma and Ringer (2011; 2013) can be a useful starting point for such an assessment. To elaborate, they argued that transfer in SFL, be it across disciplines or from L1 to SFL, does not simply involve re-using prior knowledge, but involves reshaping it in order to apply it in a novel context. This conceptualization aligns with the sociocultural concept of internalization, as discussed in the “Sociocultural theory view of development” section in this chapter. Depalma and Ringer (2011, p. 141) further proposed that adaptive transfer is:

• dynamic, meaning that prior knowledge can be reshaped to novel situations;
• idiosyncratic, meaning that transfer is unique to individuals;
• cross-contextual, as it occurs when learners recognize similarities between/among different contexts;
• rhetorical, as understanding the context, the audience, and the purpose of writing is required;
• multilingual/plurilingual, as learners can draw from a variety of linguistic resources, including for the purpose of changing the context; and
• transformative, meaning that learners have the opportunity both to shape and to be shaped by practice.

For diagnosis, the concept of adaptive transfer implies that rather than assessing whether learners apply certain strategies and knowledge in SFL writing in general, diagnostic assessment should focus on assessing how learners reshape and internalize strategies and knowledge. Reflective journals in which learners contemplate the process of writing, directed by prompts that address different aspects of adaptive transfer, can be useful diagnostic tools, allowing, for example, for the discovery of factors that promote or impede transfer. In addition, reflective essays, as proposed by Leaker and Ostman (2010), could be used for targeting far transfer, namely transfer to novel contexts. For near transfer assessment, that is, assessment of how well learners are able to transfer newly acquired knowledge and understandings to a context which is similar to but not exactly the same as the context in which the knowledge and understanding emerged, dynamic assessment procedures seem promising (see the “Sociocultural theory view of development” section in this chapter).

As the research outlined in the following subsection suggests, it could be difficult to assess all the knowledge domains which are part of expertise, due to the scope and complexity of development within and across these domains. On the other hand, as Beaufort (2007, pp. 132–133) discussed, assessing only some of them may result in a distorted picture of development, not least because these knowledge domains are interrelated and develop interactively. From the perspective of diagnosing SFL writing, therefore, in order to get an accurate picture of learners’ strengths and weaknesses, assessing all the different knowledge domains would be ideal. By and large, it appears that using only learners’ written performance samples could be inadequate, as this does not allow deeper insights into learners’ understanding of the discourse community, including rhetoric, the peculiarities of different genres, and writing processes in these communities; neither does it offer insights into learners’ abilities to transfer prior knowledge to new context(s). Here, portfolio assessment of learners’ work on a range of tasks and contexts could help, as could the use of learners’ reflective journals directed by prompts. Additionally, learners can be asked to analyze texts written by experts in particular disciplines; the analyses can be used as a kind of indirect assessment of learners’ discipline-specific knowledge.

One question remains, though: should all these different knowledge domains be assessed simultaneously, which could be impractical, or in succession, and if so, in which order should the interrelated domains best be addressed? Beaufort (2007) underscored the importance of sequencing problems from easier to more complex. The exact nature of such sequences and the way that the development of one domain may trigger the development of other domains are interesting questions for future research. Finally, the degree of learner socialization in a discourse community could be considered. As suggested by Beaufort (2000), sequencing tasks based on their importance for the discourse community in question could be useful for assessing whether learners are able to take responsibility for contributing to the discourse community, thereby assuming active roles. Here, dynamic assessment principles could be used to diagnose the degree of learner responsibility for their writing and to mediate learners’ awareness of the goals and values of the specific discourse community.

Illustrative examples of research

What is common to many studies on the development of writing expertise is that they generally concentrate on only one or a few aspects of Beaufort’s model. This is likely due to the aforementioned complexity of the domains of expertise development. For example, Leaker and Ostman (2010) examined writing at the crossroads of vocational experience and the academic context; they studied how college students applied rhetorical knowledge obtained outside the academic context in reflective academic essays. In a similar vein, Reiff and Bawarshi (2011) reported on whether and how students were able to negotiate and build upon genre knowledge previously acquired in school, work, or other discourse communities when writing in the academic context. They found that students who were ready to question their expert status and genre knowledge, taking on the status of novice, were better able to build successfully on their prior knowledge by adopting strategies to be used in the new academic writing contexts.

Roozen (2010) explored how extra-disciplinary process knowledge informed one student’s academic writing process, as this student repurposed and modified this knowledge for the new context. Just like Beaufort (2007), the author argued for raising learners’ awareness of the range of practices they have at their disposal. The author also argued that the student’s repurposing of their practices for English studies pushed their development in subject-matter knowledge, genre knowledge, rhetorical knowledge, and discourse community knowledge.

A similar picture emerges in SFL writing research. Cheng (2007), for example, studied the development of genre awareness of one Chinese graduate student of electrical engineering writing in English as an SFL. Having analyzed the learner’s written performance samples, the author suggested that to fully understand learners’ genre knowledge, the ways in which learners reconceptualize their genre awareness should be traced, including how their awareness of rhetorical considerations and of the writer/reader interaction develops.

One further line of research explores how learners socialize into new discourse communities. Ferenz (2005), for example, using learner interviews, studied how undergraduate and graduate SFL English students developed their professional networks. The author found that learners’ goals and identities shaped their professional networks, which, in turn, impacted the development of their SFL English literacy skills. Those learners who were academically oriented moved towards adopting the academic discourse community’s literacy practices and assumed an active role in this community, whereas those who were not so inclined did not show such adaptation. The study emphasizes the role of learner goals in shaping their writing expertise; the author suggested that discussing learners’ goals with them could be important for gaining insights into their potential development of writing expertise.

Other socially-oriented theories of development

As we stated at the beginning of this chapter, our aim is not to provide a comprehensive discussion of all existing theories of language development. Instead, we opted for discussing those theories that can provide important insights for SFL writing diagnosis. Hence, we now briefly mention other socially-oriented theories of development that can potentially inform SFL writing diagnosis, devoting slightly more space to ecological approaches to development.

Ecological approaches to development draw on the biological concept of ecology and focus on the relationship between the individual and the environment. Bronfenbrenner’s (e.g., Bronfenbrenner, 1979; Bronfenbrenner & Morris, 2006) model conceptualizes these relationships as nested ecosystems:

• microsystem: patterns of relations in the individual’s immediate environment
• mesosystem: linkages between settings in which the individual is directly involved
• exosystem: links between contexts where the individual is directly involved and those where they do not have an active role
• macrosystem: values, attitudes, and ideologies.

The model is illustrated in Figure 2.2.

FIGURE 2.2 Ecological Systems Theory Perspective on Development (based on Bronfenbrenner, 1979).

The value of the model lies in the interrelations between the systems (van Lier, 2006, p. 201), which can yield insights into the different contexts and factors that shape learners’ development, including those beyond formal schooling, such as parents assisting with homework and after-school programmes in neighbouring schools shaping learners’ writing development (Wilson, 2013).

Ecological approaches have informed such assessment frameworks as Learning Oriented Assessment (LOA), which places learning at the core and takes it as the starting point for exploring the interrelations between learning, assessment, and instruction (Purpura, 2004; Turner & Purpura, 2016). Purpura (e.g., Turner & Purpura, 2016, p. 261) conceptualizes learning-oriented assessment as a set of interrelated dimensions:

• contextual: how the context is shaped by sociocultural, educational, and political factors;
• elicitation: how learners’ knowledge, skills, and abilities are elicited;
• proficiency: how learners’ knowledge, skills, and abilities change over time;
• learning: how learning is conceptualized and how assessment fosters learning;
• instructional: how assessment is arranged and how information obtained in assessment is acted upon;
• interactional: how LOA is structured interactionally;
• affective: how assessment yields insights into such factors as learner engagement.

The LOA framework was highly informative for our understanding of the factors involved in SFL writing diagnosis (see Chapter 4). Regarding diagnostic implications, while it might be challenging to consider all possible factors in any one diagnostic assessment, ecological approaches can nevertheless serve as a backdrop against which to select the most relevant factors. Particularly in the classroom context, diagnosis should focus on factors that can effectively be acted upon, while a more comprehensive set of factors could provide useful insights for diagnostic research purposes.

Self-psychology theories of personal growth, for example, as discussed by Camp (2012), underscore the importance of the development of the writer’s voice as the writer develops their relationship with the discipline or profession. Camp (2012) suggested that reflective self-assessment, in the form of, for example, portfolios, can be a useful tool for capturing such growth. Overall, orientation to the process of writing becomes the key. Self-psychology theories added to our discussion of the development of expertise in writing (see the “Development of expertise” section in this chapter), especially with regard to how it can inform SFL writing diagnosis.

Contextual theories of transfer are informed by the sociocultural paradigm and are characterized by the dynamic nature of knowledge, contexts, and the individuals in these contexts. Individuals repurpose and reconstruct their knowledge to adapt it to new contexts and, by doing so, also shape these contexts. Transfer plays a central role in the development of writing, and we have underscored this role above where relevant by discussing task-oriented transfer, particularly in the sections on Sociocultural Theory and the Development of Expertise in this chapter, where we discussed the model by Depalma and Ringer (2011; 2013). Here, we want to stress the ability to transfer knowledge to novel contexts, which Slomp (2012) called far transfer. For the diagnosis of writing development, this implies the design of tasks that elicit such far transfer. Such tasks should (1) be placed in contexts novel to the learners, (2) elicit learner reflection, as suggested by Slomp (2012) and Wardle (2007), and (3) require learners to generalize prior knowledge (Wardle, 2007).


Complex Dynamic Systems view of writing development

Complex Dynamic Systems Theory (hereinafter DST) has its origins in mathematics and was introduced to SLA by Diane Larsen-Freeman (1997). Since then, it has been used as a conceptual framework in an ever-growing number of SLA studies. Some scholars consider it to be a ‘meta-theory’ that can potentially, albeit with some modifications, resolve differences between other theories of (language) development (e.g., de Bot et al., 2007; Karimi-Aghdam, 2016), as it takes into account both the cognitive and the social in language development (e.g., Larsen-Freeman, 2002). Considering that both the cognitive and the social are given prominence in DST, we decided to present an outline of this theory in a subsection separate from the cognitively- and socially-oriented theories of development.

Larsen-Freeman (1997, p. 142) defined dynamic systems as “dynamic, complex, nonlinear, chaotic, unpredictable, sensitive to initial conditions, open, self-organizing, feedback sensitive, and adaptive”. This summarizes the core principles of DST. To start with, an inherent property of any dynamic system, be it a human brain, the weather, a language learner, or a language classroom, is its change over time (de Bot et al., 2007; Larsen-Freeman, 1997). Moreover, a dynamic system consists of a set of interconnected variables, is a part of larger complex dynamic systems, and, in turn, consists of subsystems. All subsystems within one dynamic system also change over time and interact with each other in complex ways. Even a minor change in one subsystem can result in drastic changes in the larger system, an effect known as the butterfly effect. Related to this is the nonlinear relationship between the initial size of the differences between two systems, such as two learners, and the differences emerging between them in the long run: minor initial differences, for example, in uptake of instruction or motivation, can result in profound differences between two learners later on; likewise, the effect of large differences can become minuscule over time (see de Bot et al., 2007). That is, dynamic systems are chaotic, though not random, in that they can change unpredictably (see also Larsen-Freeman, 2012).

De Bot et al. (2007) noted that for development to happen, there should be at least a minimum amount of both internal resources, for example, the capacity to learn or motivation, and external resources, such as external reinforcement. These resources can be compensatory, meaning that a lack of motivation can be compensated for by more time, for example. At the same time, different subsystems compete for these resources. The latter chimes with Bereiter’s (1980) argument for the need for the automatization of certain features of writing, which frees resources for the development of other features, as outlined above in the “Cognitive stages view of development” section in this chapter.
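To make these principles more tangible, the following minimal Python sketch simulates two interconnected subsystems in the spirit of DST growth modelling (cf. de Bot et al., 2007). The equations, parameter values, and labels are our own illustrative assumptions rather than anything proposed in the studies cited; the aim is only to show how interconnectedness, limited resources, and supportive or competitive coupling between subsystems can be expressed and explored computationally.

```python
# Illustrative sketch only: a coupled growth model of two interacting
# subsystems (here labelled "lexis" and "syntax"). All parameters are
# invented for demonstration purposes.

def simulate(lex0=0.05, syn0=0.05, steps=200,
             r_lex=0.12, r_syn=0.10,   # growth rates (resource-dependent)
             k_lex=1.0, k_syn=1.0,     # carrying capacities (limited resources)
             lex_on_syn=0.04,          # positive value: lexis supports syntax
             syn_on_lex=-0.06):        # negative value: syntax competes with lexis
    """Return the trajectory of both subsystems over discrete time steps."""
    lex, syn = lex0, syn0
    trajectory = [(lex, syn)]
    for _ in range(steps):
        # logistic growth towards the carrying capacity plus a coupling term
        d_lex = r_lex * lex * (1 - lex / k_lex) + syn_on_lex * lex * syn
        d_syn = r_syn * syn * (1 - syn / k_syn) + lex_on_syn * syn * lex
        lex = max(0.0, lex + d_lex)
        syn = max(0.0, syn + d_syn)
        trajectory.append((lex, syn))
    return trajectory

# Two "learners" differing only slightly in their initial lexical state.
learner_a = simulate(lex0=0.050)
learner_b = simulate(lex0=0.055)
print(learner_a[-1], learner_b[-1])
```

Rerunning such a simulation with different starting values or coupling strengths is one way of getting an intuitive feel for how initial conditions and subsystem interactions jointly shape the resulting trajectories.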


By and large, it transpires from the literature on the application of DST to SLA that language development, in this perspective, is not fully explicable by a cause-and-effect relationship between relevant variables. Rather, while depending on the previous state of development, the development of the system is emergent, and the relationship between the variables leading to this development is reciprocal. From the point of view of education, this implies that the same instructional procedure can result in very different developmental trajectories. The system, here a learner’s SFL proficiency, will find an equilibrium state in its development at one point, only to change again at some later point due to the complex interactions between the system and its environment, including interactions between numerous variables. These equilibrium or attractor states can be similar across learners, directed, for example, by common instruction, as we will discuss later in this section. The final two properties of dynamic systems are self-adaptation and self-organization, meaning that the system restructures itself internally by collecting and adapting to feedback from the environment, thus becoming more complex with time.

Finally, the DST perspective on variation is noteworthy. Variation in DST is not unnecessary noise but is essential for yielding useful insights into the developmental process, for example, for finding out whether variation is due to variables external to the system, variables that are a part of it, or both (e.g., Larsen-Freeman, 2012; Verspoor et al., 2004). Therefore, DST postulates that while a generalized picture of development aggregated from several learners may look like a rather linear process, individual trajectories may be very heterogeneous. In DST, it is precisely this variation that is of greater interest than the generalized picture of development.

How does this approach understand writing and its development?

The understanding of writing development in DST emerges from its general principles. To start with, a change in one subsystem leads to a change in other subsystems, and potentially in the larger system. Wind (2013, p. 94) exemplified this by outlining how a change in the lexical system, a learner acquiring the word “born”, results in a change in the syntactic system, as the learner then has to acquire its grammatical use. In a similar way, the whole lexical system may be reorganized when a learner acquires a synonym (e.g., “gulp”) of a high-frequency word (“drink”), since this introduces the need to differentiate between synonyms (Wind, 2013).

Another DST principle readily applicable to the development of writing is nonlinearity, meaning that the relationship between the different subsystems involved in writing within the same learner is ever-changing across different points in time. The nonlinearity is also evidenced in the variability across learners, even if the average performance at the group level implies gradual development (see the studies we discuss in the “Communicative and linguistic stages view of writing development” section in this chapter). From the perspective of DST, writing is a highly individual process in which inter- and intrapersonal variability is a characteristic part of development, caused by complex interactions of internal and external factors unique to each learner and the environment within which they learn.

Diagnostic potential

DST can offer several insights for diagnosis. The most evident of these is that diagnosis should be longitudinal. Several assessments are required to sketch the developmental trajectories of learners’ emerging abilities and to facilitate understanding of what factors shape learners’ development. To obtain insights into how the interaction of different factors shapes learners’ development, relatively dense longitudinal data should be collected. Some computerized diagnostic instruments that we discuss in Chapter 4 at least potentially allow for collecting such data while addressing the issue of the practicality of such diagnosis. Furthermore, since DST brings into focus variation in individual learners’ performance, a closer look at individual changes across different points in time could reveal whether individual learners follow the instructional path provided to the whole group and could help identify aspects in which individual learners may fall behind the group. Plotting individual learners’ trajectories with regard to the development of particular aspects targeted by the diagnosis, for example, syntactic complexity in SFL writing, could be a first step in gaining insights into the reasons behind inter- and intra-individual variation. With regard to the relationship between the different subsystems of SFL writing, such as the vocabulary and syntax subsystems, and the development of overall writing ability, DST predicts that the strength of these relationships changes over time, moving from a competitive to a supportive relationship and back (see Robinson & Mervis, 1998, for an L1 example). Tracing these changes can yield important insights for diagnosis, allowing instruction to be adjusted. Findings from the growing number of DST-based SFL writing studies can also inform diagnosis. For example, insights into the longitudinal interrelations of variables involved in SFL writing can facilitate the selection of the most relevant variables, as well as their operationalization in diagnostic instruments and approaches.
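As a concrete illustration of this first step, the following minimal Python sketch plots one learner’s scores on a single measure across a series of writing samples, together with a simple moving min-max band as one way of visualizing intra-individual variability. The scores, the measure, and the window size are invented for illustration, and matplotlib is assumed to be available; none of this is drawn from the instruments or studies discussed in this book.

```python
# Illustrative sketch only: one learner's trajectory on a hypothetical
# syntactic-complexity measure, with a trailing min-max band to make
# intra-individual variability visible. All data are invented.
import matplotlib.pyplot as plt

scores = [1.2, 1.5, 1.3, 1.9, 1.6, 2.4, 2.1, 2.0, 2.8, 2.5, 3.1, 2.9]
window = 3  # number of most recent samples included in the band

lower, upper = [], []
for i in range(len(scores)):
    recent = scores[max(0, i - window + 1): i + 1]  # trailing window
    lower.append(min(recent))
    upper.append(max(recent))

samples = range(1, len(scores) + 1)
plt.plot(samples, scores, marker="o", label="observed score")
plt.fill_between(samples, lower, upper, alpha=0.2,
                 label=f"moving min-max band (window={window})")
plt.xlabel("writing sample (chronological order)")
plt.ylabel("syntactic complexity (illustrative measure)")
plt.legend()
plt.show()
```

Comparing such plots across learners, or across different measures for the same learner, offers a simple way of spotting the kind of inter- and intra-individual variation that DST foregrounds before deciding where instruction or further diagnosis should focus.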

Illustrative examples of research

Much of the research on writing development in SFL from a DST perspective has investigated, longitudinally, changes in the relationship between the constructs of complexity, accuracy, and fluency, and between individual variables within these constructs. Two main methods have been used, often in combination. The first is using computer simulations to model iteratively how a given system, for example, a learner’s mental lexicon (see Meara, 2004), changes depending on its previous state, its resources, and the interaction among external and internal variables. The second method involves studying real longitudinal data on changes in various variables and their interrelations over time.

Yang and Sun (2015), for example, studied the interrelations of the three constructs of complexity, fluency, and accuracy in the Chinese, French, and English writing of five L1 Chinese learners of English and French. Over a period of ten months, both the group-level and the individual learners’ data indicated large variability and fluctuation in all three constructs and all three languages, with a strong competitive relationship between accuracy and lexical complexity in all three languages. Another study that explored the development of multilingual learners’ writing was that of De Angelis and Jessner (2012), who found reciprocal relationships between the development of the participants’ second (German) and foreign (English) languages. The learners’ L1, Italian, also impacted both their second and foreign languages. This suggests that the development of SFL writing is influenced by learners’ L1 and other SFLs.

Overall, similar findings have emerged in other DST-based SFL writing research (e.g., Larsen-Freeman, 2006; Rosmawati, 2016; Tilma, 2014; Wind, 2013). It is interesting to note that differences among learners might not transpire until interactions between variables are traced longitudinally, as demonstrated by Tilma (2014), who was able to establish differences between two learners in the relationship between several complexity and accuracy measures. The learner studying Finnish as a foreign language used increasingly complex structures over time but also made more errors, while the learner studying Finnish as a second language made fewer errors as the complexity of their structures increased. This led Tilma to hypothesize that these differences could be explained by the different foci of the instruction that the two learners received, that is, a focus on form for the FL learner as compared to a focus on meaning for the SL learner. These findings underscore the importance of considering inter-learner variability in diagnosis.

Commonalities in the development of writing

Considering the theories of the nature and development of writing discussed in this chapter, it appears that development is an individualized process in which individual learners follow their unique trajectories and in which they interact and negotiate ways of developing with their social environments and communities of practice. However, when we look at groups of learners, it is possible to detect commonalities and shared paths along which writing ability, as seen across different learners, develops. The main reason for these commonalities lies in the fact that writing development mostly takes place in educational contexts where teachers, curricula, and textbooks shape the order in which writing and its subsystems are taught and, thus, develop. Writing development in educational contexts is therefore an interaction of intra-individual factors, contextual factors, and instruction, whether via teachers or a community of practice (see also Christie, 2012). Hence, while we acknowledge the individual trajectories along which individual learners develop, as proposed foremost in the socially situated theories, we argue that it is also possible to establish development across groups of learners, as depicted foremost in the cognitive and linguistic views on development. With regard to diagnosing development, this opens at least two different windows: one for tracing the development of individual learners, and one for tracing the development of a group.

Commonalities across learners are usually depicted by stage models of development (see the “Cognitive views on the development of writing” section in this chapter). These models may, indeed, have been influenced by instruction, as is the case for the CEFR. The CEFR scales were calibrated in an educational context, more specifically in Switzerland by SFL teachers. Thus, it is fair to assume that instructional progression and sequencing have influenced the ordering of the CEFR descriptors that constitutes the progression of the CEFR proficiency levels. Hence, we need to be careful not to interpret such levels or stages as a “natural” order in which SFL writing develops, but rather as plausible models describing what is found in educational settings. The same holds true for findings from longitudinal studies in educational contexts, where the instructional progression will be one explanatory factor for commonalities in learner development. This need not be a problem for diagnosis, as long as the instructional context and its implications are given due consideration when designing diagnostic assessments and interpreting their results.

As we will elaborate in Chapter 4, the first step in any diagnosis includes a careful analysis of the context, aims, and constructs. In this initial stage, appropriate models and approaches are selected. Often, a combination of different approaches may result in the best-fitting theoretical understanding of what is to be diagnosed. Such a combination can also give us new and broader perspectives on SFL writing development. Figure 2.3 gives an example of what Bereiter’s (1980) stage model might look like if it were presented in a similar way to how the CEFR levels are traditionally displayed. However, analogous to the model of expanding proficiency that underlies the CEFR conceptualization of proficiency, in terms of both the breadth and the depth of its different dimensions (Alderson, 2007a; de Jong, 2004), we have further modified Bereiter’s stages of writing development. The stages are depicted here as an expanding cone in which the early stages of writing are both quantitatively and qualitatively more limited than the stages at the higher levels, something that underlies the CEFR scales but is not obvious in the way the CEFR scales are typically presented.

FIGURE 2.3 Two Views on Developmental Stages in Writing (Left: adapted from CoE, 2001; Right: adapted from Bereiter, 1980; Slomp, 2012; Camp, 2012).

Such a combination may work for educational contexts in which curricula and instructional progression are aligned to the CEFR levels. The models of and insights into writing development by Bereiter (1980), Camp (2012), and Slomp (2012) can inform the focus of the diagnosis, as well as the characteristics of the texts learners are expected to be able to produce at different stages of their educational journey. This is but one illustrative example, and we are aware that different contexts require different approaches. Ultimately, the purpose of diagnosis over a certain period of development will determine what approach is most feasible. For example, if the aim is to gain diagnostic information about the effectiveness of certain aspects of an instructional programme, Bronfenbrenner’s ecological model may be most informative for capturing all relevant factors, whereas Beaufort’s model may inform assessment targeted at specific writing contexts where the focus lies on developing expertise.

Another important aspect to bear in mind is the notion of thresholds in development. In SFL contexts, where learners usually have mastered one language system (their L1), we may find thresholds in development with respect to certain building blocks which one needs to master before one can produce a “text”. At the very beginning, a certain minimum stock of vocabulary items may constitute such a threshold, followed by a grammar threshold at a later stage, as suggested by Alderson (2007b), which implies that grammar tests may be of more diagnostic value once the vocabulary threshold has been passed. Other “tipping points” may exist at higher levels of proficiency, but very little is yet known about these, although research on the linguistic characteristics of the CEFR, or of other frameworks, may in due course shed light on this question.

Implications for diagnosing SFL writing

We now summarize the main implications for diagnosing the development of writing that arise from research and from cognitively- as well as socially-oriented theories of writing. Here, we present the list of assumptions stated by Camp (2012), as they are clear and succinct:

• Development is contextual rather than universal, reflecting the value systems, social interactions, and conventions of the community. Writing assessment should reflect this understanding.
• Development is multidimensional: assessing a single dimension of growth (e.g., cognitive maturity) and generalizing from a single performance (e.g., an end-of-semester test) is inadequate. Writing assessment should exhibit this understanding.
• Development is influenced by the personal motivations of the individual: professional aspirations, community connections, and identity definitions are important catalysts for growth. Writing assessment should harness this understanding.
• Development does not always correlate with performance quality: additional data, such as metacognition on activity or growth, or analysis of the relationship between an individual’s advancement and regression, may provide a fuller description of the uneven but genuine progress of the individual. Writing assessment should account for this understanding.

(Camp, 2012, pp. 101–102)


While Camp (2012) referred to writing assessment in general, our discussions in this chapter show that these assumptions about the nature of writing development are also highly informative for diagnostic assessment. To sum up, the development of learners’ writing skills depends on individual as well as contextual factors and encompasses cognitive and linguistic aspects. Individual factors such as age, cognitive maturity, motivation, or uptake of instruction will influence individual developmental trajectories in SFL writing, as will learners’ L1 and any other SFLs they acquired previously or are acquiring simultaneously. Contextual factors refer to the educational or instructional settings or to the communities of practice the learners are engaged in; they also encompass broader aspects such as ideologies or values, as suggested by ecological approaches. Depending on the aims of a diagnostic assessment, different factors need to be taken into account, and the focus and approach of the diagnosis need to be chosen carefully in order to maximize the benefits for learners and teachers/diagnosers. Depending on the point, zone, or stage of development of the learners, it may be necessary to focus on different aspects and subcomponents of writing.

We will take up the discussion of these aspects in Chapter 5, when we examine the design of diagnostic writing tasks. Before we can do this, however, we first need to turn our attention to another fundamental aspect of writing, namely the writing process. Hence, key theories of writing processes are discussed in the next chapter. The diagnosis of the writing process will be analyzed in Chapter 6.

3 THE COGNITIVE BASIS OF WRITING ABILITY WITH A SPECIAL REFERENCE TO SFL WRITING

Introduction

Cognition refers to abilities and processes related to knowledge, memory, and intelligence. Cognitive processes involve understanding, making inferences, categorizing information, combining knowledge from different sources, learning, and being aware of these mental activities. It is no wonder, then, that such general cognitive skills are so important for writing; as an essential starting point for diagnosis, we therefore need to understand what writing is when examined from a cognitive perspective. Hence, this chapter focuses on the information and theories related to the first stage of the diagnostic cycle.

Writing requires the integration of several cognitive activities and skills (e.g., Alamargot & Chanquoy, 2001, pp. 1‒3; Olive, 2004; Torrance & Jeffery, 1999). Writers conduct these activities simultaneously, even if they are usually conceptualized as separate skills (Harrison et al., 2016). Besides the obvious productive aspects, writing involves reading and comprehension (Alamargot & Chanquoy, 2001; see also Chapter 2), which constitute another layer of cognition in the writing process. When writing in an SFL, further cognitive demands are added to the writing process. Since writing is determined by a highly complex array of cognitive factors, and since weaknesses and strengths in writing are, in turn, influenced by these factors, understanding how they function in the writing process yields diagnostically relevant information.

The cognitive aspects of writing have mostly been discussed in models of writing processes, which almost exclusively refer to writing in the first language (L1). These models are obviously informative for SFL writing, since the basic nature of the overall writing process in an SFL does not differ markedly from that in a writer’s L1. Therefore, we first introduce three influential models of L1 writing, created by Hayes and Flower (1980) with later modifications by Hayes (1996, 2012), Bereiter and Scardamalia (1987), and Kellogg (1996). The three L1 models create the basis for the SFL writing models by Börner (1989) and Zimmermann (2000) discussed in the “Cognition and SFL writing” section in this chapter. We pay specific attention to language-related issues such as finding suitable wording, applying grammatical rules, and following writing conventions, since these are most likely to cause problems for SFL writers. To fully understand the complexity of converting thoughts into writing, cognitive activities such as lexical retrieval, graphic transcription, and spelling are discussed in more detail. The final section of this chapter discusses the diagnostic implications of the cognitive aspects for SFL writing and concludes by distinguishing potential sources of weaknesses in SFL writing, that is, learning difficulties, low levels of language proficiency, and underdeveloped cognitive skills. We first discuss the cognitive commonalities and differences between writing, speaking, listening, and reading, to better understand what characterizes the writing process (see also Chapters 2 and 6).

The relationship between writing and other language domains from a cognitive perspective

Language is traditionally divided into the expressive domains of speaking and writing, and the receptive domains of listening and reading. These domains work together, providing language users with a versatile means to produce and receive information. We may gather new ideas and knowledge by listening to others and by reading their texts, and then create something new by combining them with our prior knowledge when writing texts or taking turns in conversations. And when writing, we are likely to silently read parts of what we have written to check, for example, spelling or content. Sometimes, we may even read our text aloud to check whether it sounds right; thus, we may listen to our own text.

While speaking, speakers also produce visual information with their gaze, gestures, and facial expressions. Thus, while listening, we usually receive plenty of visual information, and similarly, while speaking, we provide our listeners with such information. This applies to reading and writing as well. We obviously produce visual marks such as alphabetic letters or logographic characters when writing, and when reading we interpret them. But writing also provides other visual cues about writers and their situation. What a handwritten text looks like may tell us about the writer’s gender or age or, for example, whether the writer was in a hurry. Typographic features such as dividing a text into paragraphs give the text rhythm and help the reader interpret it.

All four domains of language share cognitive processes. However, there are certain similarities, and differences, worth highlighting. Writing as production shares certain cognitive commonalities with speaking, such as verbalizing ideas, knowledge, and experiences in one’s mind for others to interpret. Not surprisingly, processing models of writing and speaking exhibit considerable similarities. Writing and speaking also differ from each other in many ways (Bourke & Adams, 2010). Writing is much slower, which is why it is assumed that in writing, more capacity is available for parallel processing (Manchón & Williams, 2016), particularly when certain aspects, such as spelling or transcription, are automatized. Here, transcription refers to transforming thoughts and ideas into verbal language in one’s mind and, then, into writing. Transcribing a sentence takes longer than producing it orally, and therefore the cognitive resources needed are spread over a longer period of time. This frees capacity for information retrieval and planning, which can be kept active during the transcription phase (Grabowski, 2007). As Grabowski (2007) points out, compared with speaking, information also stays active longer in the writer’s mind. This increases the opportunities for related information to be activated and to become part of the text.

Frequent pausing also characterizes writing and may take up as much as 60‒70% of the total composition time (Alamargot et al., 2007; see also Chapter 6). In contrast, long and frequent pausing in speaking is avoided, as it easily causes speakers to lose their turn in conversation. A further difference is that writers do not necessarily have to hold in their short-term memory what information and specific expressions were selected to be conveyed in writing, as everything can be checked in the text. If needed, the information can also be modified, corrected, or rewritten. In speaking, however, everything must be kept in memory, and modification will not entirely sweep away what was said earlier. All these factors (the distribution of cognitive load over time, pausing, and the limited need to store the text in memory) indicate that writing requires different cognitive management strategies from speaking. In addition, these factors suggest what Grabowski (2007) calls the “writing superiority effect”, namely the cognitive advantages of writing compared to speaking. This, however, concerns only simple writing tasks, where the re-organization of information needed for planning and generating complex texts is not required (Grabowski, 2007). Furthermore, the process of transcription, where thoughts are transformed into a visual text, makes writing more resource-demanding than speaking (Bourdin & Fayol, 1994).

Writing is closely related to reading because they share the written form of language. According to Berninger et al. (2002), the same cognitive abilities are required for acquiring reading and writing, for example, phonological awareness, which is needed for learning the correspondence between sounds and letters, and the ability to retrieve words from the mental lexicon. However, the relationships between the underlying cognitive abilities and their execution in the two literacy skills are not necessarily identical (Abbott & Berninger, 1993; Shanahan, 2016).

Reading and writing share four types of knowledge (Fitzgerald & Shanahan, 2000; Shanahan, 2016; see also Chapter 2). The first is metaknowledge, which refers to the awareness of the purposes and different functions of reading and writing. The second is knowledge of substance and content. Both readers and writers usually have some prior knowledge, but they can also generate new information while reading or creating texts. The third category shared by readers and writers is knowledge about universal text attributes. This includes phonological knowledge about how sounds and letters correspond and how written words can take different shapes (e.g., fonts), as well as knowledge about syntactic rules, awareness of how text can be organized into paragraphs, and of how text and graphics work together. The fourth category is procedural knowledge and the skill to negotiate reading and writing, which refers to knowing how to gain and apply the knowledge from the three other categories. It also includes the cognitive processes involved in retrieving information from memory and intentional strategies such as questioning or searching for analogies between phenomena.

Cognitive models of writing in L1

Given the cognitive complexity of the writing process, processing models help us understand the different sub-processes of writing. Since the 1980s, several cognitive models of L1 writing have been introduced, and we next discuss three such seminal models. Hayes and Flower (1980) and later Hayes (1996, 2012) captured the writing processes from the perspective of expert writers, whereas Bereiter and Scardamalia (1987) took a more developmental approach and introduced two models: one for beginning writers and the other for advanced writers. Finally, Kellogg’s (1996) model, building on the earlier models, integrates Baddeley’s (1986) view of the functions of working memory to depict its involvement in the writing process.

Hayes-Flower model (1980) with its updates by Hayes

In their influential model, Hayes and Flower (1980; Flower & Hayes, 1980) treat writing as a problem-​solving process (MacArthur & Graham, 2016). To identify different writing processes, Hayes and Flower used verbal protocol analysis, where expert writers had to think aloud while writing expository

56  The cognitive basis of writing ability

FIGURE 3.1 Hayes-​F lower

Model of Writing Processes. (Source: Hayes & Flower,

1980, p. 11).

texts. The researchers identified three writing processes (Figure 3.1): planning a text, translating propositions into written text, and reviewing the written result. These processes are referred to also in later models of writing (Torrance & Jeffery, 1999). It should be noted that in Hayes and Flower’s model, translating does not refer to converting expressions from one language to another but converting propositions into verbal form and finally into grammatically acceptable sentences. Propositions are not necessarily stored in the memory in any language, but rather as experiences, images, feelings, knowledge, ideas etc. In Hayes and Flower’s model, planning a text involves the sub-​processes of generating, organizing, and goal-​setting. Generating refers to activities which produce notes and ideas based on the task and the writer’s knowledge in the long-​term memory. The information from these sources is then organized into a writing plan that meets the goals of the task. Translating the ideas into a text presumably happens according to the writing plan. Hayes and Flower do not identify any sub-​processes of translating, nor do they describe how the ideas are converted into language and written symbols. The final stage of writing, reviewing, includes both reading and editing, and aims to improve the quality of the text so that it meets its goals. Hayes and Flower’s model was the first to describe the writing process as an interactive and recursive activity (Abbott & Berninger, 1993). Older, sequential models treated writing as a linear process where the writer progresses from one phase to another without returning to the previous phases. Hayes and Flower’s model allowed interaction among the processes as well as an ongoing combination of the different activities. However, it describes the processes of competent


writers only. For example, young children almost exclusively concentrate on translating processes and skip planning and editing (Abbott & Berninger, 1993; McCutchen, 1996; Chapter 6). The translation process may require so many cognitive resources from inexperienced writers that little capacity is left for planning and reviewing. In fact, later studies (e.g., Zimmermann, 2000) argue that the requirements of the translation (or formulation, in Zimmermann’s terminology) process have been underestimated in the earlier writing models. The original Hayes-​Flower model has gone through several modifications (Hayes, 1996, 2012) and the latest version (Figure 3.2) by Hayes (2012) differs significantly from the original. The latest model divides into resource, process and control levels, which, thus, envisages writing as comprising a number of simultaneous activities. It further emphasizes the interactive and recursive nature of the writing process, which was already present in the original Hayes-​ Flower model. The resource level refers to the mental resources writers use to create new texts. The resources include using long-​term memory to access prior topical knowledge, reading new topic-​related information, and using working memory to process information for writing purposes. All this requires the writer’s attention to the topic and the writing activity itself. The process level in the model combines the task environment and the actual writing processes. Collaborators and critics as well as task materials and written plans guide the process of proposing ideas for the text. The ideas are verbalized and further transcribed into written text; at any point, the products of these processes can be evaluated for inclusion in the text under construction. The whole process is controlled by the writer’s motivation and the goal of writing. The elements included in the original Hayes-​ Flower model (1980) and the latest version also differ. Hayes (2012) removed monitoring, planning and revising from the model but added motivation, working memory, and transcription. These changes reflect the outcomes of writing research after the introduction of the original model. For example, working memory is now included, since its importance for literacy activities has been well documented in research (see also the “Memory and writing” section in this chapter). As Hayes stated (2012, p. 370), excluding working memory from the original model was an “obvious oversight”. Also adding motivation is very understandable, since it influences not only particular writing tasks but writing in general. From the perspective of SFL writing, the inclusion of transcription into writing processes is also welcome even if the resources transcription requires in either L1 or SFL are still not fully understood (Galbraith et al., 2007; MacArthur & Graham, 2016). For example, our familiarity with the instrument (transcription technology, see Figure 3.2) we use for writing is an important factor in generating text (see the “Graphic transcription and spelling in SFL” section in this chapter).
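For readers who find it helpful to see the layered structure at a glance, the sketch below records the levels and elements of Hayes' (2012) model, as summarized above, in a small Python data structure and turns them into open observation prompts; the prompts and the function name are our own illustrative additions, not part of Hayes' model.

# Illustrative only: the three levels of Hayes' (2012) model, as summarized
# above, encoded so that they could drive, e.g., an observation checklist.
HAYES_2012_LEVELS = {
    "control": ["motivation", "goal of writing"],
    "process": [
        "task environment (collaborators, critics, task materials, written plans)",
        "proposing ideas", "verbalizing ideas", "transcribing text", "evaluating output",
    ],
    "resource": ["long-term memory", "reading", "working memory", "attention"],
}

def observation_prompts(level: str) -> list[str]:
    """Turn the elements of one level into open questions for a diagnostic note sheet."""
    return [f"What evidence is there of {element}?" for element in HAYES_2012_LEVELS[level]]

for prompt in observation_prompts("control"):
    print(prompt)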


FIGURE 3.2 Updated Version of Writing Processes by Hayes. (Source: Hayes, 2012, p. 371).

The exclusion of planning, revising, and monitoring in Hayes’ revised 2012 model warrants some elaboration. This change originates from how Hayes defines the nature of those processes. Since the model of writing describes the relationship between the sub-​processes of writing, both planning and revising are defined as sub-​processes in the 1980s model. However, Hayes (2012) argues that they are in fact writing events of their own with all the same sub-​elements as any other writing task. They are also considered as parts of goal-​setting of a writing task on the control level of the model (Figure 3.2). The purpose of planning is to create a written text which includes ideas, viewpoints, facts, beliefs, and other structural and content elements, which then can become parts of text production. Similarly, the purpose of revising is to essentially create a new text, since when it is done thoroughly, revision concerns the whole structure and content of the text as well as the small details of grammar and spelling. The result of revision may be very different from the original text, and while producing it the writer has gone through the same sub-​processes as in creating the original text. Thus, defining planning and revising as productions of a new text makes it unnecessary to place them into the writing model as sub-​processes. Instead, they are seen as goals of writing, which distinguishes Hayes’ model from the other models discussed here (it also differs from our approach in Chapter 6 on the diagnosis of the writing process).


The new role of planning as a goal and not as a sub-process of writing is also related to why monitoring is excluded from Hayes' (2012) model. Originally (Figure 3.1), Hayes and Flower referred to monitoring with regard to how writers sequence the writing process between planning and writing. In Hayes (2012), planning is no longer a sub-process of writing, and, therefore, there is no need for monitoring in the model either. The Hayes-Flower and Hayes models have certain general implications for diagnosing writing. The different processes outlined in them imply that learner strengths and weaknesses can be interpreted with reference to particular stages or processes of writing. Relevant diagnostic information about these processes can be obtained by using, for example, think-aloud protocols, retrospective interviews, and portfolios (see Chapter 6 for more details on diagnosing the writing process).

Bereiter and Scardamalia's model (1987)

While the Hayes and Flower (1980) model described the writing processes of competent writers, Bereiter and Scardamalia (1987) took a more developmental approach. They argued that beginning writers’ processes were greatly simplified compared to those in Hayes and Flower’s model. Bereiter and Scardamalia identified two developmental writing strategies. The first is a knowledge-​telling strategy used by novice writers, which means that they tell only what they know about the topic. After defining the topic and the genre of the text, writers collect the available knowledge on the topic and text genre (i.e., discourse convention) they have in their long-​term memory and display that knowledge in the completed text (Figure 3.3). For competent writers, the writing process is expanded to include planning, and writing shifts from pure knowledge-​telling to knowledge-​transforming, which can encompass combining information stored in memory, creating innovative ideas, or making inferences based on the writer’s previous knowledge (Figure 3.4). Bereiter and Scardamalia do not describe how writers progress from knowledge-​telling to knowledge-​transforming (MacArthur & Graham, 2016). However, Hayes (2011, 2012) argues, based on research by Fuller (1995) and Donovan (2001), that there are in fact several sub-​strategies in knowledge-​telling that concern the topical structure of the text. Children show development in how they utilize these strategies in their L1 writing from grade 1 to 9. The first strategy mostly used only by some very young writers is the flexible-​focus strategy. Texts using this strategy have a chain-​like structure where something new said in the previous sentence becomes the topic of the next sentence. The next step in development is indicated by the use of fixed-​topic strategy where each sentence is a new statement related to a common topic; this was the most popular strategy during the first five grades at school. From grade 6 onwards, writers were found


FIGURE 3.3 Knowledge-Telling Model of Writing by Bereiter and Scardamalia. (Source: Bereiter & Scardamalia, 1987, p. 8).

to use the topic-elaboration strategy increasingly; in this strategy, one sentence starts a new, title-related sub-topic, which is then dealt with in the following sentences before the writer starts the next sub-topic. After this, children move to the knowledge-transforming phase. Bereiter and Scardamalia's models can contribute to the diagnosis of writers at different developmental stages. For beginners, diagnosis could focus on how content and discourse knowledge contribute to telling a story. For advanced writers, the focus could shift towards analyzing how the 'problem' set in an assignment is translated between content space and discourse space, thereby diagnosing how writers make inferences and develop new ideas, and whether these transformation processes are appropriate. The sub-strategies suggested by Hayes may help design a more coherent diagnostic approach that aims at tracing writing development.

Kellogg's model (1996)

Both Hayes and Flower’s and Bereiter and Scardamalia’s models are built around information stored in and retrieved from long-​ term memory. In Kellogg’s model (1996, see also 1999), the main resource for cognitive activities in writing


FIGURE 3.4 Knowledge-Transforming Model of Writing by Bereiter and Scardamalia. (Source: Bereiter & Scardamalia, 1987, p. 12).

is the working memory (WM) which enables writers to access knowledge in the long-​term memory, apply it to the text, and temporarily store ideas, text fragments, and other intermediate products to be used in composing a text. The construct of working memory in Kellogg’s model (Figure 3.5) is adopted from Baddeley (1986) and includes three components: the phonological loop for processing verbal information, the visuo-​spatial sketchpad for processing visual and spatial information, and the central executive for coordinating WM operations, directing input to the phonological loop and visuo-​ spatial sketchpad, and retrieving information from the long-​term memory. Kellogg’s model shows certain parallels to models of speech production (Levelt, 1989). It borrows the idea of translation from Hayes and Flower (1980), while formulation can later be found, for example, in Zimmermann (2000). In Kellogg’s model (Figure 3.5), writing is divided into three production systems: formulation, execution, and monitoring, each of them involving two basic processes. The basic processes in the formulation system (formulation processes) include planning of ideas and translating them into verbal language. The processes in the execution system (execution processes) include programming of motor units and executing muscle movements in handwriting or typing.


FIGURE 3.5 Model of Writing by Kellogg. (Source: Kellogg, 1996, p. 59).

Finally, the processes in the monitoring system (monitoring processes) include reading the existing text and editing both the text and the mental formulations of ideas and linguistic constructions (Figure 3.5). As illustrated by the straight lines in Figure 3.5, the basic processes within each of the three production systems (i.e., Formulation, Monitoring, and Execution) require resources from the WM components (i.e., Central Executive, Visuo-​ Spatial Sketchpad, and Phonological Loop; see top of Figure 3.5; The exception is the execution of muscle movements, which does not depend on working memory). Particularly the demands placed on the central executive are high since it is involved in all three production systems. However, the demands may vary according to the task and the writer’s experience. Simultaneous formulation, execution, and monitoring processes are possible as long as they do not exceed the capacity of the central executive. Planning and translating can use the resources of all three WM components. When planning the text with the help of graphic sketches or mind maps, the visuo-​spatial sketchpad is involved, yet planning is also an executive function with self-​regulatory features, thus placing demands on the central executive as well. Translating ideas to concrete sentences makes use of the phonological loop. However, when writers try to find the right way to express their thoughts, they also rely on the central executive. The more writers struggle with finding structures and words, the more is required from the central executive. Kellogg (1996, 1999) argued that the more the executing processes of writing are automatized, the less they use cognitive resources. All the resources that execution processes require come from the central executive. These resources are


needed for programming the execution but not for moving the muscles during handwriting or typing. In monitoring processes – reading and editing – both the phonological loop and the central executive are involved. When writing is monitored, reading is done to find errors and to create feedback for correcting the errors. Reading alone involves many cognitive activities and requires substantial capacity of working memory (see, e.g., Alderson, Haapakangas et al., 2015, pp. 128–134), particularly when done for monitoring purposes.

L1 writing models: Implications for SFL writing and diagnosis

The cognitive models discussed above were designed with L1 writers and –​in the case of Hayes-​Flower as well as Kellogg’s model –​competent writers in mind. However, while a competent and experienced L1 writer remains so despite a change of language, SFL proficiency affects the writing process (Manchón & Williams, 2016). SFL writing is slower and imposes a higher cognitive load than L1 writing, although it becomes more fluent with increased proficiency and experience (Chenoweth & Hayes, 2001). Bi-​and multilingualism can be advantageous for SFL writing as multilingual persons have better selective attention, metalinguistic awareness, and cognitive flexibility (see Ransdell et al., 2001, review). It is still unclear how L1 writing skills are transferred to SFL writing and what similarities exist between writing in different languages (Manchón & Williams, 2016). Silva’s (1993) meta-​analysis of studies comparing L1 and SFL writing concluded that college-​level students use similar writing processes in both languages, but these processes differ qualitatively. SFL writers planned their texts less than L1 writers and used much of that time for generating material for the text instead of organizing the material and setting goals for writing processes. Verbalizing ideas and turning them into sentences was found to be more laborious and time-​consuming for the SFL writers who were also less productive: their texts were shorter, they paused more frequently and longer, and they were uncertain about their vocabulary, which frequently led to consulting a dictionary. Furthermore, SFL writers were found to review their texts less than L1 writers. Thus, at least advanced SFL learners have the same overall structure in their writing process as in L1 writing, as depicted in the L1 writing models, but differences exist, especially in the linguistic formulation of the ideas (also Roca de Larios et al., 2016). Beare and Bourdages’ (2010) study of high-​level bilingual Spanish/​English speakers found that only a few of them made explicit use of their stronger language when generating text in the other language but used that language throughout the writing process. However, the first language is an important resource for most SFL learners. For example, Cumming (2013b) shows how SFL writers use their L1 to solve problems that typically concern linguistic


issues such as vocabulary, phrases, and grammatical structures they want to use. Similar observations of strategic use of L1 have been made in many other studies (e.g., Albrechtsen et al., 2008; Murphy & Roca de Larios, 2010). L1 is not switched off or replaced by SFL but utilized as a versatile tool in the writing process. Thus, L1 provides SFL writers with additional resources and strategic help, but not without side effects, since working simultaneously in two languages taxes their cognitive capacity. The implication of the research is that the focus of diagnosis should change with learners' SFL proficiency. With beginners, it could lie on surface elements of the text, and on goal-setting and monitoring strategies. More advanced learners could be diagnosed with a focus on their adherence to genre conventions of a specific discourse community (Chapter 2). Diagnosis and subsequent actions can also be informed by the insights from comparing L1 and SFL writers, with a particular focus on planning, organization, and setting goals, as these seem to be underused in SFL writing. While think-aloud protocols are useful to gain insights into cognitive processes, we have to concede that thinking aloud itself requires cognitive resources. This additional strain on learners' cognitive resources, particularly during early writing development, can impede learners' writing processes, and thus potentially lead to an imprecise diagnosis.

Memory and writing

Long-term memory and writing

In all the models introduced above, memory is conceptualized as an integral and essential part of writing processes. Hayes and Flower (1980) described long-​term memory (LTM) as mostly a repository of material where skilled writers store many kinds of information about diverse topics, audiences, and genre conventions. For each task, writers retrieve relevant information from their LTM and create a text for the intended readers using the genre which best suits the topic and the audience. Hayes and Flower referred to the combination of task environment, long-​term memory, and writing processes as the writer’s world: environment and LTM form the context in which the processing model operates. Bereiter and Scardamalia’s (1987) writing model differentiated between the strategies of knowledge-​ telling and knowledge-​ t ransforming, both of which retrieve information from LTM. In the knowledge-​telling model, the retrieved information is directly turned into the content for the text, whereas in the knowledge-​t ransforming model, the information is transformed, that is, adjusted, shaped, and used as a starting point for inferencing or creating new ideas which are then organized into a text. Presumably, the transformation process requires more resources from a writer’s working memory than the


knowledge-telling process. Apparently, Bereiter and Scardamalia did not acknowledge the potential role of working memory. Rather, they saw the two strategies as developmental stages. The connection between writing development and working memory was recognized by McCutchen (1996) and will be discussed in more detail below.

The resources of working memory in writing

When working memory was acknowledged in the writing process models (Hayes, 2012; Kellogg, 1996;), the question of its resources, and especially their sufficiency, became essential. According to Baddeley (1986), WM is both a temporary storage for materials and a place where cognitive operations such as combining or manipulating information take place. However, WM resources are limited. Thus, if most of its processing capacity is used by the generation of grammatically accurate expressions in SFL and by translating these expressions into sentences following orthographic rules, no capacity may be left for planning and monitoring the text. This chimes with Just and Carpenter’s (1992) capacity theory that had a strong effect on Kellogg’s model and on our current understanding of how memory works. In Just and Carpenter’s (1992) capacity theory of comprehension, capacity is defined as the maximum amount of activation available to support temporal storage of information and processing it, that is, the two main WM functions. If a comprehension task demands less activation than is available in WM, the task is easily completed in terms of memory functions. However, if the task demands more resources than available in WM, processing becomes slower or certain aspects are dropped from storage, or both. In other words, shortages in WM capacity constrain comprehension. People differ in how much activation is available in their WM for language processing either because their total WM capacity may vary or because people differ in how efficiently they can use their WM resources. McCutchen (1996) applied Just and Carpenter’s capacity theory to explain writing processes. She saw capacity theory as a way to explain the progress from knowledge-​telling to knowledge-​transforming in Bereiter and Scardamalia’s (1987) model. Novice writers are occupied with transcribing verbal ideas into written strings of letters and words. According to McCutchen, the active letter-​ by-​letter construction of orthographical forms of words requires much more WM resources than retrieving spelling from long-​term memory. Hence, only when transcription becomes automatized, are cognitive resources freed for processes such as planning and reviewing. These processes are cognitively very demanding and are known to be absent in young children’s writing (Abbott & Berninger, 1993). This points to the fact that beginning writers in L1 and SFL need to focus their limited capacities on retrieving relevant information from


long-​term memory and writing it down, as depicted in the aforementioned knowledge-​telling strategy. Only after young writers (or beginning SFL writers) become fluent transcribers can attention be paid to planning and, finally, reviewing. Freed WM capacities pave way for knowledge-​ t ransformation strategies. Some research, such as Hoskyn and Swanson (2003), points towards a relation between age and WM capacity: they compared writers at different ages (15, 30, and 77 years) and found that with increasing age, both WM capacity and the complexity of the structures used in the texts decrease. The interplay between the functions, resources, and structures of WM is recognized as a multifaceted phenomenon and has led to different approaches to describing WM components (Torrance & Jeffery, 1999). In Just and Carpenter’s capacity theory, all WM resources are put in one pool and the three WM components share these resources. If, for example, the central executive uses all the available resources, no resources are left for the phonological loop and the visuo-​spatial sketchpad. McCutchen seems to agree, and there is some empirical evidence for it (Just et al., 1996). Kellogg (1996, 1999), however, adopted a different approach to WM resources and based his writing model on the idea that every WM component has its own resources. If the central executive uses up all its resources, the phonological loop and the visuo-​spatial sketchpad can still function, due to their own resource pools. Dividing WM resources this way may seem beneficial for SFL writing: if the different memory components have their own resources, one particularly demanding process cannot block other processes by draining all resources. There still seems to be a lack of consensus about whether WM relies on a single or separate resource pools (Stewart, 2013). Hence, it is difficult to infer implications for diagnosis until a clearer picture emerges. Furthermore, the question remains of how the labour of the overall writing process is divided between the components. This question is discussed next. The division of labour between the working memory components

Kellogg was the first to discuss the division of resources between working memory components in the context of writing (Figure 3.5). He argued that the central executive component of WM is involved in all writing processes except the movements of muscles during the executing processes. He also claimed that the visuo-spatial sketchpad is needed in planning, because it often includes sketching ideas also in visual form (e.g., mind maps, drawings). Finally, the phonological loop is involved in translating ideas into sentences and in monitoring. More recent research (e.g., Kellogg et al., 2016) has given a far more detailed picture of the role of WM in writing, showing that it is much more complex than was assumed before. The focus of research has moved from connections between WM and separate writing processes to connections between the specialized functions of separate WM components and writing processes.
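The division of labour that Kellogg proposed can be summarized in a small look-up table. The sketch below encodes the component-to-process assignments as described in this paragraph (the fuller account earlier in the chapter allows planning and translating to draw on all three components) and flags which simultaneously active processes compete for the central executive; the encoding and the helper function are our own illustration, not a cognitive simulation.

# A simplified encoding of Kellogg's (1996) proposed assignments, as summarized
# in the paragraph above; purely illustrative, not an implementation of the model.
KELLOGG_WM_DEMANDS = {
    # process                -> WM components assumed to be involved
    "planning":              {"central executive", "visuo-spatial sketchpad"},
    "translating":           {"central executive", "phonological loop"},
    "programming execution": {"central executive"},
    "executing movements":   set(),   # muscle movements: no working memory load
    "reading (monitoring)":  {"central executive", "phonological loop"},
    "editing (monitoring)":  {"central executive", "phonological loop"},
}

def central_executive_competitors(active: list[str]) -> list[str]:
    """Which of the currently active processes all draw on the central executive?"""
    return [p for p in active if "central executive" in KELLOGG_WM_DEMANDS[p]]

# Example: simultaneous translating and monitoring both load the central executive,
# which is where a bottleneck would be expected for less proficient SFL writers.
print(central_executive_competitors(["translating", "reading (monitoring)", "executing movements"]))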


The role of the phonological loop (verbal working memory) has been studied to some extent in the context of writing, but the results have given an unclear picture of the relationship between the loop and the other components (Olive, 2004). In contrast to Kellogg’s original ideas, Levy and Marek (1999) suggested that the phonological loop is needed more in formulation than in revision processes. Olive et al. (2008) specified the functions of the phonological loop during writing, suggesting phonological and orthographic support when translating ideas into language, as well as support with grammatical and syntactic encoding. Contrary to this research, Vanderberg and Swanson (2007), for example, found that the phonological loop did not predict any of the writing processes. The involvement of the visuo-​spatial sketchpad in writing is also unclear. Kellogg (1996) claimed it to be part of planning, but Vanderberg and Swanson’s (2007) results showed that functions of the visuo-​spatial sketchpad did not predict any writing processes. The only WM component involved in writing, according to Vanderberg and Swanson (2007), was the central executive, which also is the key component in Kellogg’s original model. Kellogg and his colleagues have recently done further studies on the visuo-​spatial sketchpad. They found that word meanings impact strongly on how WM processes them nouns with a concrete meaning create both verbal and visual representation, whereas abstract nouns have a verbal representation only (Kellogg et al., 2007). They also suggested separation of the visuo-​spatial sketchpad into two separate systems, the visual and the spatial WM systems (see also Olive et al., 2008). Their study showed that text composing required both verbal and visual WM in equal amounts. The unexpectedly high demands for visual WM may at least partly be explained by the monitoring of spelling while reading and editing the text. Olive et al. (2008) found only a minimal effect of the spatial memory component. Writing processes differ in how much demand they place on working memory and with regard to their level of activation during writing. Olive (2004) argued that planning and revising a text are more demanding than translating content into language. On the other hand, translating is activated most often during writing. The duration and timing of activation varies as well. According to Olive (2004), translation from thoughts to language is activated constantly during writing, whereas the activation of planning processes decreases, and the activation of revision processes increases towards the end of the writing process. Several points in the above discussion are relevant for diagnosing SFL writing. Working memory plays a central role in the writing process, and its different components are involved to a differing degree in different phases of the process. Regardless of the structure of the resource pool for the WM components, these resources are limited, and so is WM capacity. The availability of resources helps explain changes in the writing processes, such as learners moving from knowledge-​ telling to knowledge-​ t ransforming strategies. Furthermore,


differences among learners' writing performances can partly be explained by their WM capacity or its components. To use existing memory tests for diagnosing writing or to create new diagnostic measures, more research is needed on the relationship between WM and writing. However, for research purposes and by professional psychologists it is possible to use, for example, neuropsychological tests focusing on memory functions such as Wechsler's Memory Scale (originally Wechsler, 1945; the latest version WMS-IV, 2009; see also the "Lexical retrieval in SFL writing" section in this chapter and, e.g., Alderson, Haapakangas et al., 2015, and Alderson et al., 2016, as examples of such research).

Cognition and SFL writing – specific attention to formulation processes

Cognitive models of SFL writing

Writing combines the use and gathering of topical resources with applying linguistic knowledge, which both tax the cognitive system. Since the dependence on linguistic knowledge is stronger in SFL than in L1 writing (Schoonen et al., 2003), we concentrate on the role language plays in writing. Undoubtedly, greater writing experience and knowledge of written language conventions in any language help SFL writers, since this experience can be transferred from one language to another (e.g., Hirose 2006; Kobayashi & Rinnert, 2012; Rinnert & Kobayashi, 2009). However, below a certain threshold in language proficiency, writers’ whole attention is focused on linguistic problems, so no resources are left for applying more general knowledge about writing (Schoonen et al., 2003, and Chapter 2). This chimes with the threshold hypothesis suggested for SFL reading (Alderson, Haapakangas et al., 2015, pp. 70‒71). The existing cognitive models of L1 writing emphasize planning and revising. In contrast, the process of translating or formulating –​the conversion of ideas and thoughts into language and further into written symbols –​has been given less attention both in these models and in research. Roca de Larios et al. (2006) suggested that the neglect of the complex processes of formulation is due to an assumption that matching intentions and expressions happens somehow automatically. This, however, is a very simplified view even in L1, let alone in SFL writing. Writers can spend around 70% of the total composition time in formulation processes, regardless of the language of writing, although problem solving during the formulation process appears more prominent in SFL (Roca de Larios et al., 2001; see Chapter 6 for details). One model that looks at the relation of L1 and SFL in the writing process and particularly at the role of translating words and expressions from one language into the other is Börner’s (1989) process model, building upon Hayes and


Flower (1980; who, however, used translation to refer to the conversion of ideas to linguistic expressions). Börner’s model not only considers how instruction might affect SFL writing processes, but it also accounts for the translation processes between L1 and SFL (SFL is called interlanguage in Börner’s model). Furthermore, Börner acknowledges that texts can be written during more than one writing session and that the ‘original text’ can be reformulated and edited (see also Chapter 6). Börner’s model encompasses three main components, which are the writing processes closely modeled on Hayes and Flower, the writing environment referring to the task environment and the writer’s memory, and the instructional environment, which refers to aspects such as learning outcomes, teaching goals, materials, and assessment (Figure 3.6). According to Börner, L1 influence becomes particularly apparent when accessing memory, and during planning and formulating. Checking and monitoring during writing also involves both L1 and SFL. The assumption of L1 influence is also supported by Krings (1986, p. 423), who argued for an interaction of L1 and SFL processes in SFL writing. This interaction might explain L1 transfer in SFL texts (see Chapter 2). When it comes to diagnosing SFL writing processes, Börner’s model can suggest which aspects to focus on in the diagnosis, similar to the implications of the Hayes and Flower model above. Furthermore, Börner’s model suggests instructional and intertextual factors be taken into account when interpreting assessment results.

FIGURE 3.6 SFL Writing Processes According to Börner. (Source: Börner, 1989, p. 355. Translation from German by the authors).


Another SFL writing model based on Hayes and Flower (1980) is the Formulation model by Zimmermann (2000). Zimmermann put formulating in SFL writing at "the heart of the writing process", because it is where the gap between intended meaning and the available linguistic means becomes most evident to writers and creates problems for them. Zimmermann's model of formulation in SFL writing (Figure 3.7) is in part based on Krings' (1989)

FIGURE 3.7 Zimmermann's Model of Formulation in SFL Writing. (Source: Zimmermann, 2000, p. 86).

Note: Tent Form = tentative sentence formulation; Tent Form x L1 = Tent Form in L1, preceding first Tent Form in L2; Tent Form1/Mod = modified Tent Form; Tent Form1/Rep = repeated Tent Form; Tent Form s = simplified Tent Form; Tent Form 2,N = new, alternative Tent Forms.


attempts to identify sub-​processes of formulation in SFL writing and describes the production of single sentences. It focuses on the translation phase, placed between planning and reviewing in Hayes and Flower’s model. This does not, however, imply that formulation only takes place during the middle phase of composing a text, but this is the phase where formulation processes are most typically activated. Zimmermann’s (2000) model includes several sub-​processes of formulation. These involve converting ideas into linguistic items, which likely results in tentative formulations of ‘pre-​text’. They are then evaluated and either accepted, modified or rejected. Rejection may result in postponing a particular formulation for later, or it may trigger formulation of new forms. After evaluation, the tentative form may undergo simplification. Sometimes, a tentative form in SFL is first created in L1, which was, however, rare in Zimmermann’s data from university students majoring in SFL English (see also Beare and Bourdages’, 2010, study on bilingual English/​Spanish speakers in this chapter). These results do not, however, mean that L1 is not used during a writing process in SFL. Zimmermann’s informants, too, often used their L1 to reflect on what and how to write. The importance of L1 as a resource for SFL writers is well documented in research (e.g., Albrechtsen et al., 2008; Cumming, 2013b; Murphy & Roca de Larios, 2010). Although Zimmermann (2000) claimed that his more detailed identification of formulation processes turned the original Hayes and Flower’s (1980) model into a model for SFL writing, many aspects in his model also apply to L1 writing. Only the sub-​processes of tentative formulation in L1, simplification of tentative forms, and SFL problem solving can be identified as SFL-​specific. Otherwise, Zimmermann’s model serves as a general elaboration of the formulation processes applicable to both L1 and SFL writing. Despite the similarities between L1 and SFL writing processes, it is, however, important to highlight the special position of formulation in SFL and the cognitive activities required for it. These activities are connected to the so-​called lower-​ level processes of writing, namely to graphic transcription, spelling, lexical retrieval, and syntactic construction (Fayol, 1999), as opposed to the higher-​ level processes, which concern the text composition process as a whole. The formulation processes are closely connected to writing fluency, identified from at least two different angles: the product-​based and process-​ based perspective (Abdel Latif, 2013). In the product-​based definition, fluency is seen as the quality of the final text. Writing is fluent, if the text reads well, it is coherent, contextually appropriate, and acceptable. In the process-​oriented approach to writing fluency, the focus is on the fluency of writers’ processes during writing, since fluent low-​ level writing processes release cognitive resources for planning and other higher-​level processes (Y-​S. Kim et al., 2018; see also Chapter 6).
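To illustrate how such process-oriented indicators can be computed in practice, the sketch below derives a simple production rate (words divided by time on task) and a pause count from a keystroke-logged writing session; the log format, the two-second pause threshold, and the function names are illustrative assumptions rather than part of any published measure.

# A minimal sketch of two process-oriented fluency indicators, assuming a
# hypothetical keystroke log of (timestamp_in_seconds, character) pairs.
from typing import Iterable, Tuple

def words_per_minute(text: str, seconds: float) -> float:
    """Crude production rate: number of words divided by time on task."""
    return len(text.split()) / (seconds / 60.0)

def count_pauses(log: Iterable[Tuple[float, str]], threshold: float = 2.0) -> int:
    """Count inter-keystroke intervals longer than the (arbitrary) threshold."""
    times = [t for t, _ in log]
    return sum(1 for a, b in zip(times, times[1:]) if b - a > threshold)

# Example with a toy log: three keystrokes, one long pause before the last one.
log = [(0.0, "t"), (0.4, "o"), (3.1, " ")]
print(count_pauses(log))                                   # -> 1
print(round(words_per_minute("to be or not", 60.0), 1))    # -> 4.0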


When evaluating fluency, researchers have focused on different sizes of written units, such as single words, T-​units, sentences, and connected texts (e.g., Abdel Latif, 2013). It is important to distinguish fluency in the different units of writing, since, for example, writing a connected text requires both formulation and text generation which exceed the word level, whereas writing single words requires only low-​level processes (Y-​S. Kim et al., 2018). Fluency can be defined in many different ways. Within the process-​oriented approach, for example, Y-​S. Kim et al. (2018) see fluency of writing connected texts as efficiency and automaticity, where efficiency refers to the accuracy and speed of writing and automaticity to speed, effortlessness, and lack of conscious awareness. The definition by Piolat et al. (2008) refers to fluency in more general terms as “the ease with which a writer mobilizes the processes and knowledge needed to compose his/​her written message” (p. 116). However, they measure it by simply dividing the number of words by the time spent on the writing task. A very narrow definition comes from Snellings et al. (2004a, 2004b), who reduce fluency to the speed of written lexical retrieval. Despite their different emphases, all these definitions include such properties as speed and ease as essential characteristics of fluency. Therefore, lexical retrieval and graphic transcription –​both processes which often pose problems for SFL writers –​seem to be promising aspects to diagnose in SFL writing fluency. Below, we take a closer look at the two key processes in fluency and their diagnosis. Lexical retrieval in SFL writing

During formulation, ideas are converted into linguistic representations in working memory and subsequently transcribed into written symbols (Abbott & Berninger, 1993). The formulation process, thus, depends on the availability of linguistic resources such as vocabulary, grammatical forms and structures together with different meaning potentials associated with them, and typical genre conventions (see Purpura, 2016). Because written text is usually permanent insofar that clarifications can no longer be made once the text is presented to its readers, writing in general puts high demands on the accurate, concise, and precise formulation of ideas (note however that such interactive online text genres as chats may differ in this respect). As Schoonen et al. (2009) noted, writers often feel that their text should be correct in every aspect. The search for acceptable expressions, also in terms of genre conventions, is captured in Zimmermann’s model in the stages of tentative formulations, evaluations, rejection, and simplification of linguistic items (see Figure 3.7). This assumption of correctness may also force writers to narrow their focus on the linguistic features of writing, at the expense of planning or overall text structure (Schoonen et al., 2009). To write fluently requires fluency in


retrieving linguistic items. Semantico-​g rammatical retrieval in SFL writing, however, seldom happens at the same pace as in L1, which places additional demands on working memory when composing text in one’s SFL. Although lexico-​ g rammatical retrieval –​the search for expressions to convey the intended meaning in a given linguistic, social, and genre context –​ is crucial for writing fluency (e.g., Snellings et al., 2002, 2004a, 2004b), it has not received much attention in SFL writing research. Furthermore, studies have often concentrated only on such measurable aspects as speed or number of words. For example, Chenoweth and Hayes (2001) found that experience with the language of writing correlated with writing fluency, identified as “the rate of production of the text” (p. 81), measured as the number of words per minute. They found that limited SFL experience resulted in more effortful lexical retrieval, which restricted the resources available for applying grammar rules. Writers with more SFL experience retrieved lexical items more rapidly and were able to allocate more cognitive resources to grammatical issues, which reduced the need for revision. The more advanced language learners were also more likely to allocate more resources to the topic of their text. Snellings et al. (2002) examined whether lexical retrieval could be practised and whether practising enhanced the fluency of lexical retrieval in SFL. Their study on Dutch-​speaking 9th graders showed that computer training improved the retrieval speed of familiar English words. In a later study, Snellings et al. (2004a) found that retrieval speed of the practised English words correlated positively with learners’ writing fluency. That is, with relatively little training (once a week for four weeks), it seems possible to reduce the cognitive demands of word retrieval and, thus, increase the availability of cognitive resources for other writing processes. Hence, diagnosing the speed of retrieval of familiar words could indicate further training needs. Measures such as Rapid Automatized Naming (RAN; Dencla & Rudel, 1974) and Rapid Alternating Stimulus (RAS; Wolf, 1986) have been created in neuropsychology to assess the speed of lexical access, and used in research on L1 reading acquisition and dyslexia. In both measures, test takers have to name 50 items provided as a matrix as quickly and accurately as possible. In RAN, the items represent the same category (numbers, letters, colours, or objects) whereas in RAS they represent at least two different categories (e.g., numbers and letters). Since the focus is on retrieval speed, the items need to be very familiar to ensure that language knowledge does not interfere with the result. In SFL contexts, the language in which the learners have to respond might matter, and therefore, for example, in the DIALUKI study investigating SFL reading and writing, both L1 and SFL versions of the RAS task were used (Alderson, Haapakangas et al., 2015). The Written Productive Translation Task (WPTT; Snellings et al., 2004b), was created to assess the speed of lexical retrieval in SFL. WPTT is a


computer-based test where learners complete 67 sentences in SFL by translating and typing, for each sentence, a word which is provided to them in their L1. Only words assumed to be familiar to the learners were selected for translation in the study. The computer program registers reaction times for the first keystroke and for pressing the Enter key (i.e., completion of the item), and correctness of the response. The advantage of this measure compared to RAN and RAS is that it can be administered to groups instead of individuals. Another advantage is that it is specifically designed to assess SFL skills. There are, however, challenges in the WPTT type of tasks. The main one concerns learners' SFL vocabulary knowledge as this task is premised on test takers' familiarity with the SFL words whose L1 versions are presented to them. It is difficult to ensure that all the selected words are familiar to all learners. Therefore, separate measures of learners' SFL vocabulary knowledge, for example, may be needed to complement this approach to diagnosis.
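The timing data that such a task yields can be summarized straightforwardly. The sketch below shows one way to derive retrieval-speed indicators from hypothetical per-item records of first-keystroke and completion latencies; the record format and field names are our own assumptions for illustration and do not describe the actual WPTT software.

# Illustrative only: summarizing hypothetical per-item records from a
# WPTT-style task (onset = first-keystroke latency, total = time to Enter).
from dataclasses import dataclass
from statistics import median

@dataclass
class ItemRecord:
    onset_ms: int    # latency of the first keystroke
    total_ms: int    # latency of pressing Enter (item completion)
    correct: bool

def summarize(records: list[ItemRecord]) -> dict:
    """Median latencies over correct responses plus overall accuracy."""
    ok = [r for r in records if r.correct]
    return {
        "accuracy": len(ok) / len(records),
        "median_onset_ms": median(r.onset_ms for r in ok),
        "median_total_ms": median(r.total_ms for r in ok),
    }

print(summarize([ItemRecord(620, 1800, True), ItemRecord(900, 2500, True), ItemRecord(700, 2100, False)]))

Graphic transcription and spelling in SFL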

Besides lexical retrieval, formulation entails converting verbal expressions into visual writing, graphic transcription and spelling. Orthographical issues need special attention in SFL writing, particularly if the orthographic system in SFL and L1 differ significantly (Kormos, 2012). Despite their importance, these processes of writing have received little attention (Galbraith et al., 2007; MacArthur & Graham, 2016). In a literate environment, where children have constant access to written material in L1, writing starts well before children write the first conventionally spelled words with recognizable letters (see Tolchinsky, 2016). The emergent writing ‒ scribbling, drawing, and labeling pictures ‒ then gradually develops into producing letters and strings of letters, which usually represent the most salient sounds of the words (Caravolas et al., 2001). These written entities are not yet necessarily interpretable for a reader. Gradually, these fragments start to resemble recognizable words (Bourke & Adams, 2010). Graphic transcription is slow and controlled in young children and requires considerable cognitive resources (McCutcher, 1996). Bourdin and Fayol (1994; also Fayol, 1999) showed that the act of writing interfered with maintaining information in working memory in young children. The same happened in adults when they had to write with capital letters, which is unusual and, hence, demands cognitive resources. Thus, a cognitive “overload” in transcription decreases the execution of other writing processes even in L1, although graphic transcription is assumed to be automatized in adults (see also Just and Carpenter’s capacity theory in this chapter). The cognitive effects that graphic transcription has for beginning SFL writers are obvious. When writers’ L1 and SFL share, for example, the Latin script, there


is no need to learn a new set of writing symbols. However, the correspondence between letters and sounds can differ even between languages that share the same script (consider, for example, the letters Ä, Ö, Õ or Å that are used in some European languages but not in English, for instance). Learning to write in a SFL becomes more effortful if the new language has a different alphabet (e.g., Cyrillic) or the orthography represents a different writing system. The English alphabet, for instance, contains 25 letters, while the Japanese hiragana script, a syllabic writing system, has 48 different symbols. It is easy to see how retaining 48 symbols in one’s memory places higher cognitive demands on the writer than 25 letters. The production of characters by SFL writers is more conscious and controlled compared to children who have recently learned to write letters, which means the graphic transcription alone makes formulation in SFL demanding. Transcription of language into writing is only partly about producing the written characters. Another important aspect is spelling. Some researchers argue that spelling is more challenging to master than decoding and reading (Caravolas et al., 2001; Keilty & Harrison, 2015). However, spelling has been studied less, and, moreover, the research that exists has concentrated on alphabetic orthographies, especially English (Caravolas, 2004), and on children. Our knowledge about how spelling develops in syllabic and logographic writing systems is, thus, very limited. Equally limited is our understanding of how adults learn to spell in a SFL or how they acquire a new orthography, script, or even a whole new writing system. It seems that in alphabetic orthographies, spelling skills are built on similar core elements: phonological awareness, letter knowledge, and reading (Caravolas, 2004). Phonological and phonetic awareness in particular have been found to be the strongest predictor of spelling (e.g., Caravolas et al., 2001; Yeong & Rickard Liow, 2011). However, even alphabetic orthographies are not equally easy to acquire. It is more challenging to learn spelling skills in opaque, inconsistent orthographies such as English than in transparent ones where letters/​ g raphemes and sounds/​ phonemes correspond more closely (Caravolas et al., 2001). Irregular spelling increases the cognitive load of writing (for a review, see Fayol, 1999). The importance of phonological and phonetic awareness changes when spelling skills become more automated and accurate. For example, Jongejan et al. (2007) found that during the first two years of schooling, phonological awareness explained 50% of the variance in spelling in English L1 speaking children’s writing after which, in grades 3–​4, syntactic awareness became dominant, alone explaining 54% of the variance in spelling. For SFL spelling, the importance of phonological awareness is not necessarily the same as in L1. Jongejan et al.’s (2007) study also had participants who were acquiring English as a second language. For them, phonological awareness explained only 24% of the variance in spelling in grades 1–​2 and it continued to


be the strongest predictor of spelling in grades 3–4 with 35% of the explained variance. The research on spelling acquisition by young children leaves many open questions. In Jongejan et al. (2007), the children learned literacy skills for the first time in their lives. What happens when literate adults learn to write in a SFL? Does the role of phonological awareness vary with the writing system and orthography of the learners' L1? Learning to be literate requires phonological awareness but the acquired literacy skills also shape phonological awareness. This reciprocal relationship creates different phonological awareness depending on the orthography and writing system. It is very likely that this also affects learning to spell in a SFL. Information about spelling and transcription processes can be obtained by using keystroke logging, which provides a real-time window to how writing proceeds, what kind of changes writers make, and how often and where they pause (e.g., Antonsson et al., 2018; Uppstad & Solheim, 2007; Chapter 6). Another approach is to administer word dictation tasks (e.g., Uppstad & Solheim, 2007). If the selection of words or creation of pseudowords for the task is done carefully, the task can provide insights into the processes underlying spelling and, thus, yield useful diagnostic information.

Main implications for diagnosing SFL writing

The writing models introduced in this chapter present writing as a complicated and multi-​layered network of various sub-​processes. Writing involves interaction between language proficiency, writing skills, and cognitive resources. Working memory and long-​term memory are used when we shape letters on paper, think about correct spelling or apply grammatical rules. They function when we retrieve words, form sentences, and combine new and old information. We need memory when planning and revising texts. Although we can measure the efficiency of our memory, research suggests that memory capacity cannot be enhanced and, therefore, diagnostic implications of such measurements are unlikely to include attempts to improve memory capacity. However, certain memory functions such as the speed of lexical retrieval seem amenable to instruction as are, obviously, writers’ SFL proficiency and writing skills. A key factor in SFL writing is SFL proficiency. Weak SFL skills result not only in erroneous expressions and simplified structures but also in limited content as writers may be unable to express everything they want. From the cognitive perspective, insufficient language proficiency can turn the writing process into a mere formulation process where planning, structuring, and revising are given little attention. Thus, improving SFL skills will affect cognitive processes by freeing them for higher-​level writing activities.


Another way to help cognitive processes is to diagnose and improve the sub-​processes of writing defined in the writing models. Educated L1 writers supposedly have adequate language proficiency, but they still differ in how well they can express themselves in writing (see Hulstijn, 2011). Ability to construct a convincing, coherent argument or to tell a vivid story is not acquired automatically but is usually learned through instruction. Moreover, writing in any language is affected by personal beliefs, institutional values, and cultural conventions about what constitutes good writing (Manchón, 2013). Fitzgerald (2006), based on synthesizing 15 years of research in SFL writing, concluded that most problems SFL writers encounter are not language-​specific; hence, good writers seem to create writing-​specific knowledge. Thus, any writing instruction is important for learning to write in SFL (Hirose, 2006; Kobayashi & Rinnert, 2012). In sum, SFL writing skills can be improved by both increasing SFL proficiency and increasing general writing expertise (see also Chapter 2). For diagnosing writing, this implies that language proficiency, general writing skills, strategies, and writing conventions need to be taken into account. Diagnostic feedback can aim to raise learners’ awareness of effective general writing processes and strategies, beside feedback on language-​ specific aspects. All this helps the essential cognitive functions for writing such as working memory to reorganize and update its functioning and to allocate resources to different processes. There is, however, one facet that we have not yet explored. What if problems in SFL writing are caused by learning difficulties such as dyslexia, and are, therefore, related to basic cognitive prerequisites of literacy skills? While dyslexia in reading has been studied extensively, writing has been given less attention (Morken & Helland, 2013). What is known is that spelling difficulties are particularly persistent (Berninger et al., 2008). To illustrate, in a study with multilingual (Finnish, Swedish, and English) university students with and without dyslexia, Lindgrén and Laine (2011) found that dyslexic students had a significantly higher proportion of errors concerning letter-​ sound correspondence in English, which was their least proficient language. In addition to weaker proficiency, the deep and inconsistent orthography of English was considered to contribute to the students’ spelling problems. Based on the available research, it can be expected that dyslexic learners typically encounter problems in transcription (Morken & Helland, 2013), which is one of the lower-​level sub-​processes of the formulation phase. These sub-​processes, as discussed in the “Cognition and SFL writing” section in this chapter, take a central position in SFL writing. Because dyslexia directly affects those cognitive activities that are vital for lower-​level writing processes, dyslexia very likely increases the cognitive burden of formulation, and thus diminishes attentional resources left for the higher-​level writing processes. The existing


instruments for diagnosing dyslexia are usually designed for L1 learners, which makes diagnosing dyslexia in SFL challenging, especially since it is possible that problems may surface only in SFL. Diagnosing migrant learners’ writing is particularly challenging as their L1 literacy skills may vary considerably due to different educational circumstances. Yet, diagnostic tests, be it for diagnosing language skills or dyslexia, are often unavailable in migrants’ L1. When it comes to diagnosing the causes of SFL writing, the diagnoser has to consider at least three main factors: proficiency in SFL, general writing and literacy skills, and potential learning difficulties such as dyslexia. Personal and contextual variables, such as the relation between L1 and the SFL or proficiency in other SFLs in multilingual learners need to be considered, too. Differentiating reliably between learning disorders, a lack of language proficiency and a lack of literacy skills, is challenging and calls for multidisciplinary expertise for both research and practical diagnosis.

4 HOW WRITING ABILITY CAN BE DIAGNOSED

Introduction

This chapter discusses how SFL writing can be diagnosed. The diagnostic cycle introduced in Chapter 1 allows us both to focus on specific parts of the cycle and link them together. At relevant points, we will refer to themes discussed in other chapters, such as assessing and giving feedback on the processes and products (see also Chapters 6 and 9), direct and indirect assessment of SFL writing (see also Chapters 5 to 9), and contexts of diagnosis (e.g., classroom or large-scale assessment). Inevitably, this brings into focus the who of diagnosing. Hence, we will discuss agency in the diagnosis (see also Chapters 6, 7, 9, and 10). Finally, to illustrate our discussion of how SFL writing ability can be diagnosed, we will analyze several instruments, some of which we revisit in Chapter 9. We start by reiterating the definition of diagnosis and expanding it. Our definition of diagnosis is rather broad. We are interested in both strengths and weaknesses of language learners. Nevertheless, weaknesses might be more important to understand, as doing so enables teachers and learners to identify areas where improvement is needed. The reasons for weaknesses are many, including learning disabilities, unsuitable teaching methods or material, or a lack of motivation, to name but a few. Hence the goal of diagnosis is not just to determine learners' weaknesses but, more importantly, to identify their sources so that appropriate action can be taken. We conceptualize SFL diagnosis as consisting of several cyclical stages (see Figure 1.1). First, we need a sufficiently detailed and accurate understanding of the construct and of how it develops and is acquired. Only then can we decide on


appropriate instruments and approaches that allow for the detection of strengths and weaknesses in performance and that yield insights into their underlying reasons. That is, the way we define the development of learner abilities determines which instruments we use to collect diagnostic information about the learners. These instruments and approaches in turn define what information is collected and shape the feedback to the agents in the diagnosis as well as the actions that follow. The effectiveness of those actions can, in turn, be diagnosed again in the next cycle of diagnosis. As the cycle repeats, new insights into learners' struggles will have emerged during the previous stages, and therefore the goals, needs, constructs, and instruments may need to be adjusted or re-interpreted. We will next outline the key characteristics of diagnosing writing. We will then discuss the relevance of the CEFR for diagnosing writing. Finally, we will present and discuss selected instruments that have been used for the diagnosis of writing.

Key characteristics involved in diagnosing writing

We will now outline the key characteristics of diagnosing writing: (1) agents and contexts, (2) approaches to diagnosing development as well as writing processes and products, (3) constructs to be diagnosed and instruments used, and (4) cognitive and personality-related aspects that need to be considered in diagnosis.

Contexts and agents

One of the most important factors defining the purpose and the scope of diagnosis is the context in which diagnosis takes place. Different contexts involve different agents. In diagnosis, these will mainly be teachers, learners, or researchers. However, since we have already discussed diagnosis as research, we will concentrate here on how such research can inform diagnosis by the other agents, teachers and learners. The diagnosis-as-research examples discussed in Chapter 3 bring into focus three main implications that research has for diagnosing learner abilities. First, research can inform the development of diagnostic instruments by contributing to our understanding of how to define and assess constructs. Second, it can contribute to establishing the sources of learner problems and how to address them. Third, instruments designed for research can be employed in classroom diagnosis, although not all of them are directly applicable to classroom use, given the time it takes to collect and analyze the typically rich data gathered from research participants. When it comes to diagnostic assessment in the classroom, Alderson (2007b) and Llosa (2011) recommended that it should encompass products and


processes, as well as students' perspectives. With regard to examining processes, Llosa (2011, p. 270) mentioned think-aloud and verbal protocols as particularly useful tools. Alderson, Brunfaut, et al. (2015, pp. 255–256) recommended embedding diagnostic assessment within other assessment procedures to avoid focusing only on weaknesses. They suggested five principles of diagnostic assessment for the classroom:

1. Regarding the agent, the test user is the one who diagnoses. If the teacher diagnoses, they are responsible, first, for establishing the diagnostic focus, then for testing and interpreting the problem, and finally for finding a remedy for the diagnosed problem.
2. Diagnostic instruments should be user-friendly, targeted, and efficient to use and interpret by the trained teacher or other user. They should provide rich feedback and be designed for a particular purpose, with a clear focus and capacity.
3. The diagnostic assessment process should include the views of various stakeholders, including the students (e.g., via self-assessment).
4. The process should ideally encompass four stages: listen/observe, initial assessment, use of tools/tests/experts, and decision-making. Often, the first two stages are omitted, which is acceptable for certain contexts (e.g., university entry). These stages are roughly similar to the stages in the diagnostic cycle outlined in Figure 1.1 in Chapter 1.
5. Diagnostic assessment should relate to treatment or intervention, offering a remedy for the diagnosed problem.

These principles offer guidance for teachers who wish to implement diagnostic assessment systematically. Principle 5 is particularly noteworthy since it emphasizes that diagnosis should focus on aspects of ability that are teachable and learnable. However, principle 3 is also important in that it not only acknowledges learners' contributions to the diagnostic process but also raises the importance of what teachers and learners consider to be good writing; without an idea of what the norm is, meaningful diagnosis is not possible. As we discussed in Chapter 1, norms are defined by a plethora of factors, including educational standards, curricula, textbooks, teachers themselves, and different communities of practice in general. It is important to consider learners' awareness of norms in writing and how that awareness may shape the norms themselves. Considering the first principle (agency), in the classroom the teacher and the learner are the two main agents. There are cases when the learner is the leading agent, e.g., in self-diagnosis, but usually teachers have more responsibility in diagnostic assessment. This is unsurprising given the complexity of the diagnostic cycle, so considerable diagnostic competence is required from those


responsible for diagnosis (Edelenbos & Kubanek-German, 2004). However, because learner autonomy is a major goal of pedagogy, in the classroom context diagnosis should be conceptualized as a collaborative process between teachers and learners, where learners can gradually assume more responsibility for diagnosing their abilities.

Diagnosing development, processes, and products

To gain the most out of learner diagnosis, diagnosing where learners are at a particular moment should be complemented by a diagnosis of where they can go next and how they can get there. This forward-​looking perspective requires understanding how learners’ abilities develop, which can best be diagnosed by analyzing the learner’s writing skills over several points in time. In Chapter 2, we outlined how writing abilities develop through the lens of various theories of development, discussing different models of how the development of writing abilities can be conceptualized. We discussed how the development of learners’ writing skills depends on their developmental stages, both with regard to learners’ cognitive development (i.e., children or adults) and the development of learners’ SFL abilities. Understanding learners’ developmental stage will influence what aspects of their writing abilities should be focused on, and what diagnostic approaches are most appropriate (see Chapter 2 and the “Constructs and instruments used for diagnosis” section in this chapter). However, it need not be assumed that development is a linear process, even if the scales used in language assessment and those found in frameworks such as the CEFR might suggest that. As discussed in Chapter 2, it is useful to distinguish between micro-​and macro-​level development, as it is particularly at the micro level where development is less predictable and can take many different routes and trajectories. The diagnosis of the development of writing skills needs to include information on both the product and the process. To elaborate, the diagnosis of development in SFL (or L1) writing can be approached in three somewhat different but overlapping ways, all of which are longitudinal in nature. First, diagnosis can take a product-​oriented longitudinal approach, where diagnosis is based on analyzing texts and the products of writing (see Figure 4.1). To diagnose development, a series of writing tasks is needed, administered over time in ascending order of complexity. Frameworks such as the CEFR or the ACTFL can be used as sources of information for designing writing tasks that differ in terms of cognitive and social demands and linguistic complexity. The actual diagnosis of learner texts could take place by rating them using analytical rating scales, and the changes in their scores on different dimensions of writing over time across different tasks would indicate how learners’ writing develops. In addition to ratings, learners’ texts can also be analyzed in more


FIGURE 4.1 Longitudinal Diagnosis With a Focus on the Products of Writing.

detail to shed light on how, for example, specific linguistic and textual features develop over time. Such analyses can, of course, be done manually, but a more practical approach is to use automated tools, if available (see Chapter 8 for a description of some such tools). Whatever the nature of the information obtained about the texts, the teacher and/or the learner have to be able to interpret that information and draw appropriate diagnostic inferences from it, as well as give relevant feedback to learners. Then, instructors need to decide what to do next in order to develop the learner's writing ability further. The second approach to diagnosing the development of SFL writing longitudinally can be called the process-oriented approach (see Figure 4.2). Although this approach, too, requires that learners write texts (i.e., products), the focus of diagnosis is on the writing process: what happens when the learners actually plan, generate, and revise texts, and how their actions might change over time. The methods that can be used to diagnose the writing process in classroom contexts include portfolios, think-aloud procedures, interviews, reflective diaries, and self-assessments. It is important that the instructions given to learners who use such instruments steer their attention to the writing process rather than the product. Other methods exist that can tap the writing process, such as keystroke-logging, eye-tracking, and brain imaging, but apart from possibly keystroke-logging, they are difficult to apply in regular educational contexts. Diagnosing the writing process can take place in two different scenarios. In the first, learners write multiple drafts of one text. Thus, the diagnosis of the writing process focuses on how the process develops from


FIGURE 4.2 Longitudinal Diagnosis With a Focus on the Writing Process (Across Different Drafts of the Same Text or Different Texts).

one draft to the next, based on the diagnosis and feedback that the learners receive on their different drafts. The second scenario involves different texts, each of which can comprise one or several draft versions. The diagnosis in this case focuses on how the writing process changes from one text to the next over a longer period of time. Figure 4.2 represents these two scenarios. The third approach to diagnosing writing development longitudinally covers both products and processes. Here, dynamic assessment (see Chapter 2) seems promising, as it integrates diagnosis with mediation and learner-orientation by analyzing where learners can go next, that is, by analyzing how learners react to assistance, which allows their Zone of Proximal Development to be diagnosed (see Figure 4.3). While we differentiate among these three approaches in principle, we acknowledge that there is some overlap. For example, the same task could be used in all three approaches: in the process-oriented approach, the draft product would serve as the starting point for eliciting further processes through modification and editing over time, whereas the product-oriented approach would use the task to elicit one product as the focus of diagnosis. In the third approach, the task would be the stimulus for discussing with the learners where they struggle and for mediating the important areas needed to solve the task.
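
To make the bookkeeping behind the product-oriented approach concrete, the following minimal sketch (our own illustration, not part of any instrument discussed in this book) stores analytic ratings for a series of tasks administered over time and reports, for each rating dimension, how the scores change. The dimension names, the 0–5 scale, and the scores are invented for illustration.

# Minimal sketch of product-oriented longitudinal bookkeeping (illustrative only).
# Dimension names, the scale, and the scores are invented; a real instrument would
# use a validated analytic rating scale (see Chapter 7).
from collections import defaultdict

# Each record: (time point, task, {dimension: score on a 0-5 analytic scale})
ratings = [
    ("T1", "email_to_friend",     {"content": 2, "organization": 2, "accuracy": 1}),
    ("T2", "opinion_paragraph",   {"content": 3, "organization": 2, "accuracy": 2}),
    ("T3", "argumentative_essay", {"content": 3, "organization": 4, "accuracy": 2}),
]

def dimension_trajectories(records):
    """Collect each dimension's scores in chronological order."""
    trajectories = defaultdict(list)
    for time_point, task, scores in records:
        for dimension, score in scores.items():
            trajectories[dimension].append((time_point, task, score))
    return trajectories

for dimension, points in dimension_trajectories(ratings).items():
    first, last = points[0][2], points[-1][2]
    trend = "improving" if last > first else "stable or declining"
    print(f"{dimension}: {[p[2] for p in points]} -> {trend}")

In practice, such trajectories would be interpreted alongside information about the tasks themselves, since differences in task demands can account for part of the variation in scores.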

Constructs and instruments used for diagnosis

Diagnosis can focus on various constructs (e.g., strategies, abilities, and types of knowledge) that differ in how specific or general, in how narrow or broad


FIGURE 4.3 Diagnosis as Dynamic Assessment.

they are, and in the aspects they include. The exact definition of the constructs will depend on the purpose and context of the diagnosis, the agents involved, as well as on learners' proficiency. Thus, there may be more than one way of defining the aspects a construct should involve, and the level of specificity that diagnostic assessments should target can also vary. Obviously, however, construct definitions for diagnosis should be informed by the best available models of language proficiency and writing ability. Alderson (2005) suggested that diagnostic tests are more likely to aim at specific aspects of proficiency, such as coherence, rather than at broader skills, such as writing ability as a whole, which is typically measured in language examinations for certification or other summative purposes. Furthermore, it is important to define whether the diagnosis focuses on the product or the process of writing, or both. Once the construct is defined, diagnostic instruments can either be selected from existing instruments or developed by the agents of the diagnosis. If we think about learner portfolios, for example, ideally it should be the learners who decide what to include in them and what aspects to focus on. Nevertheless, the main responsibility for useful diagnosis in the classroom lies with the teacher, who needs to possess diagnostic competence, that is, "the ability to interpret students' foreign language growth, to deal with assessment material skillfully and to provide students with appropriate help in response to this diagnosis" (Edelenbos & Kubanek-German, 2004, p. 260). The directness of writing tasks is another important characteristic of diagnostic instruments (see Chapter 5). In direct writing tasks, learners are asked to write a text, such as a letter or an essay. The written performance is the key characteristic of direct tasks. Many direct writing tasks simulate authentic writing tasks and are usually analyzed directly with the help of a diagnostic rating scale or checklist (see Chapter 7). The performance can also be examined


by using automated analyses based on algorithms that estimate, for instance, lexical diversity and syntactic complexity (see Chapter 8). Indirect tasks, as the name implies, measure writing indirectly. They target skills that are assumed to underlie the ability to write (e.g., the ability to order elements logically into a coherent flow) or components that are part of the writing skill, such as the command of specific structures or vocabulary (see ALTE, 1998, p. 147; this approach to construct definition is referred to as a trait-based approach; see Purpura, 2016). Indirect tasks usually take the form of discrete-point items, that is, items that are not linked to other items and that focus on a specific feature of language (ALTE, 1998, p. 142). An example of a discrete-point writing test is one where a learner identifies which word in a sentence contains a mistake. Both direct and indirect tasks focusing on the building blocks of writing can yield diagnostically useful information at all stages of development. At lower levels of proficiency, one may want to diagnose specific aspects of writing using, for example, indirect tests of vocabulary, accompanied by simple direct tasks to shed light on how beginners attempt to construct coherent texts. Later on, direct tasks can become increasingly complex, and learners' writing can be analyzed with diagnostic rating scales or checklists. Indirect tasks can focus on the more demanding aspects of writing, such as text structuring or register awareness. When learners can write reasonably long texts, automated analysis tools can also yield diagnostically useful information about specific linguistic and textual features of the texts. We will discuss the relation between indirect and direct tasks in more depth in Chapter 5.
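
As a simple illustration of the kind of automated, text-internal indices mentioned above, the sketch below computes a length-insensitive type–token measure (a moving-average type–token ratio) and the share of words drawn from an academic word list. The word list here is a tiny invented stand-in; operational tools of the kind discussed in Chapter 8 rely on validated lists and a much wider range of indices.

# Illustrative sketch of two simple automated indices for a learner text.
# The word list is an invented stand-in for resources such as an academic word
# list; operational tools use validated lists and many more indices.
import re

ACADEMIC_WORDS = {"analyse", "concept", "data", "evidence", "factor", "research"}  # invented stand-in

def tokenize(text: str):
    return re.findall(r"[a-z']+", text.lower())

def moving_average_ttr(tokens, window: int = 50) -> float:
    """Mean type-token ratio over fixed-size windows (less length-sensitive than raw TTR)."""
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

def academic_coverage(tokens) -> float:
    """Proportion of tokens found in the (stand-in) academic word list."""
    return sum(t in ACADEMIC_WORDS for t in tokens) / len(tokens) if tokens else 0.0

text = "The data provide some evidence that this factor matters, but more research is needed."
tokens = tokenize(text)
print(f"MATTR: {moving_average_ttr(tokens):.2f}, academic coverage: {academic_coverage(tokens):.2%}")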

Inter- and intrapersonal factors to be acknowledged

Writing is also affected by inter- and intrapersonal factors such as motivation, anxiety, experience, and metacognitive knowledge. These factors can influence the writing process and product, the take-up of diagnostic feedback, and the action following the diagnosis. Hence, where relevant, these factors should be taken into account in the various stages of the diagnostic cycle. Relevant factors can be assessed by a variety of different means and approaches. Metacognitive knowledge, for instance, can be assessed via knowledge tests (Slomp, 2012). The application of writing strategies can be analyzed through portfolio assessment and reflective writing. Dynamic assessment can provide a window onto the underlying thought and decision-making processes during the act of writing. Learner variables also play a role in how learners process feedback from a diagnostic assessment, and learners may misinterpret the purpose of diagnostic feedback. Research on learners' reactions to DIALANG feedback (Huhta, 2010; Yang, 2003) found that learners often perceived the feedback as mere error correction and disregarded feedback on their self-assessment. Learner beliefs about feedback


and assessment were found to be a major reason for this. Furthermore, Leontjev (2016a) found that some learners, influenced by their learning experience, regarded computerized dynamic assessment as a test of their learning outcomes rather than as a learning opportunity. Given the importance of intra- and interpersonal factors, diagnosis will be more valid if they are taken into account. While teachers will likely be familiar with the most relevant learner variables, in research contexts the validity of diagnosis can be increased, for example, by employing learner self-reports or interviews.

The CEFR and its relevance to diagnosing writing

In the realm of assessment, the CEFR is a highly influential framework, despite the limitations mentioned in Chapter 2. The CEFR is relevant both for SLA in general and for the diagnosis of the development of writing skills, as it can inform where learners are and what they can do with language. However, Alderson (2005, 2007b) is somewhat sceptical about the usefulness of the CEFR for diagnostic purposes. He acknowledges that the levels of the CEFR provide a snapshot of development in time and thus provide a "basis for test development at the level of interest" (Alderson, 2007b, p. 23). However, he cautions that the CEFR levels do not capture specific micro-skills and their development. Furthermore, when analyzing tasks and specifications from a range of examination bodies, Alderson (2007b, p. 25) found no common features that were typical of one CEFR level or that could be used to distinguish the levels. An understanding of such features, however, would be needed to diagnose strengths and weaknesses typical of a CEFR level. In a similar vein, Alderson (2005, pp. 23–25) reported that the micro-skills tested in DIALANG could not be related to specific CEFR levels. Despite these limitations, the CEFR defines core characteristics of writing ability at its ascending proficiency levels that are detailed and fine-grained enough to inform task development and the analysis of learner performance. Moreover, the CEFR does give an account of an idealized model of development over time. Furthermore, research is beginning to shed light on the linguistic characteristics of the CEFR levels (see Chapter 2), which will increase the usefulness of that framework for diagnostic assessment. The CEFR has been successfully implemented in diagnostic assessment, as we will discuss in the following section.

Illustrative examples of diagnostic tasks, tests and instruments employed for diagnosing writing

We will now analyze and discuss a selection of instruments and procedures that have been used for diagnosing writing or fit the definition of diagnosis in


this book. We note that we do not claim to give a comprehensive overview of existing diagnostic approaches to writing. Our discussion of the selected instruments and procedures will focus on the following criteria:

• focus: broad to narrow, for example, argumentation vs. one grammatical feature such as articles;
• tasks: direct, indirect or both, for example, asking learners to write an essay vs. using a multiple-choice task;
• process, product, or both;
• development: how development is conceptualized;
• assessment modality: human, computerized, or both;
• assessment criteria and instruments, for example, analytic rating scales or checklists;
• reporting, feedback, and recommended action.

We start with those tasks and approaches that have a narrow focus, followed by tasks with a broader focus that often try to capture real-world writing situations in direct tasks. Finally, we present examples of dynamic assessment approaches. We analyze each instrument in detail before outlining what research has been conducted in relation to its development, validation, and/or impact on teaching and learning.

GraphoLearn learning environment: Word forming task

The Ekapeli/GraphoLearn learning environment (formerly GraphoGame; https://info.grapholearn.com/) is an evidence-based digital game for practising early reading skills. The game focuses on the basic processes of matching sounds to letters and syllables. The reason why we include this diagnostic system is that in the very early stages of learning to read and write, these two literacy skills are difficult to separate, their learning often co-occurs, particularly in educational systems where both skills are taught, and knowledge of letter-sound correspondence at an early age predicts spelling (Chapter 3; Caravolas et al., 2001). Typical GraphoLearn users are children learning to read in Finnish as L1, but a version of the game has been created for young learners of L2 Finnish. However, the L2 version is also used by some adult L2 learners. In addition to the full Finnish version, the learning environment has been adapted for research purposes in more than 20 languages (these versions are not publicly available). The different language versions are not exactly the same, as the phonology and orthography of each language, and the known problems for learners of that language, need to be taken into account in the design.


Description of the instrument

The GraphoLearn learning environment includes tasks that concentrate on letter-sound correspondence, word recognition (decoding), reading accuracy, and reading fluency. Most task types involve mapping spoken stimuli (sounds, syllables, words) onto their written equivalents, which are provided on the screen together with some distractors. After each item, the player gets feedback, which includes guidance for incorrect responses. Task items are selected according to the learner's previous performance; essentially, every player has slightly different game content, which continuously adapts to fit the player's skill level, needs for practice, and progress. The method is based on current theories of reading development (see Chapters 2 and 3) and the research findings of the Jyväskylä Longitudinal Study of Dyslexia (www.jyu.fi/edupsy/fi/laitokset/psykologia/en/research/research-areas/neuroscience/groups/neuro/projects/JLD). The basic idea behind the learning environment is a phonic approach to reading (and spelling), emphasizing the consistent connections between letters and speech sounds. This has been found to be an effective approach in languages with a transparent alphabetic orthography, such as Finnish and German (Holopainen et al., 2002; Landerl, 2000). Both analytic tasks, which involve analyzing similarities and differences between words, and synthetic tasks, in which learners first learn the sounds and corresponding letters in order to recognize and form words, are used (see Richardson & Lyytinen, 2014, for more details). Although the game is designed to support early reading development, there are also word-forming tasks which require the spelling of words. These tasks begin with spoken instructions; the players always use headphones. Each word-forming item starts with a spoken word, and the learner has to form that word from the letters provided on the screen (Figure 4.4). In the beginning phase, only those letters that are needed for the correct spelling are provided, but later in the game there will also be distractors (Figure 4.5). The player clicks the letters with the mouse, or touches them on a touchscreen device, to put them in the correct order (Figure 4.5). If the player chooses a wrong letter or selects the letters in the wrong order, the incorrect letters fall back into the bank of letters, and the player has to try again. After all items are completed, the player is credited with stars, coins, etc., according to the number of correct responses. The criterion for a correct response is that the learner succeeds on the first try. In the example provided, the units to be used in word formation are letters. However, the task can be modified depending on the focus of the practice. For example, if the focus is on learning to express sound durations in writing, both single-letter (short) and double-letter (long) alternatives are presented to the learner. The elements provided for the word formation can be single letters, letter


FIGURE 4.4 A Screenshot From GraphoLearn for Finnish (1).

Note: The learner has just heard the stimulus word (puu ‘tree’) in the headphones and has to choose the letters in the correct order to form the written word.

FIGURE 4.5 A Screenshot From GraphoLearn for Finnish (2).

Note: The learner has selected two letters and put them in their place.


combinations, or syllables, and the distractors can be either randomly selected or fixed in advance. In the latter case, the distractors can be used to guide the learner to think about particular phonological features of the language and how they are written. For example, when the game focuses on consonant clusters such as "ts" in the word "katse" ("a look", "a glance"), the elements provided may include both a "ts" and an "st" unit. The same strategy can also be used in practising how to differentiate between letters that look alike ("p", "b", "d"; "m", "n"; "n", "u").
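
The word-forming interaction described above can be summarized as a simple selection-and-feedback loop: a spoken stimulus, a letter bank with optional distractors, wrong choices falling back into the bank, and credit only for success at the first attempt. The sketch below is our own simplified rendering of that loop, not GraphoLearn's actual implementation.

# Simplified, illustrative rendering of a word-forming item: the learner rebuilds
# a heard word from a letter bank; wrong choices fall back into the bank; the item
# counts as correct only if no wrong choice was made. A real game would also cap
# the number of attempts and provide spoken guidance after errors.
import random

def play_word_forming_item(target: str, distractors: str, choose_letter) -> bool:
    bank = list(target + distractors)
    random.shuffle(bank)
    built, first_try = "", True
    while built != target:
        letter = choose_letter(bank, built)      # learner's next selection
        if letter == target[len(built)]:         # correct next letter
            built += letter
            bank.remove(letter)
        else:                                    # wrong letter "falls back" into the bank
            first_try = False
    return first_try                             # credit only for a clean first attempt

# A "perfect learner" used purely for demonstration:
perfect = lambda bank, built: "puu"[len(built)]
print(play_word_forming_item("puu", distractors="ty", choose_letter=perfect))  # True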

What do we know about the instrument?

GraphoLearn has been shown to improve L1 reading skills (e.g., Heikkilä et al., 2013; Hintikka et al., 2005; F. Kyle et al., 2013). Evidence about its effectiveness in improving writing in L1 or SFL is only indirect, but given the intertwined nature of reading and writing at this earliest stage of literacy development, the system is likely to improve some of the basic components of writing, such as spelling. Certainly, the construct assessed by the system is very narrow. However, it provides useful information about young learners' emerging writing abilities that can be acted upon by teachers. This makes the GraphoLearn system a potentially useful external instrument for the SFL Finnish classroom; versions of the system based on other languages have the same potential, should they become available for wider use in the future. Admittedly, the predefined tasks in the system give less agency to the teacher and the learner with regard to what is diagnosed. At the same time, teachers are not directed in any way by the system in how they can approach a problem, as the feedback they receive from the learner profiles is limited to learners' problems only. This can be seen as both an advantage and an obstacle for teachers, who need to develop their own idea of how to approach the identified problems.

Roxify

Roxify is a diagnostic system aimed at providing feedback about learners' weaknesses and at sketching directions for follow-up actions. The system targets the academic writing of university students of English as an SFL (see Chapter 9).

Description of the instrument

The system focuses on marking clichés and vocabulary from the Academic Word List (Coxhead, 2000) and General Service List (Browne et al., 2013) in the analyzed texts, the use of vocabulary in general, correct use of citations,


FIGURE 4.6a A Screenshot From Roxify: A Text Written by a Learner With Colour Codes to Indicate Problematic Points.

among other points. Following the analysis, both the learner and the teacher see a colour-coded version of the learner's text, with different colours indicating elements that should be paid attention to (Figures 4.6a and 4.6b). These categories are displayed for both teachers and students. To give an example, upon clicking on the clichés section, learners and teachers see words that should be avoided in academic writing; under this category, slang, plague words, and idioms are marked. The feedback under the vocabulary section includes marking features that are indicative of good academic writing, such as hedges, as well as features that should be avoided, such as duplicate vocabulary. The authors suggest that the system should, above all, be used as a tool to improve the quality of learners' writing across different versions of a text.
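
The sketch below illustrates, in a much simplified form, the kind of category-based marking described above: each word of a learner text is tagged with the feedback category under which it would be highlighted. The word lists are invented placeholders; Roxify itself draws on the full Academic Word List and General Service List and performs further checks (citations, hedging, and so on).

# Much-simplified illustration of category-based marking of a learner text.
# The word lists are tiny invented placeholders, not the actual AWL/GSL or any
# cliche list; a real system would also check citations, duplication, and more.
import re

CATEGORIES = {
    "academic_word": {"significant", "analysis", "hypothesis"},  # placeholder for AWL items
    "informal":      {"stuff", "kids", "gonna"},                 # placeholder for slang/cliches
    "hedge":         {"perhaps", "arguably", "may"},             # features to encourage
}

def mark_text(text: str):
    """Return (word, category) pairs; None means the word is left unmarked."""
    marked = []
    for word in re.findall(r"[a-zA-Z']+", text.lower()):
        category = next((c for c, words in CATEGORIES.items() if word in words), None)
        marked.append((word, category))
    return marked

sample = "The analysis shows that stuff like this may be significant."
for word, category in mark_text(sample):
    if category:
        print(f"{word:12s} -> {category}")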

What do we know about the instrument?

Miller and Miller (2017) reported on the validation process so far. Four intact classes at City University of Hong Kong (n = 53) and their teacher took part. The


FIGURE 4.6b A Screenshot From Roxify: Feedback Related to the Text in Figure 4.6a.

authors collected both qualitative and quantitative data regarding the participants' experiences with the Roxify system. The overwhelming majority of the learners (n = 51) considered the feedback they received useful, noting, for example, that they could learn from the feedback and that it helped them to self-correct their mistakes. However, some learners found the usefulness of the system limited, as it did not provide correct responses. This raises the importance of considering learner goals and beliefs about what constitutes useful feedback. We can assume that these students' orientation to the product prevented them from fully appreciating feedback that places the responsibility for acting upon it on the learners. We should add that Roxify is not the only instrument that conducts an automatic analysis of vocabulary. However, it appears to be the first such instrument specifically designed to be diagnostic. A similar instrument is Text Inspector (http://textinspector.com/), which reports a number of lexical and syntactic indices, including lexical diversity, vocabulary difficulty, and metadiscourse markers. The unique feature of Text Inspector is that it assigns a CEFR level to the texts analyzed on the basis of a number of indices. These indices were selected on the basis of statistically significant differences across the CEFR proficiency levels in a large learner corpus. We should note, however, that it can be problematic to claim that, for example, particular vocabulary 'belongs' to a CEFR level (see Alderson, 2007a).

DIALANG writing tasks

DIALANG is a multilingual language assessment system designed to provide learners with diagnostic feedback (Alderson, 2005; Alderson & Huhta, 2005).


It was the first computerized large-scale language assessment system created specifically for diagnosing language skills and the first system based on the CEFR. It also pioneered the large-scale integration of self-assessment and formal tests. DIALANG writing is based on indirect tasks, such as multiple-choice or gap-fill items, which require the production of only a few words at most. DIALANG also includes an 18-statement self-assessment instrument that users complete before taking the main language test, such as the writing test. DIALANG provides its users with information about the CEFR level indicated by their test performance and by their self-assessment, which enables learners to compare the two. Furthermore, the users receive advice on how they can improve their writing.

Description of the instrument

The DIALANG writing test comprises about 30 items, each of which measures one of three aspects of writing: accuracy, appropriacy, or textual organization. The screenshots below exemplify such items. The first item is a three-gap testlet intended to measure English language learners' ability to form coherent texts. The learners have to complete each gap with the connecting word that makes the most sense (Figure 4.7). Figures 4.8a and 4.8b display two English items focusing on sociolinguistic appropriacy. The items require the learner to identify a word that differs in style from the rest of the text and to write the word in the space provided. In these examples, the word is either too informal ("mess"; Figure 4.8a) or too formal ("displace"; Figure 4.8b). The fourth item concerns the linguistic accuracy of a text (Figure 4.9). In this case, the learner has to spot a word that is grammatically inaccurate ("others") and correct it.

FIGURE 4.7 A Screenshot of a DIALANG Writing Testlet Item in English, With English as the Interface and Feedback Language, Intended to Measure Textual Organization.


FIGURE 4.8a A Screenshot of a DIALANG Writing Item in English, With Spanish as the Interface and Feedback Language, Intended to Measure Sociolinguistic Appropriacy.

FIGURE 4.8b A Screenshot of a DIALANG Writing Item in English, With Spanish as the Interface and Feedback Language, Intended to Measure Sociolinguistic Appropriacy.

FIGURE 4.9 A Screenshot of a DIALANG Writing Item in English, With Spanish as the Interface and Feedback Language, Intended to Measure Accuracy.

Learners' responses to items like these can be used to make inferences about their underlying ability to organize their texts in a logical way or to use accurate and appropriate expressions. Since the tasks are indirect, it is not possible to observe whether learners can actually apply the underlying skills that are inferred from their responses. In DIALANG, the inferencing can be done by the learners or their teachers when they study the pattern of correct vs. incorrect responses to the items testing the different aspects of writing (see Figure 4.10).


FIGURE 4.10 DIALANG Sample Feedback on the Items in a Writing Test.
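
Item-level feedback of the kind shown in Figure 4.10 can be thought of as a simple aggregation of responses by the aspect each item targets. The sketch below demonstrates such an aggregation with invented items and responses; it is not DIALANG code, merely an illustration of how a per-aspect profile can be derived from indirect items.

# Illustrative aggregation of indirect writing items into a per-aspect profile,
# in the spirit of the item-level feedback in Figure 4.10. Items and responses
# are invented; DIALANG's actual item bank and scoring are more elaborate.
from collections import defaultdict

# (item id, aspect targeted, learner response correct?)
responses = [
    ("w01", "accuracy", True), ("w02", "accuracy", False), ("w03", "accuracy", True),
    ("w04", "appropriacy", False), ("w05", "appropriacy", False),
    ("w06", "textual organization", True), ("w07", "textual organization", True),
]

totals, correct = defaultdict(int), defaultdict(int)
for _, aspect, is_correct in responses:
    totals[aspect] += 1
    correct[aspect] += int(is_correct)

for aspect in totals:
    print(f"{aspect:22s} {correct[aspect]}/{totals[aspect]} correct "
          f"({correct[aspect] / totals[aspect]:.0%})")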

What do we know about the instrument?

Research on DIALANG has mainly focused on the design stage of the tests, including the piloting of the items (Alderson & Huhta, 2005), and on the feedback (Floropoulou, 2002; Yang, 2003; Huhta, 2010). However, only some studies report specifically on the writing component of DIALANG. Alderson and Huhta (2005) report that the standard setting of the test items on the CEFR levels indicated that the correspondence between the empirical item difficulty and the expert judgments of item difficulty was very strong for the English reading items. The corresponding results for writing are not reported. The inter-​rater correlations among standard-​setting judges (n =​7) were reported for all the skills for the German tests: for writing items, they were quite high, with only one judge clearly deviating from the others. Overall, about half of the languages in the system are based on empirical piloting, and those that are, the authors state, demonstrate adequate quality. Studies on users’ perception of DIALANG and its feedback have not targeted writing specifically, but the findings shed light on the writing tests and feedback. Huhta (2010) investigated the perceptions of over 550 Finnish and German users of DIALANG. The reactions of the vast majority of the users were positive. The users especially appreciated the chance to see their level of proficiency and to spot particular problems through examining item-​level feedback. The comprehensive nature of the system was also appreciated. Some learners also reported that computerized, ‘anonymous’ feedback felt safer than teacher feedback. The main criticisms concerned the impersonal and partly too vague nature of the feedback. Some learners reported having needed a teacher to interpret some of the feedback for them (see also Chapter 8). Floropoulou


(2002) and Yang (2003) studied university students' reactions to DIALANG feedback and found that learners' goals for studying and their previous experience of language tests affected their views. Students whose main aim was to get a degree rather than to learn English were unlikely to use the advice given by the system. Other students, who compared DIALANG with international language tests, did not appreciate the feedback or the purpose of doing self-assessment. Overall, DIALANG illustrates how a broad range of computerized and, at least potentially, diagnostic instruments and feedback can be operationalized. At the same time, studies of its users also shed light on the limitations and constraints of diagnostic assessment when it is detached from a full diagnostic cycle.

The Diagnostic English Language Needs Assessment (DELNA)

DELNA aims to diagnose university students' academic language learning needs after they have been accepted to tertiary-level education (Read & von Randow, 2016). It was developed at the University of Auckland, New Zealand, in 2002 in collaboration with the University of Melbourne, Australia, which has its own version of the system. DELNA has two stages, screening and diagnosis (University of Auckland, n.d., DELNA Handbook). The online screening is a short, 20-minute test of vocabulary and reading. The diagnostic part is more extensive (about 2–3 hours, depending on the version) and includes reading, listening, and writing sections. All first-year undergraduate students have to participate in screening, and, in principle, everybody below a certain cut score should take the diagnostic part of DELNA. Since 2011, doctoral students have also been taking DELNA (Read & von Randow, 2016). All students have to take the screening part of DELNA regardless of their first language, even if their L1 is English. The results (a profile) of the diagnostic part of DELNA are shared with both the students and their mentors and are used as the basis for recommendations for further action, such as language courses, content-specific language tutorials, or visits to a Student Learning Centre (Doe, 2014; Fox et al., 2016). DELNA is thus embedded in a process that is similar to the diagnostic cycle that we consider necessary for useful diagnosis. Indeed, describing DELNA, Fox et al. (2016, p. 45) stated, "We use the phrase diagnostic assessment procedure to underscore that diagnostic assessment and concomitant pedagogical intervention are inseparable. We argue that a diagnostic assessment procedure cannot be truly diagnostic unless it is linked to feedback, intervention, and support" (emphasis in the original). Fox et al. (2016) described the theoretical framework underlying DELNA with reference to activity theory (Leontyev, 1981; Engeström, 1987). Activity theory brings together the individual, the community, and symbolic and material tools. In DELNA, the individuals are the students, the community consists of students, peers, and teachers engaged in


the language support activities, and the tools are the diagnostic instruments and feedback. Fox et al. (2016, p. 45) argue that there are, in fact, two intertwined activity systems at play in DELNA: the diagnostic assessment activity and the pedagogical support activity.

Description of the instrument

The writing part of DELNA comes in two versions, one for doctoral students and the other for students at earlier stages of education. Both are paper-based. The version for non-doctoral students consists of one 30-minute (200–250 words) academic essay based on a graph, table, or diagram (University of Auckland, n.d., p. 8). Students' essays are rated against three criteria: fluency (coherence, cohesion, and an appropriate level of formality), content (accuracy and relevance), and grammar and vocabulary (variety, accuracy, and complexity). Descriptions of these criteria are given to the students in the DELNA Handbook (University of Auckland, n.d., pp. 8–9; see also Knoch, 2009b). The writing component for doctoral students is longer (70 minutes) and includes two tasks. Both are based on two brief texts that express opposing views on a non-specialist topic; in the example given in the DELNA Handbook (University of Auckland, n.d., pp. 16–17), the topic is international students in New Zealand. The first task is to summarize the texts in one's own words (max. 150 words). The second is to write an academic essay on a given topic related to the two texts. The task instructions state: "[Y]ou should express a clear point of view and support it with a logical development of your own ideas and an appropriate conclusion. You may refer to ideas in the source texts to build your argument" (University of Auckland, n.d., p. 9). The length of the essay should be about 250–300 words. The criteria for rating it are organization, academic style, quality of discussion, sentence structure, grammar, and vocabulary (University of Auckland, n.d., p. 9).

What do we know about the instrument?

Several studies have been conducted on DELNA (e.g., Elder & von Randow, 2008; Elder & Erlam, 2001; Fox et al., 2016; Knoch, 2009a,b; Read & von Randow, 2016). Here, we focus on those that concern the writing part of DELNA. The scale used for rating the writing samples focuses on the discourse features that characterize academic writing, as was described above. Knoch (2009a) argues that the scale reflects several features typical of diagnostic assessment as proposed by Alderson (2005), namely that it allows the identification of learners’


strengths and weaknesses in their language use and allows detailed analysis, reporting, and feedback that can be acted upon (Knoch, 2009a, pp. 295–297). Doe (2014) summarized the evidence for the quality of the DELNA rating procedures by stating that while the quality of the reading and listening components of the diagnostic part of DELNA is sufficient, inter-rater reliabilities for writing are not regularly reported. All performances are, however, double-rated, and a third rater is used to arbitrate differences between the two raters. Doe (2014) added that the texts written by low-performing students, and their ratings, are also reviewed by language advisors before meeting with the students. Fox et al. (2016) reported on a multi-method study of students' and peer and staff mentors' experiences with DELNA. Results for writing are not reported separately, but the main findings are likely to apply to all parts of the procedure. The results suggest, for example, that students' ownership of their language studies increased, as evidenced by a clear increase in voluntary participation in the support activities arranged (such as workshops).
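
The double-rating procedure mentioned above, with a third rater arbitrating discrepancies, can be summarized as a simple decision rule. The sketch below is a generic rendering with an invented discrepancy threshold and arbitration rule; it does not reproduce DELNA's operational procedure.

# Generic sketch of a double-rating rule with third-rater arbitration.
# The discrepancy threshold, score scale, and arbitration rule are invented for
# illustration; they are not DELNA's operational criteria.

def resolve_score(rater1: float, rater2: float, third_rater=None, max_gap: float = 1.0) -> float:
    """Average two ratings if they agree closely; otherwise call in a third rating."""
    if abs(rater1 - rater2) <= max_gap:
        return (rater1 + rater2) / 2
    if third_rater is None:
        raise ValueError("Ratings discrepant: a third rating is required.")
    third = third_rater()
    # One simple arbitration rule: average the two closest ratings.
    pairs = [(rater1, rater2), (rater1, third), (rater2, third)]
    a, b = min(pairs, key=lambda p: abs(p[0] - p[1]))
    return (a + b) / 2

print(resolve_score(5, 6))                          # close enough -> 5.5
print(resolve_score(4, 7, third_rater=lambda: 6))   # discrepant -> arbitrated 6.5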

VERA8: Writing tasks used in German secondary schools to diagnose learners' writing abilities

VERA8 stands for Vergleichsarbeiten Klasse 8, a large-scale diagnostic assessment taking place in grade 8 of German secondary schools. Here, we look at the tasks and assessment checklists developed for diagnosing writing. The approach employs direct writing tasks eliciting products from the learners. It is a paper-based, human-delivered, and human-rated test, in which analytic checklists based on the CEFR are used. Feedback is given on how learners perform in relation to the descriptors in the checklists. The instruments were originally developed at the Institute for Educational Quality Improvement for the purpose of educational monitoring in a large-scale assessment in Germany (Rupp et al., 2008; Harsch & Rupp, 2011; Harsch & Martin, 2012, 2013) and were calibrated and formally aligned to the CEFR (Harsch et al., 2010). Selected instruments were then used in a diagnostic classroom-based assessment, which took place for the first time in 2009 across Germany. The aim was to support teachers with diagnostic information based on standardized and calibrated instruments, and to enhance school development and teaching quality (KMK, 2018). Three different test packages were compiled to account for the different ability levels of the learners taking the assessment, EFL learners in grade 8 (i.e., 14–15 years old): lower (A1/A2), middle (A2/B1), and higher (B1/B2). The diagnostic assessment was administered by the teachers. Depending on the federal state (Bundesland), writing products were assessed and reported either centrally or by the teachers.


FIGURE 4.11 VERA8 Assessment Approach.

Description of the instrument

The direct writing tasks operationalized specific CEFR levels from A1 to C1. The learner texts were assessed with five checklists that described features relevant to each of the five CEFR levels targeted by the tasks. Hence, the approach was called level-specific, as Figure 4.11 depicts. This approach allows for the diagnosis of whether a learner can fulfil certain level-specific demands and expectations, as operationalized by the tasks and checklist descriptors. Since learners work on level-specific tasks, they may show more or less than is required, yet no inference can be made as to the CEFR level at which they may best be placed; hence the question marks in Figure 4.11. This is why learners always worked on more than one task and at more than one level, so that, overall, teachers could gain a more solid picture of students' writing abilities. The writing tasks reflected real-life communicative writing tasks. They operationalized selected characteristics typical of specific CEFR levels (see Harsch & Rupp, 2011). The tasks were piloted, calibrated, and formally aligned to the CEFR (Harsch et al., 2010). Table 4.1 gives an example of a task operationalizing CEFR level B1. With regard to the assessment of the learner texts, four criteria are defined: task fulfilment, organization, grammar, and vocabulary. Each criterion is described separately for each of the CEFR levels A1 to C1, and these descriptions can be used as level-specific checklists. All the level descriptions together form a rating scale, developed and validated in several rounds using a combination of analytic judgments (Harsch & Martin, 2012, 2013). Figure 4.12 depicts the scale hierarchy.

TABLE 4.1 VERA8 Writing Task Sample Operationalizing Level B1

Sports Accident
You have just had a sports accident during your stay at your American partner school. Now you have to write a report for the American school. You must include all the following information:
• date, time and place of accident
• what you were doing just before the accident happened
• what happened to cause the accident
• part(s) of body hurt
• action taken after the accident
Write 110 to 140 words.

Note: From www.iqb.hu-berlin.de/bista/aufbsp/vera8_2009/enfr/Beispielaufgabe_Schreiben_EW042.pdf (Accessed 24.10.2011, no longer available).

FIGURE 4.12 Outline of the VERA8 Rating Scale.

For the diagnostic assessment in 2009 and 2010, the descriptors were compiled into a separate checklist for each of the targeted CEFR levels. Tables 4.2a and 4.2b give an example of the diagnostic checklist used for level B1. To the best of our knowledge, the writing tasks were administered in 2009 and 2010.
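
To illustrate the level-specific logic, the sketch below encodes a shortened, invented version of a B1 checklist and tallies a learner's codes. It shows only the bookkeeping; the actual VERA8 descriptors and any official decision rules are not reproduced here.

# Illustrative bookkeeping for a level-specific checklist of the VERA8 kind.
# Criterion labels are shortened and the summary rule is invented; the official
# checklists and reporting procedures are not reproduced here.

CODES = {0: "insufficient evidence", 1: "below level B1", 2: "at level B1", 3: "above level B1"}

# A teacher's codes for one learner text against a B1 checklist (invented example).
learner_codes = {
    "TF1 content points": 2,
    "TF3 register/tone": 1,
    "O1 thematic development": 2,
    "G2 grammatical accuracy": 1,
    "V1 vocabulary range": 3,
}

for criterion, code in learner_codes.items():
    print(f"{criterion:26s} {code} ({CODES[code]})")

at_or_above = sum(code >= 2 for code in learner_codes.values())
print(f"\n{at_or_above}/{len(learner_codes)} criteria at or above B1 "
      "-> feedback can target the criteria coded 0 or 1.")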

What do we know about the instrument?

Apart from an internal technical report on the instruments used in VERA8, to the best of our knowledge, there are no publications available on either

TABLE 4.2a VERA8 Checklist for Task Fulfilment and Organization, Level B1 (Target for Sports accident: Level B1)

Code: 0 = insufficient evidence; 1 = below level B1; 2 = at level B1; 3 = above level B1

Task Fulfilment (TF)
TF1: Content points – Student describes most of the following content points: • accident embedded in a situation • cause of accident • type of injury and/or part(s) of the body hurt • reaction to the accident
TF2: Ideas relevant to the task – Student makes reference to the school environment and an accident during a sports event (sports injury / car accident / fight with resulting injury or accident).
TF3: Register / Tone – Student writes consistently in a neutral / non-informal tone, since it is a report for the school.
TF4: Text type requirements – Student writes a factual description (in most parts of the text); there should not be too many narrative parts.
TF5: Communicative effect – Student shows a clear picture of the sports accident because enough information is provided.

Organization (O)
O1: Structure / Thematic development – Student produces a straightforward connected text (narrative or descriptive) in a reasonably fluent manner or links a series of shorter discrete simple elements into a linear sequence of points in a reasonably fluent manner. Thematic development shows a logical order and is rounded off.
O2: Language / Cohesion – Student uses a number of common cohesive devices throughout the text, such as articles, pronouns, semantic fields, connectors, discourse markers (like 'so' (consecutive), 'in my opinion'). He/she shows reasonable control of common cohesive devices.

Note: From www.iqb.hu-berlin.de/bista/aufbsp/vera8_2009/enfr/Kodierschema_komplett.pdf (Accessed 24.10.2011, no longer available).

TABLE 4.2b VERA8 Checklist for Grammar and Vocabulary, Level B1 (Target for Sports accident: Level B1)

Code: 0 = insufficient evidence; 1 = below level B1; 2 = at level B1; 3 = above level B1

Grammar (G)
G1: Range – Student uses a range of frequently used structures (such as tenses, simple passives, modals, comparisons, complementation, adverbials, quantifiers, numerals, adverbs). Sentence pattern shows simple variations (e.g., subordinate and coordinate clauses often beginning with 'when', 'but'; relative clauses and if-clauses).
G2: Accuracy – Student uses structures and sentence patterns reasonably accurately. Some local errors occur, but it is clear what he/she is trying to express. Few global errors may occur, especially when using more complex structures / sentence patterns (e.g., relative clauses, if-clauses, passives and indirect speech).

Vocabulary (V)
V1: Range – Student shows a sufficient range of vocabulary (beyond basic) to express him/herself in familiar situations; some circumlocutions may occur in unfamiliar situations.
V2: Accuracy – Student shows good control (i.e., adequate and appropriate use) of basic vocabulary. Some non-impeding errors occur. Impeding errors may occur occasionally, especially when expressing more complex thoughts or handling unfamiliar topics and situations.

Overall

Note: From www.iqb.hu-berlin.de/bista/aufbsp/vera8_2009/enfr/Kodierschema_komplett.pdf (Accessed 24.10.2011, no longer available).


the use of the instruments or the outcomes of the diagnostic writing assessment. What is known about the instruments is the research published on the calibration, validation, and standard setting for the educational monitoring study (see above). Since the educational context remains the same for monitoring and diagnosis, the instruments could be meaningfully employed for the same learner group. While no research has been published on validating the instruments for the specific purpose of diagnosis, we can still make a judgment on their potential usefulness for diagnosing writing abilities. The instruments used in VERA8 facilitate the diagnosis of strengths and weaknesses in learners' writing abilities, since direct communicative tasks are employed which operationalize specific, curriculum-relevant objectives. Furthermore, since the checklists operationalize particular descriptors, they facilitate the analysis of learners' abilities to structure and organize coherent texts and to use grammatical structures and vocabulary appropriately for the task in question. The level-specific tasks and descriptors in the checklists allow insights into how far students meet level- and curriculum-specific objectives and facilitate detailed feedback on students' strengths and weaknesses.

Empirically-derived Descriptor-based Diagnostic (EDD) checklist

Y.-H. Kim (2010, 2011) reported on the development and piloting of a checklist used for scoring learners' essays, which allowed for building rich diagnostic profiles of the writing of EAP (English for Academic Purposes) university learners.

Description of the instrument

The checklist includes 35 statements related to content fulfilment (e.g., “This essay answers the question”), organizational effectiveness (e.g., “The ideas are organized into paragraphs and include an introduction, a body, and a conclusion”), grammatical knowledge (e.g., “Verb tenses are used appropriately”), vocabulary use (e.g., “Vocabulary choices are appropriate for conveying the intended meaning”), and mechanics (e.g., “This essay contains appropriate indentation”). The writing samples used by Y.-​H. Kim (2010) for the development of the checklist came from two retired TOEFL iBT versions. For the initial version of the checklist, 480 writing performance samples were used. Sixteen SFL English teachers, four SFL English academic writing experts, and the author were involved in the checklist development (Y.-​H. Kim, 2010, pp. 63–​65). As illustrated by Y.-​H. Kim (2011), the diagnostic profile designed with the help of the checklist included the overall probabilities that each of the five broad categories was mastered by the learner as well as what the learner was


likely to be able to do (e.g., "use linking words effectively") and where they might require more effort (e.g., "you might need more work connecting independent clauses correctly"). The checklist statements were accompanied by their relative difficulties (d = difficult, m = medium, and e = easy), as illustrated in Figures 4.13a, 4.13b, and 4.13c.
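
A much simplified way to think about such a profile is as the proportion of checklist statements judged to be satisfied in each category, annotated with the statements' difficulty bands. The sketch below uses invented statements and judgments and does not reproduce the cognitive diagnostic modelling that underlies the actual EDD profiles.

# Much-simplified illustration of an EDD-style profile: proportion of checklist
# statements met per category, with each statement's difficulty band (e/m/d).
# Statements, bands, and judgments are invented; the real profiles rest on
# cognitive diagnostic modelling, not simple proportions.
from collections import defaultdict

# (category, statement, difficulty band, judged as met?)
judgments = [
    ("content fulfilment", "essay answers the question", "e", True),
    ("content fulfilment", "ideas are well supported", "m", False),
    ("organization", "has introduction, body, conclusion", "e", True),
    ("organization", "linking words used effectively", "m", True),
    ("grammar", "verb tenses used appropriately", "m", False),
    ("grammar", "independent clauses connected correctly", "d", False),
]

per_category = defaultdict(list)
for category, statement, band, met in judgments:
    per_category[category].append((statement, band, met))

for category, items in per_category.items():
    met = sum(m for _, _, m in items)
    print(f"{category:18s} {met}/{len(items)} statements met")
    for statement, band, m in items:
        if not m:
            print(f"  needs work ({band}): {statement}")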

What do we know about the instrument?

Y.-H. Kim (2010) piloted the instrument with 7 SFL English teachers and 80 TOEFL iBT essays, followed by a larger-scale study with 10 teachers and 480 essays. The author found that the checklist allowed for reliable discrimination between students in terms of their diagnostic profiles, although some negative

FIGURE 4.13a Proposed EDD Diagnostic Profile – Overall Writing Ability (Source: Y.-H. Kim, 2011, p. 540).

FIGURE 4.13b Proposed EDD Diagnostic Profile – Writing Skills Profile (Source: Y.-H. Kim, 2011, p. 540).


FIGURE 4.13c Proposed EDD Diagnostic Profile – Feedback on Content Fulfillment (Source: Y.-H. Kim, 2011, p. 540).

evidence was also found. The participating teachers found the checklist to be a useful diagnostic tool covering various aspects of SFL writing. The author concluded that one of the major implications of using the checklist is that learners, rather than teachers, can regulate their own development based on their profiles. That said, the author acknowledged the danger of dichotomizing writing competence into can or cannot do, which can result in an imprecise picture of learners' abilities. The author suggested that a polytomous version of the checklist be developed in the future. The checklist was further validated and refined by Xie (2017) using cognitive diagnostic modeling. The author found that the checklist was effective in identifying learners' strengths and weaknesses in the five categories proposed by Kim, and that it could discriminate between beginner, intermediate, and advanced learners. Overall, the EDD checklist is an interesting example of how an external instrument can be used to grade learner essays, allowing for different scenarios in which teachers or learners are responsible for the follow-up action. Potentially, the checklist can also be used longitudinally, although, as the participating teachers noted, task differences can affect the resulting profiles.

The European Language Portfolio

The European Language Portfolio (ELP, www.coe.int/en/web/portfolio) serves to monitor the progress of learning and to support continuous self-assessment of learning products, which underscores its diagnostic value for learners. There is a variety of such instruments for different target groups and in different source languages, and the ELP is not limited to writing but involves the different sub-skills covered by the CEFR. We include the ELP here as an example of a direct assessment of broad constructs. Learners assess their abilities in doing things with their

The learners can also compile a dossier containing examples of what they can do in the L2, which in turn can form the basis for teacher diagnosis and self-diagnosis. The ELP is also an example of an instrument in which full responsibility for assessment is shifted to learners. In the following section, we concentrate on the self-assessment grid in the Language Passport and the can-do statements in the Language Biography part of the ELP, illustrating these with the example of the ELP developed for young learners (aged 12–16) in Estonia (version 093.2007).

Description of the instrument (parts)

The ELP has three main parts – a language passport, which includes a self-assessment grid; a language biography, which includes can-do statements; and a dossier – and is perhaps best analyzed holistically. However, a comprehensive discussion of the ELP from a diagnostic point of view would require a separate chapter. The self-assessment grid and the can-do statements help learners to understand what they are already able to do and where they should be heading next, and therefore have the greatest diagnostic value for learners. The self-assessment grid of the ELP comes verbatim from the CEFR (Council of Europe, 2001, Table 2, pp. 26–27). For B1 level writing, for example, the scale states: I can write simple connected text on topics which are familiar or of personal interest. I can write personal letters describing experiences and impressions. These statements serve as a quick way for learners to evaluate what they can do in writing (and other sub-skills) and what is required of them at the next level of proficiency, thus pointing to the actions that should follow. The instructions urge learners to fill in the Language Passport regularly, especially when they move schools, so that they can show their teachers how well they can perform in the L2. Teachers, therefore, benefit from the instrument, too. However, the concise nature of the ELP descriptors, intended to enable learners to make rapid judgments about their language skills, may under-represent the construct of interest. The Language Biography part builds on the self-assessment grid and involves learner self-diagnosis. The instructions for this part of the ELP state "[f]illing in the language biography will help you to think about the way you learn foreign languages, where you use them and what you can do to make your foreign language learning more efficient". Thus, it aims to address both the diagnosis and the follow-up action. Furthermore, it conceptualizes both as parts of a recurrent cycle, thus fitting our definition of diagnostic assessment and our understanding of the diagnostic cycle.

The Can-Do checklists usually offer a binary yes/no option. However, the Estonian ELP suggests three options, as illustrated below:
— I can do this very well
— I can do this well
— I have to practise this more
The instructions state that learners should complete this part twice a year, and when they see that they can do about 80% of the activities listed in the checklist for a particular sub-skill at a particular CEFR level, they can mark in their Language Passport that they have reached that level. An example of the can-do statements in the Estonian instrument is given in Figure 4.14 (Alp et al., 2007). Figure 4.14 shows that examples are provided to help the learner understand what the particular can-do statements mean. The three options of "can do very well", "can do well", and "more practice is needed" are operationalized via pictorial symbols. These symbols, we assume, serve both to save space and to add to learners' judgment of their abilities – how satisfied they are with their ability to complete a task. Considering the limitations of binary choices raised by Y.-H. Kim (2010), the Estonian approach, also adopted in, for example, the Austrian ELP version 094.2008, is a welcome alternative.
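The "about 80%" convention lends itself to a very simple decision rule. The following sketch is a toy illustration of that rule only – it is not part of any official ELP material, and the data, function name, and response labels are invented.

```python
# Toy illustration of the Estonian ELP's "about 80%" convention for marking
# a CEFR level as reached in the Language Passport. Not an official tool.

def level_reached(responses, threshold=0.8):
    """responses: self-ratings for one checklist, where 'very well' and
    'well' count as can-do and 'practise more' does not."""
    can_do = sum(1 for r in responses if r in ("very well", "well"))
    return can_do / len(responses) >= threshold

# A learner's (invented) responses to an A2 writing checklist
a2_writing = ["very well", "well", "well", "practise more", "well",
              "very well", "well", "well", "practise more", "well"]

print(level_reached(a2_writing))  # True: 8 of 10 statements marked as can-do
```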

What do we know about the instrument?

Schärer (2000) reported on a pilot study of the ELP conducted in 1998–2000, which involved piloting 7 different models across 16 countries. About 30,000

FIGURE 4.14 The Checklist for Writing at Level A2 in the Estonian ELP.

learners and 1,800 teachers participated in the study. Overall, the participants were positive about the benefits of the instrument as a tool for learning. They especially valued the learner self-​assessment, as it impacted the individuals’ learning through reflection and increased motivation. They, however, reported that training is required for both teachers and learners to benefit from the instrument fully. One major challenge for learners is to decide when they have reached a certain CEFR proficiency level in a specific (sub-​)skill. Little (2005, p. 327) hypothesized that a learner being able to perform just one task at a certain proficiency level could assume that they have achieved this level. In the aforementioned Estonian ELP, learners are guided by the requirement of meeting 80% of the checklist statements for attaining the level. This question is, however, not particular to diagnosis; rather, it pertains to all alignment endeavours. There is no universal agreement regarding what percentage of tasks and requirements of a given proficiency level need to be fulfilled in order to be classified as “having attained” the level, ranging from 50% to near 100% (see, e.g., Council of Europe, 2009). With regard to the development and validation of the self-​ a ssessment checklists of an ELP, Schneider and Lenz (2001) outline several principles. Namely, descriptors should adhere to the criteria of positiveness –​formulations should describe what learners can do rather than what they cannot, definiteness –​ concrete tasks should be included, clarity –​formulations should be transparent and void of jargon, brevity –​they should be short, and independence –​their understanding should not depend on other descriptors. Overall, the ELP can be considered the broadest of the instruments discussed in this chapter. It also places the highest level of autonomy and responsibility for diagnosis onto learners among those discussed. While ELPs usually give detailed instructions and provide can-​do descriptors, the challenge remains in interpreting what “being able to do something” actually means. This, however, is true for any rating endeavour and not a particular challenge of self-​d iagnosis. Similarly, it can be challenging for learners to distinguish between what they already can do and what they want to focus on in the next step. These challenges, however, can be overcome by training and by relating the descriptors to specific experiences. It can be assumed that learners literate in at least one language should be able to interpret what real-​world activities, like writing a postcard, involve, and with sufficient training, they are in the best position to judge whether they can achieve these language activities in the classroom or in the real world, as research into self-​a ssessment suggests (e.g., Faez et al., 2011; Little, 2010; Luoma & Tarnanen, 2003; Ünaldi, 2016). Provided that learners receive training to develop their self-​a ssessment skills and come to realistic judgments, the ELP allows for diagnosing and tracing learner development. Equally, if teachers are trained to use the diagnostic information obtained from the ELP to

act upon in their teaching, it can be a powerful diagnostic tool for both learners and teachers.

Illustrative examples of diagnostic tasks, tests and instruments: Dynamic assessment instruments and approaches

In SFL classrooms, the responsibility for the actions following diagnosis is ideally shared and these actions are co-constructed. There are diagnostic approaches where this shared responsibility and the co-constructed actions are part and parcel of the diagnosis, namely dynamic assessment rooted in the sociocultural paradigm (Chapter 2). We next discuss two examples of dynamic assessment.

The Questions Test

The Questions Test is a computerized dynamic assessment procedure designed to yield insights into learners' struggles in formulating English wh-questions with auxiliaries. The target audience is learners who already have some idea of how questions in English are formed (about level A2 by the author's estimation). The test is delivered in the ICAnDoiT system hosted at the Centre for Applied Language Studies of the University of Jyväskylä (Leontjev, 2016a).

Description of the instrument

The instrument is designed following the sandwich format of interventionist dynamic assessment (Sternberg & Grigorenko, 2002), meaning that the procedure includes a pretest, a treatment, and a posttest and the mediation is standardized. In the pretest and the posttest, learners work independently. In the treatment stage, mediation is provided whenever learners encounter problems. The tasks are contextualized in an imaginary situation requiring written correspondence with a pet shop, in which the learners are asked to formulate questions in emails of inquiry. The pretest and the posttest are the same: (a) question writing according to the given prompts and (b) a gap-​fi lling exercise, meaning that at least partially, it is a direct assessment of the learners’ writing. The computerized dynamic assessment (C-​DA) treatment includes two ordering exercises yielding insights into learners’ problems with the word order of wh-​ questions with modal auxiliaries, and three multiple-​choice exercises assessing the learners’ emerging abilities to formulate wh-​questions with auxiliaries “do”, “does”, and “did” respectively. Example items are given in Figure 4.15. The mediation ranges from implicit “think more carefully” to overt correction and explicit and detailed explanation (e.g., “Sorry, you need do before the word you and the verb are is not needed. The correct answer

FIGURE 4.15 Items in the Treatment Part of the Questions Test (Source: Leontjev, 2014).

is: …"). Leontjev (2016a) suggested that both learners and teachers make diagnostic inferences from the procedure. Learners build on this guidance in their subsequent performance. Teachers, examining the learner performance logs generated by the system, can see what their learners' mistakes are and how much assistance these learners require from them (Poehner & Leontjev, 2020).
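The graduated, pre-scripted mediation described above can be pictured as a loop over increasingly explicit prompts. The sketch below is our own schematic reconstruction of that logic, not code from the ICAnDoiT system; the item, prompts, and the simulated learner are invented for illustration.

```python
# Schematic reconstruction of interventionist C-DA mediation: prompts are
# delivered from most implicit to most explicit until the learner succeeds.
# This illustrates the logic only; it is not the ICAnDoiT implementation.

def mediate(item, prompts, get_answer):
    """Return (solved, prompts_used); more prompts used = more assistance needed."""
    for used, prompt in enumerate(prompts):
        answer = get_answer(item, prompt)
        if answer == item["correct"]:
            return True, used           # solved after `used` escalations
    return False, len(prompts)          # learner saw the full explanation

item = {
    "stem": "______ it like to eat? (What / do / does)",
    "correct": "What does it like to eat?",
}
prompts = [
    "",                                              # first unassisted attempt
    "Think more carefully!",                         # implicit hint
    "Look at the highlighted part of your question.",
    "The subject is 'it'. Which helping verb do you need?",
    "The correct answer is: What does it like to eat?",  # overt correction
]

def simulated_learner(item, prompt):
    # Invented behaviour: the learner self-corrects once prompted about the subject.
    return item["correct"] if "subject" in prompt.lower() else "What do it likes to eat?"

solved, hints = mediate(item, prompts, simulated_learner)
print(solved, hints)  # True 3 -> fairly explicit mediation was required
```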

What do we know about the instrument?

The validation of the instrument is discussed in Leontjev (2016a). The operationalization of the assessed construct was informed by previous research on the development of L2 English questions, especially question development in L2 English writing (e.g., Alanen & Kalaja, 2010). The procedures were trialled in a pilot study (Leontjev, 2014, 2016a) with learners studying SFL English in year 8 of their school education in Estonia. The study followed a randomized pretest-treatment-posttest design with a computerized dynamic assessment (C-DA) group (n = 26) and a control group (n = 21), who went through the same exercises as the experimental group learners but were only told whether their responses were correct or not. The results indicated that the C-DA group learners significantly outperformed the control group learners on the posttest. That is, the learners, as a group, were able to successfully apply what they had learned during the C-DA in their unassisted performance. In a discussion of the diagnostic value of the procedure for teachers, the author suggested that teachers' observations of their learners working on the

TABLE 4.3 Part of a Learner's Performance Log on the Does-Exercise

1. Selected option: Where do it plays in the shop?
   Mediation (feedback): Your sentence: Where do it plays in the shop? Think more carefully!
2. Selected option: Why do it looks sad in the photo?
   Mediation (feedback): Your sentence: Why do it looks sad in the photo? Look at the highlighted part of your question. Think, is everything correct there?
3. Selected option: When do it goes to sleep?
   Mediation (feedback): Your sentence: When do it goes to sleep? You used the correct helping verb. But think about the word it. What do you have to add to the helping verb do? What should happen to the verb goes?
4. Selected option: When does it close on holidays?
   Mediation (feedback): Correct!
5. Selected option: What does it like to eat?
   Mediation (feedback): Correct!
6. Selected option: When does he come to work?
   Mediation (feedback): Correct!
7. Selected option: How long does it sleep at night?
   Mediation (feedback): Correct!

C-​DA and learners’ performance logs can promote teachers’ understanding of their learners’ abilities. The author based this discussion mostly on the outcomes of teacher interviews. We will next briefly summarize how learners’ performance logs can be used to make diagnostic inferences about their performance, illustrating this with a learner’s performance on the does-​ exercise (Table 4.3). It transpires from the performance log that the learner has a specific problem with wh-​questions with “does”, but when their attention is drawn to the subject in the sentence and they are prompted to think about what should happen to the auxiliary and the main verb (item 3 in Table 4.3), they seem to realize what the correct structure should be (items 4–​7). This gives the teacher an idea of how this problem can be addressed when giving feedback on the learner’s writing. Furthermore, the teacher could try giving less explicit guidance should this problem occur in the learner’s writing to account for the learner’s possible development due to the procedure. In fact, all three parts of the instrument can better inform teachers as regards the range of their learners’ abilities. The pretest marks the baseline for the learners’ unassisted performance. The mediated part suggests how learner development can be guided, and the posttest demonstrates whether the learners have improved their unassisted performance with the mediation they received. Leontjev (2016b) proposed reflective discussions with learners where their C-​DA experience is elicited as an action for potentially addressing the issue of learners’ underusing diagnostic feedback due to their goals and beliefs (see the

“DIALANG writing tasks” section in this chapter). The author reported how learners' performance logs can be used to elicit learners' experiences during the C-DA and potentially change their perceptions of the usefulness of explicit and implicit guidance. Namely, knowing that a learner responded correctly with some implicit assistance, the teacher can ask leading questions, such as "Do you remember if you received this feedback?" and "Did you understand what the problem was after you saw this feedback?" Overall, the Questions Test is an example of how diagnostic assessment can be extended by providing graduated assistance in the process of diagnosing writing.
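When many such logs are available, inferences like those drawn from Table 4.3 can also be partly automated. The sketch below is our own illustration; the log format is invented for the example and is not the actual system export.

```python
# Illustrative summary of the does-exercise log in Table 4.3 (format invented
# for this sketch, not the actual system export): one Boolean per selection,
# in the order the learner made them.
selections = [False, False, False, True, True, True, True]

attempts_before_first_correct = selections.index(True)
accuracy_after_breakthrough = sum(selections[attempts_before_first_correct:]) / (
    len(selections) - attempts_before_first_correct)

print(f"Mediated attempts before the first correct question: {attempts_before_first_correct}")
print(f"Accuracy after the breakthrough: {accuracy_after_breakthrough:.0%}")
```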

A human-mediated dynamic assessment of L2 writing

Rahimi et al. (2015) reported on a human-mediated dynamic assessment procedure that aimed to uncover the sources of, and address, problems in the writing of 3 advanced L1 Persian learners of English.

Description of the approach

The procedure involved ten weekly tutorials with the three learners, each lasting about 50 minutes. Each tutorial was organized in the same way. First, the learners were asked to write a text on a topic assigned by the mediator; the aim was to identify what the learners could do independently. During the second stage, the learners collaborated with the mediator while reviewing their texts. Aljaafreh and Lantolf's (1994) Regulatory Scale (Chapter 9) informed the mediation in the study. The authors classified the DA episodes in all tutorial sessions as either yielding insights into learners' problems or having a developmental role. To elaborate, the mediation first aimed at finding out whether learners understood what they were supposed to do at different parts of the writing process and included questions such as "how do you define brainstorming?" Then, the focus shifted to resolving learner struggles, for example, helping a learner to understand the difference between outlining and brainstorming in the planning stage of writing.

What do we know about the approach?

In their research report, the authors concentrate on identifying the learners' conceptual problems with, for example, brainstorming, outlining, or developing thesis statements, which leads us to classify the focus of the diagnosis as somewhat broad. However, we are of the opinion that a similar approach can be applied to broader or narrower constructs.

The authors illustrated that, by mediating the learners' performance, they were able to identify problems which would otherwise have remained obscured. That is, the mediator inquired whether there were problems and then built the subsequent prompting, leading questions, and so on, on the learners' responses. This allowed the mediator to uncover the reasons for the learners' impediments in the writing process. These problems were then acted upon in the mediation that followed, in which the mediator strived to stay as implicit as possible. The mediator, therefore, shifted the responsibility for performance onto the learner. The authors then classified the mediation based on Aljaafreh and Lantolf's (1994) Regulatory Scale and traced how mediation changed across the ten DA tutorial sessions. The participants did not necessarily demonstrate steady development in the form of gradually less explicit mediation across the ten sessions. However, all three students eventually exhibited self-regulation with regard to the problematic areas identified. The study is a good illustration of how mediation can be incorporated into the diagnostic cycle to yield insights both into learners' abilities and into whether and how the development of these abilities can be directed. Furthermore, it illustrates how Aljaafreh and Lantolf's (1994) Regulatory Scale can be used for tracking learners' developmental trajectories in diagnosing SFL writing.
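One way to make such trajectories visible is simply to record, for each recurring problem area, the most explicit level of mediation needed in each session. The sketch below is a hypothetical illustration of this bookkeeping; the numeric levels loosely follow the idea of the Regulatory Scale (lower = more implicit), but the scale points, data, and labels are invented.

```python
# Hypothetical tracking of mediation explicitness across DA sessions.
# Levels are illustrative only (0 = no mediation needed ... 5 = overt correction);
# they are not the actual Regulatory Scale categories.
sessions = {
    "thesis statement": [5, 4, 4, 3, 4, 2, 2, 1, 1, 0],
    "outlining":        [4, 4, 3, 3, 2, 2, 1, 1, 0, 0],
}

for area, levels in sessions.items():
    trend = levels[0] - levels[-1]  # positive = less explicit help over time
    print(f"{area}: started at level {levels[0]}, ended at level {levels[-1]} "
          f"(change of {trend}); trajectory: {levels}")
```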

Summarizing the discussed instruments and approaches

In this chapter, we outlined a variety of approaches to diagnosing SFL writing ability and its development. We also examined the main characteristics involved in diagnosing SFL writing, and the role the CEFR can play. We presented a range of instruments in order to illustrate the various ways that the diagnosis of writing has been approached so far. The constructs assessed by the diagnostic writing tasks included in our review vary considerably in breadth and type. The instruments that operationalize these constructs also vary in (in)directness. The narrowest component of the writing construct is probably found in the letter-sound correspondence in GraphoLearn, which could be called direct insofar as learners have to produce elements of writing – letters, syllables – that are the focus of assessment. GraphoLearn is a system that adapts to the learner's performance, as do the dynamic assessment approaches. Roxify, the Questions Test, and DIALANG all focus on rather narrow components of the writing construct, viz. different aspects of vocabulary in the case of the first two, wh-question formation in the Questions Test, and the writing sub-skills of accuracy, appropriateness, and textual organization in DIALANG. What differentiates the instruments is that in Roxify, learners write entire texts, whereas in the Questions Test and DIALANG, the diagnosis is based on discrete-point items assessing aspects of writing indirectly.

Broader aspects of writing are diagnosed in DELNA, VERA8, EDD, the ELP, and Rahimi et al. (2015). In these instruments and procedures, learners are asked to write complete texts that are rated by human raters against a set of analytic criteria for diagnostic feedback or, in the case of EDD, against a checklist. A complementary approach to diagnosing writing is found in self-assessment carried out with can-do statements depicting specific writing activities, strategies or linguistic resources. This approach is operationalized in the ELP and also in DIALANG, as it, too, includes self-assessment statements. Self-assessment, even when it is not accompanied by writing a text, can still be seen as direct assessment, particularly when learners refer to actual direct writing tasks they have attempted in the past. An interesting question arises as to the directness of assessment when learners have to self-assess tasks with which they have no first-hand experience. To give an example, a learner can assume what filling in a form involves, for example, from seeing it in a video, without having experience of completing one in either their L1 or SFL. Yet, when asked whether they can fill in a form or not, this learner might well respond positively, basing their judgment on an estimation of potentially being able to do it rather than on actually having done it. Certainly, this is a hypothetical situation, and in our view, more often than not, self-assessment is based on lived experience.

Main implications for diagnosing SFL writing

The instruments that we discussed in this chapter also illustrate different approaches to agency in diagnosis. The VERA8 diagnosis regards teachers as responsible for taking action, as there are materials directed at teachers but no explicit guidance for learners; it is the teachers’ decision whether to include their students or not. Other instruments give agency to both teachers and learners, such as the EDD, where teachers assess the writing samples, and learners are the intended agents responsible for taking action based on the resulting diagnostic profiles, even though teachers can use them as well. The Roxify system also gives agency to both learners and teachers, with the focus more on learners. They are given the choice about which feedback category (see Figure 4.8) they want to attend to, and whether to follow the links to websites containing further information regarding the lexical elements in their writing. Hence, Roxify offers a range of possible agency scenarios, from learners having full responsibility for the diagnostic cycle to teachers being agents in directing the diagnosis. DELNA also takes a mixed approach to agency. The main responsibility lies with the learners, since taking the diagnostic part of the system can be voluntary, and engaging in the support activities based on the results is optional. However, peer students and teaching staff play a significant part in the action that follows diagnosis.

Other approaches, particularly those involving self-assessment, place the main agency with learners. Examples we used are the ELP, where learners are the agents of their own diagnosis, and DIALANG, where learners have the responsibility for acting upon the feedback they receive. Dynamic assessment approaches take an explicitly cooperative stance toward agency. The diagnosis of task performance and problems is conceptualized as co-construction between learner and mediator, as is the ensuing mediation. With regard to the actions following dynamic assessment, the agency can lie with both learners and teachers. Yet DA aims at learner development and self-regulation; hence, it places a firm emphasis on empowering learners as autonomous agents. A third agent can be the instrument developers, whose role is particularly important in computer-delivered assessments such as the Questions Test or DIALANG. Here, developers have to anticipate the kinds of problems learners may show and tailor the feedback in advance to anticipated strengths and weaknesses. Systems explicitly addressing learners have to ensure that they provide enough guidance for learners to benefit from the diagnosis and to take sufficient action. The GraphoLearn instrument is unique in that it is responsible for both diagnosis and action, so the main agent of diagnosis is the system. This is possible because of the precisely defined, narrow construct that the system assesses and trains. When creating diagnostic instruments, teachers and external developers should take into account inter- and intrapersonal learner factors, particularly in approaches where more agency is placed on learners. For example, guidance could be provided with regard to the functions of the feedback, since some learners might mistake diagnostic feedback for corrective feedback, as was the case for DIALANG. Reflective discussions following diagnostic procedures, as suggested by Leontjev (2016b), could have the potential to ensure that learners make full use of the feedback provided (see also Huhta, 2010). The responsibility for designing and conducting these discussions lies with the teacher or a researcher. Dynamic assessment procedures like the one by Rahimi et al. (2015) are promising insofar as they have the potential to yield insights into the obstacles in learners' writing process. We have to concede that reflective interviews and DA may be most feasible in research settings, as lack of time can prevent teachers from using such procedures to their fullest potential in the classroom. We now turn to the use of instruments and approaches for the diagnosis of processes, products and development. When it comes to diagnosing writing via products, all instruments eliciting components of writing can be used, be they indirect like the DIALANG tasks or direct like the VERA8 tasks. DA procedures working with tasks can also yield insights into what learners can produce and where they struggle.

Self-​ a ssessment approaches which involve learners’ diagnosis of their own written products are also a means to product diagnosis. Writing processes can best be diagnosed by introspective or retrospective approaches, such as think-​a loud protocols or reflective interviews. As mentioned earlier, these approaches are rather time-​consuming, but there are alternatives such as questionnaires or checklists that ask learners to reflect on their writing strategies, which often yield insights into relevant processes. With regard to tracing learners’ development of writing skills, all the above-​ introduced instruments can be employed in longitudinal diagnostic designs. Furthermore, the diagnostic instruments we covered here can be complemented by ethnographic research methods, as Slomp (2012) suggested, such as interviews or extensive reflective writing pieces for learners to reflect and demonstrate their development of certain aspects of writing, which in turn can be used as illustrative texts in learner portfolios. These longitudinal designs require time and a certain level of systematicity, which can perhaps be achieved in research projects more easily than in school contexts. It seems that it cannot be presupposed that all teachers receive the necessary training to develop their diagnostic competences. Llosa et al. (2011), for example, found that teachers seemed not to know what diagnostic feedback information they would need to analyze students’ writing or how to use diagnostic information. We, thus, agree with Alderson, Haapakangas et al. (2015) that there is a need for teacher training in all areas of diagnosis (see Fulcher, 2012a; Hasselgreen et al., 2004; Huhta et al., 2005; Vogt & Tsagari, 2014), the more so since teachers need to pass on diagnostic competences to their learners. Alderson, Haapakangas, et al. (2015) also see the need for the development of diagnostic instruments, the effective integration of self-​a ssessment, as well as classroom-​ based assessment research and effectiveness studies when employing diagnostic assessment. In classroom contexts, models are needed that integrate systematic longitudinal designs, diagnostic observations, and feedback into further action and school grading. In this chapter, we presented and analyzed several concrete SFL writing tasks that can be considered diagnostic, even if not all of them claim to be so. We also discussed some factors that need to be taken into account when designing diagnostic assessments. To more fully understand a key component in diagnosis –​ the task –​we will review the literature on task characteristics in Chapter 5 to see how that might inform the design of diagnostic SFL writing tasks.

5 CHARACTERISTICS OF TASKS DESIGNED TO DIAGNOSE WRITING

Introduction

This chapter takes up the discussion of product-oriented diagnosis from the previous chapter. In order to elicit written products for the purpose of diagnosing writing skills, direct or indirect tasks can be used that either elicit a written text or focus on underlying sub-skills of writing. Such tasks need to be designed carefully in order to tap into those aspects that are to be diagnosed. In this respect, task design is related to the conceptual stage in the diagnostic cycle that was introduced at the onset of the book. At the same time, since tasks serve to elicit written products, this chapter also relates to the phase of the actual diagnostic assessment in our diagnostic cycle. In this chapter, we focus on task characteristics and their implications for generating diagnostic information, with specific attention to eliciting written products. We acknowledge the dual nature of writing: process-related aspects are discussed here where necessary, but they are taken up in more depth in Chapter 6. The written product is influenced by a complex interaction of task, context, scoring, and rater variables, as much as by variables within the writer. In this chapter we focus on task variables and, where necessary, on selected context and writer variables; aspects concerning scoring and rating will be covered in Chapter 7. Here, we also report research from the fields of testing, pedagogy and second language acquisition to explore the implications of insights into task properties for the diagnosis of SFL writing.


Task design is usually informed either by a curriculum or by theory, that is, by writing (development) models or cognitive processing theories; those relevant for diagnosing writing are outlined in Chapters 2 and 3. With regard to designing diagnostic tasks, the tasks should aim at eliciting those components from curricula or theories that are relevant for a given context and purpose. Minimally, the following aspects would need to be considered when designing diagnostic tasks:
• Language-related aspects: a diagnostic task must offer the writer the possibility to show the required linguistic and strategic competences; furthermore, it must allow for differentiation among different levels of ability. Language-related aspects in diagnostic assessment should be captured in as much detail as is required by context and purpose. The different ways of capturing these features are discussed below. For instance, it may be necessary to employ discrete tasks focusing on isolated linguistic aspects such as the use of cohesive devices in one context, while another context may require using a direct task accompanied by an analytic rating scale that depicts relevant analytic criteria (see Chapter 7), and yet another context may ask for automated assessment of certain linguistic features, such as lexical variety or sentence complexity (see Chapter 8; a brief illustrative sketch of such automated measures follows at the end of this introduction).
• Cognitive aspects of text creation: If the focus of the diagnosis lies on cognitive aspects, which we equate here with strategic aspects, a diagnostic task must allow the writer to go through relevant cognitive processes such as planning, formulating or revising, and it should allow the writer to employ certain strategies (depending on age and maturity, as discussed in Chapters 2 and 3). This can be achieved by giving writers planning and drafting time, or by explicitly asking them for revisions; another option could be to use portfolios, where a series of drafts, along with reflections, is collected (see Chapter 6 for process-oriented diagnosis). Yet other ways of capturing and analyzing cognitive processes can be found in keystroke logging, eye tracking and intro-/retrospective approaches. We take these aspects into consideration here, when and where they are relevant for task design.
• Discourse features and the social dimension of writing: If these aspects are relevant for diagnosis, the task must allow writers to demonstrate, for example, their genre knowledge, their ability to communicate in a specific contextual domain with members of that community, or their knowledge about the appropriate register. Hence, the task should account for relevant aspects such as the targeted readership or the required genre, or the task must contextualize the requested writing in a meaningful way (for more on socially-oriented views of writing, see Chapter 2).
• Personal characteristics of the writer: variables which could have an influence on SFL writing, such as age, maturity, working memory capacity, motivation, or affective aspects (such as anxiety), need to be taken into account when designing a diagnostic task.

Certain aspects, such as topic unfamiliarity, may cause anxiety. While it is difficult to control personal characteristics, they should nevertheless be considered in the overall design. Hence, it is important to design tasks that elicit construct-relevant characteristics like engagement or motivation, and to make sure that the tasks do not trigger construct-irrelevant factors like cognitive overload.
Writing was once assessed mainly by indirect tests, often by means of multiple-choice items targeting discrete linguistic aspects (as is also the case, for example, in the computer-based DIALANG; see Chapter 4), which were understood as indicators of overall writing ability rather than measures of features contributing to writing ability. However, between the 1970s and 1990s, there was a turn towards using direct writing tasks, that is, learners produce a written text, the evaluation of which serves as a measure of overall writing ability. Moreover, tasks were increasingly designed to reflect real-world demands. This chapter starts with what is now the predominant approach, that is, direct writing tasks. We then shift attention to a critical discussion of indirect formats, as the latter are reported to possess diagnostic potential. This is followed by a discussion of task demands and task complexity, also with a view towards discussing which features are characteristic of which developmental stage of writing (see Chapter 2 for the development of writing). The chapter then turns to task specificity and discusses implications for diagnosing writing. Much of the research on writing assessment on which generalizations can be built is conducted in large-scale contexts, which is why we examine the implications of the insights drawn from such research for diagnostic purposes first for large-scale contexts. Yet we will also explore classroom contexts, not least because diagnosing students' strengths and weaknesses is one of the concerns in the language classroom. Finally, we compare computer-based delivery with paper-and-pencil tests and explore the implications of the delivery mode for diagnostic task design. This chapter is dedicated to task design principles only. Readers who are seeking guidance on practical task development are referred to books and guides on task development, such as Council of Europe (2011), Downing and Haladyna (2006), Fulcher (2010), or Weigle (2002).
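As a concrete illustration of the automated measures mentioned in the first design consideration above, the following sketch computes two of the simplest candidate indicators: type-token ratio as a rough proxy for lexical variety and mean sentence length as a rough proxy for sentence complexity. It is a toy example of the general idea only; operational systems (see Chapter 8) use far more robust, length-corrected measures, and the sample text is invented.

```python
# Toy indicators of lexical variety and sentence complexity for a learner text.
# Real diagnostic systems use more robust, length-corrected measures (Chapter 8).
import re

def simple_metrics(text):
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    type_token_ratio = len(set(tokens)) / len(tokens)
    mean_sentence_length = len(tokens) / len(sentences)
    return type_token_ratio, mean_sentence_length

sample = ("I write my friend a letter. I write about my holiday. "
          "The holiday was nice because the weather was nice.")
ttr, msl = simple_metrics(sample)
print(f"Type-token ratio: {ttr:.2f}; mean sentence length: {msl:.1f} words")
```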

Variables for designing direct writing tasks

Drawing on insights from frameworks of task characteristics from the testing field, we now briefly outline variables which are generally taken into consideration in large-scale assessment studies in order to develop reliable and valid direct writing tasks which are designed to elicit construct-relevant performances.

These variables are also applicable to classroom-based assessment, and hence will serve as the backdrop for all further discussions in this chapter. The framework of language task characteristics by Bachman and Palmer (2010) serves as our general frame. It delineates characteristics for five areas: setting, assessment rubric, input, expected response, and relationship between input and expected response. We will specify and illustrate these areas with characteristics relevant for writing tasks, thereby drawing on work by Hamp-Lyons (2003) and Weigle (2002). A task is usually embedded in a diagnostic assessment, for which all five areas of the aforementioned framework are relevant. In a narrower sense, a specific writing task as we understand it encompasses all information given to the test taker, including general instructions/rubrics, such as "read the following", "you have 20 minutes", "write 350 words"; the stimulus, for example, a text, a graph or a picture provided as stimulus for writing; and the prompt – the outline of the writing situation, context, audience, purpose, etc. Examples of direct tasks are the VERA8 tasks introduced in Chapter 4. Exploring task variables relevant for designing direct writing tasks in a diagnostic setting, we use Bachman/Palmer's (2010) five aspects to systematize the different types of characteristics and to illustrate their interrelatedness. We focus particularly on rubric and input; the areas in [brackets] do not focus particularly on the development of direct writing tasks, but are more relevant for the broader assessment design.
1. Setting
• [physical characteristics, participants and time of assessment]
• delivery mode: paper-and-pencil, computer-mediated environments
2. Rubric
• instructions [general for the overall assessment], specific for one task, wording of instructions
• structure of assessment, including number and sequencing of tasks, possible choice of tasks
• duration:
• time allowance overall and per task, differentiated for planning and for writing
• space for planning and for writing
• required length of response
• recording method:
• [type of record: description for meaningful feedback]
• scoring criteria: information on scoring provided in task or in advance so that test takers know about the expectations

• [procedures for producing a record (cf. Chapters 7 and 8), including rating conditions (human raters or machine-rated) and recorders (teachers or external raters)]
• any other information relevant for the learners to understand what the assessment is about and what they have to expect.
3. Input – here, we digress from Bachman/Palmer's (2010) suggested subcategories of format and language of input, as we find the following subcategories more relevant for the design of direct writing tasks:
• subject matter, also called topic or topic domain
• stimulus: optional, a task does not need to have a stimulus
• a stimulus could be a picture or a graph, a one-liner or a longer text, or a combination of different stimuli
• wording and length of the stimulus
• prompt and prompt wording, referring to how the rhetorical task and the pattern of exposition are worded, e.g., implicit, explicit, question, statement, etc.:
• rhetorical task, also called discourse mode, for example, description, exposition, narration, argument, etc.
• pattern of exposition, for example instructing the test taker to compare, contrast, outline causes and effects, etc.
• amount of context and specifications provided with regard to, for example, audience, writer's role, genre, tone, style, or register
• wording and length of the prompt
4. [Expected response: We are dealing with this aspect in more depth in Chapters 7 and 8.
• format, including length and speediness of response
• expected genre
• expected language
• expected topic]
5. Relationship between input and expected response
• interrelatedness: can be non-reciprocal (if the writer does not get any feedback from an interlocutor), reciprocal (e.g., if administered in a dynamic approach), or adaptive
• scope of relationship: ranges from broad (processing a lot of input, e.g., from an extended stimulus or prompt) to narrow (processing a limited amount of input)
• directness of relationship: ranging from direct (successful response based mainly on task input) to indirect (successful response requires drawing also on information outside the input)
• cognitive demands: amount of attention, memory and processing the task requires, not always stated explicitly in instructions or prompt, such as

reproducing, re-​organizing, summarizing, synthesizing, etc. (see also the “Task demands and task complexity” section in this chapter). One seminal feature that occurs in several subcategories is language, namely the wording of instructions, stimulus, and prompt. The language needs to be of a level which is accessible and understandable for the targeted learner group. This request holds true both for instruction in the target language and in cases where the language of instruction of the learners is used in the tasks to ensure their intelligibility. For a more in-​depth exploration of language features, see Bachman and Palmer (2010, pp. 70–​78). Another important aspect refers to the stimulus: Task designers need to carefully select graphs and pictures with regard to the cultural and conceptual load they might bear for certain learner groups. If the aim is to diagnose writing skills and the underlying building blocks, diagnostic tasks need to be designed in such a way that they do not confound writing with reading skills, graph deciphering skills or knowledge of cultural and other concepts needed to interpret and understand images. If the diagnosis, however, also aims at diagnosing such image or graph deciphering skills, care has to be taken that these aspects can validly be differentiated from the writing skills. In this regard, diagnostic tasks may differ from tasks often used in proficiency tests that attempt to simulate contexts in which learners have to integrate, for example, reading and writing or work with multimodal materials. The task variables that are grouped within the subcategories are more closely related to each other than to other variables. For example, genre, rhetorical task, and patterns of exposition tend to influence one another. Certain (combinations of ) variables are reported to have an effect on task difficulty, that is, they affect the written output and the scores. Effects have been reported for the topic, its un/​familiarity and its level of abstraction (e.g., G. Lim, 2010), for discourse mode (e.g., Weigle, 1999), and for cognitive demands (see the review in the “Task demands and task complexity” section in this chapter). Task difficulty is, however, not determined by task variables in a unidirectional way; rather, it is a function of task demands and learner abilities. Hence, what may be more challenging for one learner may be easier for another, depending on their developmental stage and prior experience or exposure. Task difficulty is furthermore affected by the way the diagnosis is administered. For example, in interactive or dynamic approaches, the interlocutor or the mediator can alter and simplify the tasks if needed. Moreover, scoring approaches and rater variables affect task difficulty. This complexity makes research on the effects of certain variables on task difficulty very challenging since task, writer, interlocutor, scoring and rater variables interact (see, e.g., Hamp-​Lyons, 2003). In order to disentangle this inherent complexity, we focus on task characteristics

and selected writer characteristics in this chapter, whereas rater and scoring aspects are taken up in Chapter 7. While all the above-mentioned variables matter for task design, not all of them need to be addressed in every diagnostic assessment. Depending on one's purpose and focus, the task designer has to make a conscious decision to address certain aspects while excluding others. Those variables of highest diagnostic interest have to be selected in order to design tasks which target specific diagnostic aspects and proficiency levels, operationalize certain task demands, and possess a specific level of complexity for a particular target group.
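To show how such a selection of variables might be recorded in practice, the sketch below captures a subset of the framework's areas as a simple data structure. The fields and the sample task are our own illustrative choices under stated assumptions, not an official operationalization of Bachman and Palmer's (2010) framework.

```python
# Illustrative (not official) encoding of a diagnostic writing task
# specification, covering a subset of the framework's five areas.
from dataclasses import dataclass, field

@dataclass
class WritingTaskSpec:
    # Setting
    delivery_mode: str                 # "paper-and-pencil" or "computer-mediated"
    # Rubric
    planning_time_min: int
    writing_time_min: int
    required_length_words: int
    scoring_info_shared: bool          # are criteria shown to test takers in advance?
    # Input
    topic_domain: str
    stimulus: str                      # e.g. "graph", "short text", or "none"
    rhetorical_task: str               # e.g. "description", "argument"
    audience: str
    # Relationship between input and expected response
    scope: str                         # "broad" or "narrow"
    cognitive_demands: list = field(default_factory=list)

email_complaint = WritingTaskSpec(
    delivery_mode="computer-mediated",
    planning_time_min=5, writing_time_min=20, required_length_words=150,
    scoring_info_shared=True,
    topic_domain="everyday services", stimulus="short text",
    rhetorical_task="argument", audience="customer service",
    scope="narrow", cognitive_demands=["re-organizing", "summarizing"],
)
print(email_complaint.rhetorical_task, email_complaint.cognitive_demands)
```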

Comparing indirect and direct tasks

Our considerations have so far concentrated on direct, performance-based tasks, since this is currently the most widely used approach to assessing writing. However, indirect tasks have also been used to assess selected subcomponents of writing. The most commonly used indirect approaches include closed item formats such as multiple-choice, formats that allow only very restricted responses such as gap-fills, or their combinations, for example, gap-fill items with multiple-choice options. These task types typically target discrete aspects of grammar, syntax, vocabulary, register or textual organization (see Chapter 4 for examples of diagnostic tests that use these item formats, such as DIALANG or the Questions Test). When comparing direct and indirect task types, research – mainly conducted up to the 1990s – yields diverse results, depending on the study design. Some researchers regard indirect and direct measures as complementary, while others report that common traits exist across both approaches, and yet other studies find that the two approaches measure different constructs; one could also say that the approaches produce different data which can represent different constructs, such as receptive knowledge or productive abilities. Some studies came to the conclusion that indirect formats do not adequately reflect problem-solving skills and the integrated nature of writing; yet others concluded that writing consists of differentiable components which should be assessed by discrete items (see, e.g., the overviews in Heck & Crislip, 2001, or M. Miller & Crocker, 1990). In what follows, we look at three more recent studies to illustrate that such differing research results need to be contextualized in order to understand how assessment contexts and purposes impact the choice of elicitation methods in research studies, which in turn impacts research results. We focus on the implications of these illustrative studies with regard to the feasibility of different approaches for diagnosing writing in different assessment contexts. Support for the use of direct tasks for school monitoring purposes is reported by Heck and Crislip (2001). Their study sampled 3,300 third graders

(8 to 9 years old) from a variety of individual and school backgrounds. The researchers used standardized indirect measures and direct writing tasks to examine which of the two showed greater equity and fairness in a school setting, that is, the fairer measure should be less sensitive to background variables a school cannot control, but more sensitive to the variables being taught at school. They came to the conclusion that the multiple-​choice formats used in their study may disadvantage certain student groups; furthermore, multiple-​choice formats were less helpful in diagnosing student achievements in relation to the curriculum and what was taught in writing classes, while the direct formats allowed for “a more valid comparison between schools” and reduced “achievement differences … observed on multiple-​choice test for some student groups” (Heck & Crislip, 2001, p. 288). While only referring to one educational context, and acknowledging that the two approaches elicit very different data, this study nevertheless raises some serious issues with regard to multiple-​choice formats when it comes to using diagnostic tests for large-​scale studies monitoring school progress in SFL writing. However, this result does not necessarily mean that multiple-​choice-​based measures of writing could not be used for diagnosing individual learners’ receptive knowledge in the realm of writing as long as the assessors are aware of the possible effect of the item format. As research reviewed in Chapter 2 points out, learners’ linguistic and other background variables need to be considered in the interpretation of diagnostic information. Support for the use of indirect formats in a diagnostic test for formative, self-​ assessment purposes is reported by Alderson (2005) for the DIALANG writing component (see Chapter 4). DIALANG aims to assess SFL learners’ potential for writing rather than their actual writing abilities. Being administered online, measures were sought which could be scored automatically. Hence, DIALANG resorted to indirect, closed formats such as multiple-​choice and gap-​fi lling. The DIALANG writing items target the awareness of aspects which may cause problems when writing, operationalizing three subcomponents of writing ability: accuracy of grammar /​vocabulary /​spelling, knowledge of register /​ appropriacy, and knowledge of textual organization. The items are classified using a system derived from the IEA International Study of Writing (Gorman et al., 1988), encompassing cognitive aspects, purpose, content and audience. DIALANG reports the diagnostic results in terms of the proficiency levels of the Common European Framework of Reference (Council of Europe, 2001; see Chapter 2 for an introduction of the CEFR). In an analysis of the pilot data for English, no relation, however, was found for the three writing subcomponents and the items’ empirically calibrated CEFR levels. What was found was a significant relationship between the indirect writing items and the discrete grammar and vocabulary items (i.e., correlations of .77 and .79, respectively). Factor analysis resulted in one factor for all sub-​skills across

the indirect writing items and the discrete grammar and vocabulary items. Alderson (2005) recommended including items targeting subcomponents of writing, as well as items targeting the sub-skills of grammar and vocabulary, in a diagnostic test, since the latter could explain large amounts of variance in the indirect writing items. It has to be conceded, however, that these indirect and discrete means measure linguistic resources and were at no point compared to direct measures of writing ability. Support for combining direct and indirect approaches was reported by Gustilo and Magno (2015). Taking the writing model by Chenoweth and Hayes (2001; see Chapter 3) as a starting point, they examined the influence of text production processes (measured by a self-report survey), topical knowledge (multiple-choice test), linguistic knowledge (productive grammar items and a multiple-choice test targeting vocabulary and spelling knowledge), and writing approach and experience (self-report survey) on writing ability (measured by two direct essays scored holistically). They found that writing ability is directly affected by topical knowledge, linguistic knowledge, and text production processes. Text production processes were also found to mediate the effects of different writing approaches and experience, as well as of topical and linguistic knowledge. The findings corroborate Alderson's (2005) recommendation to also capture linguistic knowledge when diagnosing writing. The results also imply that controlling for topical knowledge, writing approaches and experience, and capturing text production processes via, for example, self-report, are required if a direct observation of processes is not feasible (see Chapter 6 for further details on diagnosing processes).

Implications of (in-)direct approaches for diagnostic assessment

To sum up, there seems to be agreement that indirect approaches yield diagnostic insights into declarative knowledge and awareness of different subcomponents of writing, targeting rather narrow constructs, while direct approaches allow for insights into the application of such knowledge and awareness, and facilitate the diagnosis of constructs with a broader view on writing ability. In view of the existing research, diagnostic assessment seems to be best informed by combining the two approaches. Direct tasks are suitable for diagnosing complex writing processes and cognitive operations related to the activity of writing, while indirect approaches lend themselves to the diagnosis of precisely defined subcomponents of knowledge about writing. Like all decisions in task design and development, the decision about the most appropriate task format depends on the purpose and context of the diagnosis. If the diagnostic interest lies in the ability to perform real-​world writing tasks, direct tasks are most suitable, in order to elicit all relevant interacting components of the complex composition process, taking a rather

broad stance on writing. The products can then be assessed by rating scales (see Chapter 7) and/​or by objective measures obtained through automated analyses of learners’ texts (see Chapter 8). The more complex and broad the construct is, the more elements the task should cover and elicit, and the more tasks one will need to cover the range of genres and writing activities in question. On the other end of the spectrum, if the interest is in diagnosing certain underlying subcomponents of the writing ability, or on the knowledge and awareness of relevant subcomponents measuring the potential for writing, indirect tasks may be more suitable to yield these kinds of data. The more specific, precise and narrowly defined the area of diagnostic concern is, the narrower, more specific and discrete the approach can be. However, automated analyses of learners’ texts can, to some extent, bridge the gap between direct and indirect approaches to diagnosing SFL writing: when learners can write real-​world texts that are analyzed automatically, the target of such analyses determines the degree of discreteness of diagnosis and, thus, feedback provided to the learners and/​or their teachers (see the Roxify and Text Inspector systems reviewed in Chapter 4). When deciding to include indirect approaches, one needs to be aware that indirect approaches provide complementary information about knowledge or awareness of sub-​components, but the individual parts alone may not be suitable to yield a diagnostic picture of writing ability as a ‘whole’. The same applies to the automated analysis of texts in terms of discrete linguistic and other elements: even a wide collection of such elements may not equal an overall view of learners’ writing ability. We agree with Alderson (2007a) that some measures may be better diagnostic indicators than others, yet further research is needed on the power of different indirect and direct measures of components of writing proficiency to diagnose writing difficulties and development. Task demands and task complexity

We will now explore factors affecting task demands, and thus the difficulty of tasks, since this is a central concern when designing diagnostic tasks. Task demands refer to the aforementioned cognitive requirements that tasks impose on writers, with regard to the amount of attention, memory and processing they require from a writer for a successful response. With regard to direct tasks, there is a wealth of research related to task complexity and task demands situated in pedagogical and second language acquisition contexts (e.g., Brown et al., 1984; Prabhu, 1987; Robinson, 2001a, 2001b, 2005; Robinson & Gilabert, 2007; Skehan, 1998; Skehan & Foster, 1999, 2001). While some of the studies seem dated, their findings still have implications for task design. Findings imply that task demands increase with task complexity. Referring to the framework of writing task characteristics outlined above, more demanding and hence more

complex tasks are characterized by an increase in the number of elements to address, the broadness of the scope and indirectness of the relationship between input and response (i.e., an increase in the amount of processing input and referring to information external to the input), the amount of reasoning required, the level of abstractness of the topic, the level of complexity of the required operations, the number of perspectives to be taken, or the distance of the context. Less demanding tasks are characterized by access to topical knowledge (e.g., by choice of known topic or by a direct relationship between input and response), familiarity with the demanded cognitive operations, a narrow scope of the relationship between input and response (by providing a higher amount of information) and a higher degree of precision in tasks and instructions. While many of the studies and models reported in the literature target oral speech production and interaction, findings have been applied to the realm of writing, albeit to a lesser degree. We will review research focusing on writing, where possible, while also taking into consideration models stemming from the realm of speaking, where there are no such models for writing. Task demands and pedagogical tasks

In linking research on task demands to findings from cognitive research and cognitive models of writing (see Chapter 3), it seems reasonable to assume that higher task demands will have an effect on cognitive resources, and thus may impact text production processes and outcomes. Research and theoretical considerations targeting the consequences of increased task demands on linguistic output, however, are inconclusive. While Skehan and Foster (2001) suggest in their Limited Attention Capacity Model (LACM) that more complex tasks would cause learners to produce linguistically less complex output with more errors due to competing attention resources within a limited attention capacity, Robinson (2001a, 2001b,2005) in his Cognition Hypothesis claims that increased task complexity would not necessarily impact negatively on linguistic form since learners would possess different, non-​competing attention resources, and tasks demanding higher attention might actually lead to an increase in correctness and complexity. The availability of resources that are compensatory, in this case for the higher cognitive demands required by the task, is in fact supported by the Dynamic Systems Theory discussed in Chapter 2. The Cognition Hypothesis is partly supported by two studies by Kuiken and Vedder (2007, 2008), who found fewer errors in texts elicited from writing tasks with higher cognitive demands, while there were no differences in linguistic complexity. Even though Kuiken and Vedder (2007, 2008) report effects of
task complexity and student proficiency on accuracy, no interactional effects between students' proficiency levels and task complexity were found. With a view to systematizing variables that affect pedagogical tasks, Robinson (2007) proposed his Triadic Task Taxonomy, which entails three dimensions: i) task complexity, characterized by cognitive demands; ii) task conditions, characterized by interactional demands (relevant for interactive or collaborative tasks where learners might have to negotiate meaning); and iii) task difficulty, characterized by learner factors such as their abilities, background or motivation. With regard to designing diagnostic writing tasks, the two dimensions of task complexity and task difficulty in particular, although developed quite a while ago, still have the potential to inform diagnostic task specification and design about relevant task characteristics and learner variables.

Task difficulty in direct writing tests

Task characteristics are also of great interest to the testing field. Here, endeavours can be found to use characteristics known to influence pedagogical task demands in order to design test tasks which target certain proficiency levels. Once the tasks’ actual level of difficulty is known from administering it to test takers, one can investigate how far certain characteristics can explain task difficulty, e.g., via regression analysis. While this is a well-​established research field for the receptive skills, difficulty-​determining characteristics and their predictive power for task difficulty have been less thoroughly investigated for writing test tasks. We also have to concede that several studies point towards the difficulty of applying task characteristics found in pedagogical contexts to test contexts. This includes the studies by Iwashita et al. (2001) or Norris et al. (2002), both focusing on another productive skill, speaking. Test task characteristics are usually informed by models or frameworks of task characteristics, such as the ones mentioned above. The anticipated effects of certain characteristics on task demands and hence task difficulty are informed by research such as the studies outlined earlier in this chapter. Furthermore, there are now tools available in the context of the CEFR which facilitate specifying task characteristics and task demands in relation to CEFR levels, such as the Dutch Grid for reading and listening, or the ALTE Grids for speaking and writing. These grids are intended to facilitate the characterization and specification of assessment tasks, so that test developers or participants in benchmarking events can systematically analyze relevant task features in order to come to their estimation of the CEFR level that the task most likely operationalizes. Using the CEFR to inform task design has to be done with care, as the Dutch Grid project group came to the conclusion that the “CEFR in its current form may not provide sufficient theoretical and practical guidance to
enable test specifications to be drawn up for each level of the CEFR" (Alderson et al., 2009, p. 5). We now exemplify how SFL writing tasks can be designed to operationalize specific CEFR levels, by using two large-scale studies which report that certain task characteristics did have the anticipated effect on task difficulty. One study was situated in the secondary school system in Germany. It aimed to evaluate the German Educational Standards, which are based on the CEFR (Rupp et al., 2008). Harsch and Rupp (2011) report the writing task development. Direct writing tasks were designed to operationalize the CEFR levels A1 to C1. The task design made use of the following task characteristics: writing purpose, audience, social context, level of abstractness of content, text type and register, cognitive operations, speech acts, linguistic / organizational / strategic expectations, time allotted, and expected number of words. For instance, a task operationalizing A2 would address only concrete and familiar content and require about 40 words, while a task operationalizing B2 would ask the test takers to write about abstract and also unfamiliar topics and produce about 200 words. The task characteristics were specified in the test specifications for each targeted CEFR level, so that the level of complexity and demands increased with each CEFR level. The assumption was that higher task demands would lead to higher task difficulties (as outlined above), which in turn require a higher level of proficiency. The underlying test specifications also made use of the ALTE writing grid. The writing tasks showed the anticipated order of difficulty when administered in a large-scale assessment. The tasks were designed to operationalize specific CEFR levels, and in a formal standard-setting endeavour, a panel of experts confirmed that the tasks were indeed operationalizing the targeted CEFR levels (Harsch et al., 2010). While this test-oriented approach supports the claims of the tasks operationalizing certain CEFR levels, there was no examinee-centred standard-setting approach used to also support these claims on the basis of an analysis of the responses. Nevertheless, since these tasks were specified with regard to their target audiences, text types, communicative events and linguistic functions, and accompanied by detailed analytic rating scales and descriptions of expected outcomes (Harsch & Martin, 2012), they have the potential to serve as diagnostic tasks. As illustrated in Chapter 4, selected tasks were in fact used in a nationwide low-stakes diagnostic assessment (VERA8, Leucht et al., 2009), where teachers administered them in the classroom and assessed the performances with analytic level-specific checklists which were adapted from the original rating scales, so that teachers could evaluate how far the features in the checklist were met by their students' performances. Unfortunately, no reports are available on how teachers made use of the diagnostic information. A similar approach to task design is reported in Shaw and Weir (2007) for the Cambridge English main suite exams, where level-specific exams claim
to operationalize certain CEFR levels. For the dimension of writing, certain task variables are operationalized in an ascending order of complexity. For example, the provided level of control and guidance decreases with more challenging tasks; the discourse mode becomes more challenging, ranging from descriptive at the lower levels to discursive and argumentative at the higher levels; the demanded length increases for the more challenging tasks; and the intended audience progresses from known persons (e.g., writing to a friend) to unknown communication partners. Moreover, the tasks are designed to elicit specific lexical and structural resources, which are informed by relevant CEFR descriptors. Empirical analyses of task difficulties support the assumed levels of task demands (Shaw & Weir, 2007). We have to concede that using task characteristics based on the CEFR to predict task difficulty (and hence the student ability necessary to successfully respond to a specific task) may have caveats and can at best lead to claims (that certain tasks operationalize certain CEFR levels) that need to be supported by empirical evidence such as formal standard-setting endeavours. Notwithstanding these caveats, there are examples such as the two given above, where writing tasks were successfully developed targeting particular levels of proficiency by operationalizing a specific and carefully selected set of task variables and by taking into consideration findings about the effects of task complexity and task demands from the realm of SLA studies. The anticipated task difficulty and the targeted levels of proficiency were supported by further empirical evidence, such as empirically estimated difficulties and abilities, and formal standard-setting procedures. This is a promising approach for designing direct diagnostic writing tasks that target certain proficiency levels.

Indirect tasks and task complexity

Another perspective worth exploring regards the link between indirect tasks and task complexity (i.e., the cognitive demands a task requires): an early study by Morris-Friehe and Leuenberger (1992) links the aforementioned continuum of task complexity to a continuum of writing difficulties when comparing direct and indirect tasks of writing. That is, they explain writing difficulties, and hence writing abilities, by a continuum of task complexity: indirect, discrete tasks focus on a "finite set of rule applications" (1992, p. 293), while direct tasks require learners to generate language, which poses higher demands on writers as there are no "pre-determined rule boundaries" (ibid.). Their study found that students with writing-related learning disabilities experienced the greatest challenges with writing, writers with non-writing-related learning disabilities were placed in the middle of this continuum, and writers with no disabilities were found to experience the least difficulties. This chimes with the above-reported results from SLA and language testing research that found task difficulty to be a function of writing ability. While
SLA research generally refers to direct tasks, it is intriguing to open the task complexity continuum to also encompass indirect measures. What is also interesting is to open the writing ability continuum to encompass students with learning disabilities (see our discussion in Chapter 3), in order to design tasks of ascending complexity to match students' continuum of writing difficulties. We will explore this continuum below, when we discuss the implications of developmental perspectives of writing (outlined in Chapter 2) on task design.

Implications of task demands and complexity for designing diagnostic tasks

Findings on the complexity of pedagogical tasks, models, and taxonomies such as the Triadic Task Taxonomy (Robinson, 2007), together with the task variables from the language testing field delineated above can be helpful for designing test specifications and diagnostic tasks targeting specific expectations, demands, or proficiency levels. Such models and taxonomies can also facilitate considerations of what further contextual and learner variables to control for in the design of diagnostic assessments. Moreover, frameworks such as the CEFR (Council of Europe, 2001) or educational standards (e.g., the Common Core Standards in the USA www.corest​anda​rds.org/​, the Finnish National Core Curriculum www.oph.fi/​engl​ish/​curri​cula​_​and​_​qua​l ifi​cati​ons, or the German Educational Standards www.kmk.org/​the​men/​qual​itae​t ssi​cher​ung-​ in-​schu​len/​bildun​g sst​anda​rds.html) can be used as reference, with the potential to inform task demands, required functions and operations, and/​or expected communicative writing activities at a certain level of proficiency for certain learner groups. The most relevant features and variables for a given diagnostic context can be selected based on relevant models of writing and conceptualizations of task complexity. They can then be operationalized in tasks which by design are geared towards eliciting the writing dimensions and phenomena which are of most interest for the diagnostic assessment. Piloting the tasks will yield an estimation of their empirical difficulties, so that tasks can be sequenced to facilitate diagnostic assessment, following pedagogical rationales (e.g., Robinson & Gilabert, 2007). This will also be helpful for dynamic assessment procedures, where tasks focusing on specific phenomena are presented in an ascending order of difficulty or challenge. Thus, research on task variables, complexity, and other features which affect task difficulty and dimensionality can inform diagnostic task design. A useful way of conceptualizing the multidimensional concept of task complexity on a continuum ranging from very specific, narrowly defined, often indirect measures with a low level of complexity regarding cognitive task demands to highly complex, usually direct tasks with a high level of cognitive demands is suggested by Morris-​Friehe and Leuenberger (1992).
This continuum of complexity can help design more precise tasks mirroring the continuum of writing abilities and writing difficulties, including writers with learning disabilities. What is implied in the above considerations of ascending task demands and complexity is the intricately interwoven nature of ascending complexity and skills development. If task difficulty and task demands are a function of learner ability, then they may also be conceptualized as a function of learner development, supposing that ability develops over time and that this development can be traced. SLA research not only has the potential to inform our understanding of increasing levels of task demands and complexity; it can also shed light on the acquisition and development of writing abilities, which in turn can be a further helpful source to inform task design. These aspects are now explored in more depth.

Task design to capture the development of writing

We now explore what insights from the developmental perspective on writing (reported in Chapter 2) can be used for diagnostic task design. As discussed in Chapter 2, the development of writing ability is highly individualized. Different individuals will acquire different components at different points in their development as writers, even if they should receive the same kind of formal instruction. Development is influenced not only by instruction and teacher characteristics, such as the quality and amount of explanations, input, exercises, interventions, assistance, scaffolding, or feedback, but also by personal factors, such as age, educational background, cognitive maturity, or motivation, besides contextual and affective variables. Hence writing development takes different pathways with different learners: writing development in a child’s first language differs from writing development in (young) adult SFL learners in instructional settings, and yet different developmental tracks are to be expected with adult learners who did not acquire writing skills in their first language. Diagnosing writing development in SFL in (young) children, adolescents, or adults thus has to take into account their differing pathways with regards to general education and cognitive development, as well as their differing sociocultural backgrounds, living conditions and knowledge of the world. Furthermore, the development of writing ability is situated and contextual, meaning that instruction and discourse communities convey the values of what constitutes ‘good writing’ and thus have an influence on the developmental pathways. This has implications for the design of diagnostic tasks, since they have to take not only developmental but also instructional aspects into account. Writing development is multidimensional, taking place along a multitude of subcomponents such as vocabulary breadth and depth, sentence complexity
or textual cohesion and argumentation, to name but a few (see Figure 2.4 in Chapter 2). When it comes to capturing the development of the different subcomponents of writing ability, not much is known about how these subcomponents develop or how their development could be captured diagnostically (Alderson, 2005, p. 156). Certainly, more research is needed to explore how specific subcomponents develop in certain contexts and learner groups. Can the descriptions of ascending proficiency levels of the CEFR help shed light on the development of writing and its subcomponents, and how can the development be captured? As many researchers and users of the CEFR have stated (e.g., Alderson, 2007a; Harsch, 2007), the levels of the CEFR provide a snapshot of development in time, yet they are not based on a theory of development. Moreover, the CEFR takes a performance-based stance and hence does not offer insight into characteristics relevant for diagnosing development over time. What the CEFR does provide, however, is a good "basis for test development at the level of interest" (Alderson, 2007a, p. 23), as illustrated by DIALANG described in Chapter 4, for example, and by the research conducted by the SLATE group on the linguistic basis of the CEFR (see Chapter 2 for a discussion). While we acknowledge the wealth and range of different factors which affect the development of writing ability and its subcomponents, here we focus on SFL learners and on factors which can be captured by writing tasks, in order to explore which developmental aspects are most informative for task design. We first look at the implications of selected development models discussed in Chapter 2 for the design of diagnostic writing tasks. We then turn to insights from SLA research on acquisition stages, before we discuss implications for diagnostic task design.

Models of writing development in SFL

In Chapters 2 and 3, we discussed a range of cognitive and socially-oriented theories of writing development. We now take up those models that are particularly relevant for task design. Since writing development takes place predominantly in educational settings or within discourse communities, it will be influenced by instructional progressions. Take the development of genre knowledge as an example: certain genres, such as writing a story, are taught rather early, while more complex, analytic genres, for example, an argumentative essay, are taught at a later stage. Hence, diagnostic task design should, wherever possible, take instructional progressions and curricula into account, as sequencing of instruction will influence development. There are, however, insights into developing writers transferring certain genre-related aspects taught early on to more complex genres usually taught later on, as Llosa et al. (2011) report for skills such as
using source texts, adhering to language conventions, structuring, providing supporting evidence, or using a thesis statement (2011, p. 270). Therefore, transfer theories may offer interesting insights into the design of transfer tasks targeting the generalization of learning from a known to a new context. Such tasks are typically used in dynamic assessment, where a series of tasks ascending in their cognitive demands and complexity is followed by a so-called transcendence task, which requires learners to generalize the issue at hand to an unknown context. The ordering of tasks along a continuum of cognitive complexity that mirrors a learner's development reflects the insights from SLA research on task complexity which we have outlined above. One aspect that has been widely used in SLA to indicate development is growing syntactic maturity. While conceptualizing writing development as a process of syntactic maturation may be far too simplistic (as it reduces the highly complex concept of writing proficiency to just one aspect), it may be useful in so far as it can inform task response demands in terms of required sentence structures, which in turn are required for certain communicative functions such as developing a complex argument, or weighing pros and cons. While SLA research indicates a growth from simple to complex, the development of syntactic complexity is not linear, since highly complex sentences may not be a signal of maturity. Rather, complexity has to be assessed and interpreted in relation to the discourse, genre, and social context it is situated in (Haswell, 2000). Hence, socially-oriented theories of writing development, such as genre models of writing development (e.g., Schleppegrell, 2004), models of expertise development (e.g., Beaufort, 1999, 2007), or models of domain learning (Alexander, 2003), can offer insights into relevant contextual factors for diagnosing development. These factors and their developmental order can inform diagnostic tasks tracking development or targeting (a range of) different developmental stages.

SLA research from a developmental perspective

Writing ability in a SFL develops in different ways in children and adults, due to factors such as cognitive maturity and literacy levels in one’s first language. Interestingly, however, there are indications that children (L1) and adults (FL) show certain parallels in their language acquisition, as stated by Robinson and Gilabert (2007), who report similarities in the development of linguistic expression from simple to complex concepts, from concrete to abstract topics, and from the ‘here & now’ to the ‘there & then’. They come to the conclusion that either FL learners in pedagogical settings and first language learners are exposed to similar language use, or beginners in a FL are not yet able to access and express complex ideas in the FL. Hence, it may make sense to order tasks in that sequence in which observed language acquisition takes place. Not only should this enhance language learning,
but it could also inform diagnostic assessment, since tasks can be designed so that they better match learners' developmental stages. This in turn leads us back to the above-described insights on task complexity and task demands: there is at least some indication that this continuum of ascending task complexity mirrors the learners' developmental continuum to some extent. Having said this, we are aware of the interaction between instructional progression, the use of pedagogical tasks (which are offered in a certain sequence of assumed ascending demands) and learner development.

Approaches to capturing development

Development can be captured as a process over a period of time; here, process-oriented assessment approaches such as a portfolio (contextualized and situated in a specific instructional context) or reflective self-assessment over time can be suitable (see Chapter 6 for diagnosing processes). Development can also be tracked by using snapshots of the abilities in question at different points over a period of time. To elicit these snapshots, tasks of ascending complexity and demands can be employed, which mirror the developmental trajectories identified as relevant for the development of writing in a given diagnostic context. Here, SLA-based models of task complexity, models of writing development and the CEFR can be informative. While the proficiency levels of the CEFR do not describe development, they do provide snapshots of performances at one moment of time; several such snapshots over time can provide insights into development. Hence, as detailed earlier, the CEFR can serve as a starting point to develop tasks targeting ascending levels of development. The two basic approaches to capturing development, namely the process approach and the product-oriented snapshot approach, can also be combined, as is done for example in dynamic assessment, where one-moment snapshots (tasks) are used with a longitudinal view to diagnose weaknesses and work on the development of these weaknesses via mediation. During mediation, the focus is shifted to cognitive processing; this is usually followed by another cycle of more challenging tasks and situations, more mediation if needed, and it usually ends with a transcendence task to stimulate and diagnose transfer and generalization of knowledge and skills (see Chapter 2).

Implications of developmental perspectives for diagnostic task design

Tasks used to diagnose development could be ordered in a sequence which mirrors developmental stages, proficiency levels, or instructional progression. The above-​outlined continuum of task complexity, ranging from indirect items focusing on discrete features to ever more complex tasks which increase in their cognitive demands and the number of components they integrate, is a

FIGURE 5.1 Development and Diagnostic Tasks.

helpful guide to designing a sequence of tasks targeting a selected range within this continuum. Sequencing, particularly with regard to the level of detail provided in the instructions, can also be informed by scaffolding principles (e.g., Gibbons, 2002), as well as by considerations of transfer (see Chapter 2 on Dynamic Assessment, where the Zone of Proximal Development is also diagnosed with the help of transfer tasks; see James, 2008, 2014, for overviews of transfer in EAP contexts). Figure 5.1 captures our suggestion to use different diagnostic tasks at different points of development (developmental aspects are reflected in the cone; see Chapter 2 for Bereiter's 1980 model and developmental stages), situated on a continuum ranging from narrow-focused and simple to ever more complex and cognitively challenging tasks, from knowledge-oriented, discrete formats to ever more abilities-oriented direct tasks, and from known to unknown transfer tasks. At the beginning of SFL learning, discrete tasks which focus on those building blocks most relevant for that developmental stage may be most helpful. This is owed to the fact that the different building blocks underlying writing ability do not develop simultaneously and in parallel, but rather build on one another. Alderson (2007a, p. 27) points out that vocabulary knowledge may be the best diagnostic indicator at the initial stages of learning, while grammatical knowledge may become a better indicator at later stages. Yet in high-level disciplinary writing, when the grammatical threshold has been surpassed, discipline-specific vocabulary plays a strong role again, which can
be taken into consideration when diagnosing this kind of writing. At this point it is interesting to note the relation between learning and transfer: as James (2014, p. 3, quoting Gick & Holyoak, 1987, p. 10) argues, learning and transfer can be viewed as a continuum, where "the consequences of prior learning can be measured for a continuum of subsequent tasks that range from those that are mere repetitions (self-transfer), to those that are highly similar (near transfer), to those that are very different (far transfer)". The more the underlying building blocks become automated, and the more components become integrated in one's emerging writing ability, the more complex the diagnostic tasks need to become. Simultaneously, indirect tasks focusing on awareness and knowledge increasingly need to be complemented with direct tasks focusing on the application of this awareness and knowledge. The products elicited by direct tasks can be diagnosed by diagnostic rating scales (see Chapter 7) and by objective measures focusing on countable features and semantic analysis (see Chapter 8). While the latter often require a minimum text length to produce reliable indices, this can be achieved by using several texts produced by the same learner. At the upper end of the development and proficiency spectrum, we find 'expert' texts, the diagnosis of which requires an increasing focus on analytic rating scales and expert judgment (see also Alderson et al., 2015).

Level-specific and multi-level approaches to task design

Given the considerations above on the continua of task complexity and task demands, and the considerations on development and proficiency levels in writing, we now explore different approaches to task design, focusing in particular on tasks targeting a specific proficiency level as opposed to tasks spanning several levels. We discuss two ends of a possible continuum of approaches by exploring two examples from one large-​ scale assessment context –​9th graders in the secondary school system in Germany –​yet with different purposes, in order to illustrate the potentials and limitations of the two approaches. The first example illustrates the multi-​ level approach. It is taken from the DESI study, Germany’s first ever large-​scale assessment of 9th graders’ proficiency in English as foreign language and German as first language; for details on the writing tests see Harsch et al. (2007) and Harsch et al. (2008; DESI stands for Deutsch-​ Englisch Schülerleistungen International, i.e., German-​English student abilities international). Writing tasks were designed to reflect the curriculum of 9th graders across all of Germany’s three school tracks (lower, middle, higher), with no prior empirical information available about students’ abilities, yet an assumption that students’ abilities in English spanned a wide range across the three tracks. Hence it was decided to use
writing tasks which are as open as possible in their demands to allow students with a wide range of writing abilities to show what they can do in a foreign language (English). The performances elicited by the open tasks were assessed with an analytic rating scale spanning multiple performance bands. The analytic assessment criteria were informed by relevant writing models and the curricula, while the band descriptors defining each criterion were informed by an analysis of student performances and by relevant CEFR descriptors. Results were reported on proficiency levels which were derived via multifaceted Rasch scaling, that is, a probabilistic statistical way of estimating task difficulties, learner abilities, and rater severity on the same scale, thus depicting a direct relation between tasks, learners, and raters. The open task approach seems appropriate in this context where no empirically informed assumptions exist towards what student subpopulations can do, and where no specific standards exist for these subpopulations. In such an open approach, tasks are characterized by brief and simple prompts on familiar topics that can be addressed by learners at different ability levels and that allow for a range of answers, for example, a picture prompt with the task to tell a story that relates to these pictures, where the answers could range from brief postcard-​like responses to longer descriptions of events, feelings and relations. One has to concede that it is not feasible to ask for specific genres in such an open approach. With the release of educational standards for the different school tracks in Germany (KMK, 2004, 2006), and informed by the DESI study, the succeeding evaluation of the educational standards in Germany for the three existing school tracks (lower, middle, higher track; see Harsch & Rupp, 2011; Harsch et al., 2010; Rupp et al., 2008 for details on the tests), could take a different approach to assessing writing. A level-​specific approach was considered feasible as the educational standards describe specific learning outcomes for the different school tracks (KMK, 2004, 2006). The writing tasks were designed to operationalize these specific educational standards, which in turn were derived from the CEFR. Hence, specific features relevant for each targeted CEFR level were defined in level-​specific test specifications, taking into account relevant CEFR descriptors, writing models, and research into task demands. These features were operationalized into tasks targeting a specific level; the elicited performances were assessed with level-​specific rating scales (for more details, see Harsch & Martin, 2012, and Chapter 7). The rating data were subjected to multifaceted Rasch scaling, and a formal standard-​setting study confirmed that tasks aligned to their targeted CEFR levels. The overall aim of the large-​scale assessment was to generalize writing abilities of different student subpopulations for which different educational standards existed. The level-​specific approach is appropriate in this context where students from a specific school track work on specific test booklets so that specific writing task demands can a priori be matched to students’ anticipated ability level.
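The multifaceted Rasch scaling mentioned above can be sketched more formally. As an illustration only (we do not reproduce the exact model specification used in DESI), a common formulation of the many-facet Rasch model expresses the log-odds of a response being rated in category k rather than k−1 as

    \ln \bigl( P_{nijk} / P_{nij(k-1)} \bigr) = B_n - D_i - C_j - F_k

where B_n is the ability of learner n, D_i the difficulty of task i, C_j the severity of rater j, and F_k the threshold of rating category k. Because all parameters are estimated on the same logit scale, learner abilities, task difficulties, and rater severities can be compared directly, which is what allows results to be reported in direct relation to tasks and raters.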

With regard to the level of specificity or openness of tasks, open tasks, that is, tasks that students at different levels of writing ability can successfully complete and which are assessed by rating scales spanning several levels of ability, lend themselves to diagnosis when students' development or proficiency level is not known, for example, for an initial diagnosis of a new learner group. If more details about the learners' ability levels are known, or if the achievement of specific learning goals or educational standards is to be diagnosed, level-specific approaches, whereby tasks operationalize specific requirements of a targeted proficiency level and are assessed with rating scales that target performance expectations characteristic of a specific performance level, seem to be able to yield more precise diagnostic information.

Diagnostic tasks used in large-scale vs. classroom assessment contexts

Diagnostic assessment purposes and conditions vary, with a broad distinction generally made between large-scale and instructional classroom-based assessment. There is a third realm, namely diagnostic assessment and diagnosis for research purposes, which could be situated in either large-scale or classroom contexts. We now explore what constraints these two contexts pose for designing tasks for diagnosing writing skills.

Diagnostic writing tasks in large-scale assessment

We will now explore critical issues arising from employing diagnostic writing tasks in large-scale assessment. Examples of large-scale diagnostic writing assessment are the aforementioned DIALANG test, the post-enrolment university assessment DELNA, and the VERA8 tasks in Germany; these tests are presented in Chapter 4. The diagnosis of writing in LSA in general is characterized by a focus on text products rather than the underlying writing processes (Fulcher & Davidson, 2009). Hence no diagnostic insights on writing processes or strategies can be gained. The products are usually elicited by a limited number of standardized tasks due to a concern for comparability of tasks and responses. Responses are usually assessed against a rating scale by raters who have undergone training and standardization, and tasks and rating scales are generally piloted and analyzed statistically for their properties. These steps aim to ensure that the variation in the responses is due to test takers' proficiency rather than to a variation in task or rating scale dimensions or in the raters. While standardization contributes to the quality of the assessment, it has some limitations: the task standardization and the limited number of tasks often result in limited construct coverage.

Another critical issue in LSA is that task development often relies on theory-based approaches, since large-scale studies usually do not cover one specific educational context with a specific curriculum. A related issue is that test takers are unknown to the assessors. Hence the question arises whether test takers know about the expected norms of 'good' writing they are assessed against, since assessors and assessees do not form one discourse community. Particularly if the assessment is to yield diagnostic information, as is the case with post-university entrance tests such as DELNA (see Chapter 4), the writing norms and expectations, and hence the task demands, need to be carefully tailored to yield relevant diagnostic information in relation to what can realistically be expected from the test takers at the beginning of their university career. Particularly because test takers are not known to the assessors, it is important to control for background variables such as age, L1, gender, or educational background, as well as relevant educational contextual variables (e.g., Alderson, 2005; Heck & Crislip, 2001). Thus, the tasks have to be analyzed for their potential to reduce bias and for their sensitivity to educationally relevant variables (Weigle, 2002).

Diagnostic writing tasks in the classroom

Classroom contexts, in contrast to large-​scale assessments, are characterized by frequent assessments employing a range of tasks and topics (Weigle, 2002), often with the purpose of giving diagnostic feedback that focuses on selected aspects relevant for a particular learner or a particular teaching phase rather than on a set of standardized rating criteria that may not be known to students (as may be the case in large-​scale assessment). The teacher can observe development over time, teacher and students know each other and they share expectations, assessment criteria, and values regarding what constitutes “good” writing. This context makes it easier for teachers to design tasks fitting their classroom, the curriculum, their students’ cognitive development, sociocultural background, knowledge and level of proficiency. The main concerns in the classroom lie in fostering learning, achieving the teaching aims, and preparing for the real world. Here, diagnostic feedback is most useful if assessment tasks are aligned to teaching goals, activities in the classroom, and real-​world writing activities. The challenge in the classroom may be to incorporate relevant research findings and relevant writing (process) models into the design of writing tasks. Teachers also need to be aware of the task variables and complexity effects in order to take them into consideration when designing diagnostic tasks. Teachers’ knowledge about their students, their background and prior learning, the curriculum and what they have taught in the (writing) class forms the background for diagnosis. This can take many different avenues, such as indirect tests, direct writing tasks, observations, conversations (group
and individual), learner diaries, portfolios, and many more. As discussed in Chapter 4, diagnostic assessment in the classroom should also encompass students' perspectives, as well as processes, for example, via think-aloud or verbal protocol approaches, as Alderson (2007a) and Llosa (2011) recommend. Particularly the technique of asking students to think aloud and talk the teacher through their reasoning chimes with Dynamic Assessment mediation approaches as outlined in Chapter 2. It can stimulate learners to reflect on and articulate their thinking processes, and it offers teachers insights for diagnosing the level of understanding of a concept, or the origin of problems; yet the issue of practicability remains to be solved. For ease of reference, we repeat here the five important principles of diagnostic assessment for the classroom by Alderson et al. (2014, p. 255f) that we have outlined in Chapter 4:

1. The teacher, if s/he is the one who diagnoses and uses the results, is responsible for the diagnostic cycle: s/he first needs to listen and observe to establish the diagnostic focus, then to test, interpret the problem and feed back, and finally to find a remedy for the diagnosed problem.
2. Diagnostic instruments should be user-friendly, targeted, and efficient to be used and interpreted by the trained teacher or user; they should provide rich feedback and be designed for a particular purpose, with a clear focus and capacity.
3. The diagnostic assessment process should include the views of various stakeholders, including the students, via self-assessment, for example.
4. The diagnostic process should ideally encompass four stages: listen/observe, initial assessment, use of tools/tests/experts, decision-making.
5. Diagnostic assessment should relate to a treatment or an intervention which offers a remedy for the diagnosed problem.

These principles fit with our understanding of the diagnostic cycle, and offer a systematic approach for diagnosis in the classroom that starts with analyzing the needs and establishing the goals of the diagnosis, and ends with a treatment that offers a remedy for the diagnosed problem and stimulates learner development if tailored in such a way that learners can take it on board. Teachers need specific diagnostic competences, which are part of teachers' professional competences (see, e.g., Edelenbos & Kubanek-German, 2004), yet often these are not part of teacher education. For example, Llosa (2011) reports that the teachers in her study seemed not to have a clear idea about what information they would want from an assessment in order to diagnose students' writing skills, nor did they seem to know how they could make use of such information. Here, there is a call for teacher education to develop the necessary diagnostic competences amongst teachers.

Comparing computer-​based vs. paper-​pencil delivery modes

With the increasing use of computers in educational contexts, a number of questions arise: Does it make a difference whether diagnostic writing tasks are delivered on paper or on a computer? Do the two modes have an effect on the cognitive processes involved, the language produced, the raters' behaviour and ultimately on the assessment scores and results? Do students' familiarity with the medium and their computer skills have an effect on their writing abilities? Typing has more or less become 'normality' in many parts of the world (e.g., Endres, 2012; Mangen & Velay, 2012). The effects of the foreseeable shift towards an increasing use of mobiles, smart phones, and apps on the diagnosis of writing are yet to be examined. Here, we will discuss implications from existing research. We focus in particular on the two main issues reported in the literature: the impact of the mode of delivery on writing processes, products and scores on the one hand, and students' computer literacy on the other (Barkaoui, 2013; Wolfe & Manalo, 2005).

Cognitive aspects

From a cognitive processing perspective, low computer abilities or writing in an unfamiliar mode may affect higher-order processing. If computer-typing skills, for example, are not (yet) automated, or if students are not used to a specific keyboard layout, students need to pay additional attention to typing, which will strain their attention resources (Torrance & Galbraith, 2006). This in turn could influence text quality and scores (Alves et al., 2007; Connelly et al., 2007; Fayol, 1999; Horkay et al., 2006; Wolfe & Manalo, 2005). Moreover, if students are not familiar with a specific mode of writing (be it paper-based or computer-delivered), effects on execution speed have been reported (Bourdin & Fayol, 1994; Olive & Kellogg, 2002). Such effects are not attributed to students' writing skills but rather to familiarity issues or computer skills, which can pose a serious threat to validity and to score interpretation (Bennett, 2003; Burke & Cizek, 2006; Chapelle & Douglas, 2006).

Comparability studies

There are many studies comparing the two delivery modes, often in a design where students choose the mode of delivery. Some studies control for background variables such as students’ L1 or their L2 proficiency. Other studies also look at the perceptions of students and raters, as this may have an effect on students’ choice and rater behaviour. Computer-​delivered mode is reported to produce lengthier texts when no word limit is given (Horkay et al., 2006; Li, 2006; Russell & Haney, 1997;
Russell & Plati, 2002; Wolfe et al., 1996). Some studies indicate that the complexity of writing seems to be comparable across modes (Paek, 2005; Russell, 1999; Russell & Haney, 1997), while others suggest differential effects, such as fewer paragraphs and different punctuation in computer mode, depending on computer experience or familiarity with text processing tools (Chambers, 2008; Wolfe et al., 1996). In a meta-analysis of 26 studies (conducted 1992–2002), Goldberg et al. (2003) found computer-based writing to produce longer and qualitatively better texts. When it comes to writing processes, different studies report different findings, yet caution is called for due to the often small sample sizes. Lee (2002) and Li (2006), for example, found that students revised more often and differently when taking a writing test on the computer. King et al. (2008), however, did not find such differences. The meta-analysis by Goldberg et al. (2003) reported mixed results for the six studies which examined revision behaviour.

Score comparisons

A large number of studies comparing computer with paper administration concentrate on score comparability, particularly in contexts where tests are involved which measure the written products or where an examination body investigates the comparability of the two modes. Studies often do not control for choice of mode, computer skills and other relevant variables, which may in part explain the at times contradictory findings. Some studies report higher scores for the computer mode, such as Lee, H.K. (2004) or Li (2006), who both conclude that it may be easier to manipulate the organization of an essay on a computer. Yet other studies report higher scores for handwritten essays, such as Green and Maycock (2004), or Breland et al. (2004), who report little difference in holistic scores, but find small effects of handwritten essays outperforming typed TOEFL essays when taking the students' English proficiency level into account. Wolfe and Manalo (2005) found differential effects on holistically scored TOEFL essays: students who score lower on multiple-choice items gain higher scores when handwriting, while students with higher multiple-choice scores show no difference in their writing scores across the two modes. Yet other studies report no score differences, such as Blackhurst (2005), Harrington et al. (2000), King et al. (2008), Mogey et al. (2010), or Weir et al. (2007).

Rater effects

Studies examining the effects of the two presentation modes on raters also yielded different results. In early studies where raters gave handwritten essays higher scores, some explanations covered surface features such as typed texts looking
shorter (MacCann et al., 2002), typed essays containing fewer paragraphs, or errors being easier to spot in typed text (Chambers, 2008; Shaw, 2003), or raters being more lenient and sympathetic to handwritten texts (Russell & Tao, 2004; Shaw, 2005; Whithaus et al., 2008). Importantly, it seems that when raters are familiar with the different presentation modes, familiar with computer writing and online scoring, and have received sufficient training, rater bias towards one mode or the other can be eliminated (King et al., 2008).

Controlling computer / keyboarding / word processing skills

In order to disentangle the potential conflation of writing skills and computer skills, the latter have to be controlled for (Douglas & Hegelheimer, 2007). Studies taking computer skills into account, however, still yielded discrepancies in their findings: some found that students with higher computer skills or more familiarity with computers score higher on computer-delivered writing tests (e.g., Horkay et al., 2006; Russell & Haney, 1997; Wolfe & Manalo, 2005), others found no effect of computer familiarity across delivery modes (King et al., 2008; Maycock & Green, 2005), while yet others found that tasks had more effect on scores than the delivery mode (Barkaoui, 2013; Burke & Cizek, 2006). However, a limitation of some of these studies is that they rely either on self-report data or on very different measures of computer skills. As a remedy, tests measuring editing skills and the speed and accuracy of typing have been suggested (Ballantine et al., 2003; Connelly et al., 2007; Horkay et al., 2006).

Implications of the computer medium for diagnosis

The studies reviewed here suggest some differential effects of computer skills on writing scores, depending on tasks and student background variables, which cannot be ignored, particularly not when it comes to diagnosing writing skills in an online environment. In order to control for potential effects of computer skills, diagnostic computer-​based assessment should take these into account via robust measures and via controlling expected length and time in the task instructions. Furthermore, raters need to be familiar with the mode of delivery and need to receive sufficient training (see also Chapter 7). One area which only recently has been taken on board is the analysis of keystroke logging, which offers a unique window into the writing, editing and revision processes in real time. While this approach may perhaps be less feasible for the classroom, keystroke logging has been used in dyslexia studies, revealing how dyslexic writers struggle with spelling and with the writing process in general (e.g., Wengelin, 2002). These studies also yielded important information for remediation. This line of research is highly promising for gaining deeper insights into real-​t ime processing and for facilitating the development of process
models for computer-based writing. This line of research could also facilitate the development of better techniques and indicators for diagnosing writing processes. A further interesting aspect is the impact of using word processing tools with revision facilities, such as spelling and grammar checks or thesaurus assistance, on the writing process and products (Oh, 2018). Oh examined the use of linguistic tools in SFL academic writing assessment. She found that tools that are relevant to the academic domain increased authenticity and validity, and led to construct-relevant assessment results. She also came to the conclusion that the academic writing construct needs to be extended to include the strategic use of such tools. More research, also with regard to the impact of artificial intelligence on text production, is needed to better understand the potential effects of these tools, and how they could be incorporated into the diagnosis. One aspect which has hardly been examined so far arises from linguistic issues with online speech and conventions when internet-based formats are used in computer-delivered environments. While the literature reports differences in punctuation, spelling and paragraphing across paper-based and computer-delivered writing assessment (e.g., Chambers, 2008; Wolfe et al., 1996), little is known about relevant features of internet or email discourse. There is the possibility that tasks modeled on an online environment, such as emails, blogs, or chats, will have an effect on students' writing, particularly when it comes to assessing online speech (e.g., Chambers, 2008; Ferris, 2002). As yet, however, we know very little about salient discourse features or the rules of punctuation in online language environments (e.g., Crystal, 2006), and even less about the effect online speech could have on raters. More research is needed here.

Main implications for diagnostic task design

We now summarize the main implications of this chapter on how to design writing tasks for diagnosis. When designing diagnostic writing tasks, one needs to consider language- and discourse-related aspects, cognitive aspects of task processing and text creation, the social dimension of writing, as well as personal characteristics of the writers. The framework of writing task characteristics, which is based on Bachman and Palmer's (2010) framework, can guide the task development, thereby ensuring that all relevant aspects receive the necessary attention. It encompasses the aspects of setting, rubric, input, expected output and the relationship between the latter two. Each of these aspects delineates a set of characteristics that can inform task development in order to operationalize certain characteristics, and that help specify writing tasks.
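To make this more concrete, the sketch below shows one possible way of recording these five aspects as a simple data structure that could underpin a task specification. It is a minimal illustration only: the field names and example values are our own hypothetical choices and are not taken from Bachman and Palmer (2010) or from any of the instruments discussed in this book.

    from dataclasses import dataclass

    @dataclass
    class WritingTaskSpec:
        # The five aspects of the task characteristics framework discussed above;
        # the concrete keys used inside each dictionary are illustrative only.
        setting: dict            # e.g., physical circumstances, participants, delivery mode
        rubric: dict             # e.g., instructions, time allotted, scoring method
        input: dict              # e.g., prompt, topic familiarity, length of input
        expected_output: dict    # e.g., genre, audience, expected number of words
        relationship: dict       # e.g., scope and directness of the input-output relation

    # A hypothetical A2-level task, loosely echoing the characteristics reported
    # for the German educational standards study discussed earlier in this chapter.
    a2_postcard_task = WritingTaskSpec(
        setting={"context": "classroom", "delivery": "paper"},
        rubric={"time_minutes": 15, "scoring": "level-specific checklist"},
        input={"prompt": "picture of a holiday scene", "topic": "concrete, familiar"},
        expected_output={"genre": "postcard", "audience": "friend", "words": 40},
        relationship={"scope": "narrow", "directness": "direct"},
    )

Writing a specification down in such a form makes it easier to check that every aspect has been considered and to vary individual characteristics systematically across targeted levels.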

Task specification with regard to relevant characteristics makes it possible to describe transparently a task's complexity with regard to the amount of information that has to be processed, the ratio of information given in the input vs. information that the writer has to provide, the familiarity with and complexity of the cognitive operations that have to be performed for successful task completion, the number of elements and perspectives a writer has to address, the distance to the context in which the writing task is situated, or the complexity and level of the language that has to be processed. It is assumed that the higher a task's complexity, the higher the task demands, and the more difficult it will be to successfully respond to a task. Task difficulty is conceptualized as a function of writing ability, that is, the higher a writer's ability, the more likely it is that the writer will successfully accomplish a task. With regard to diagnosing writing ability at different levels of proficiency or the development of writing abilities, it may be helpful to consider the multidimensional concept of task complexity on a continuum from simple to increasingly complex tasks. The task format can also be integrated into this continuum. This allows task demands to be matched to student abilities, and tasks to be sequenced meaningfully in longitudinal diagnosis approaches. In order to capture longitudinal development, one can use process-oriented approaches (e.g., a portfolio), a series of momentary product-oriented snapshots over time, or a combination of these two.
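As a final illustration of how empirical difficulty estimates from piloting might feed into such sequencing, the following sketch orders a set of piloted tasks by their estimated difficulty and selects a next task for a learner. The function names, the figures, and the selection rule (picking a task slightly above the learner's current ability estimate) are our own assumptions for illustration, not an established diagnostic procedure.

    # Hypothetical difficulty estimates (in logits) obtained from piloting.
    pilot_estimates = {
        "picture_story": -1.2,
        "email_request": -0.3,
        "opinion_text": 0.6,
        "argumentative_essay": 1.4,
    }

    def sequence_tasks(difficulties: dict) -> list:
        """Order task identifiers from least to most difficult."""
        return sorted(difficulties, key=difficulties.get)

    def next_task(difficulties: dict, ability: float, challenge: float = 0.5) -> str:
        """Pick the task whose estimated difficulty is closest to ability + challenge."""
        target = ability + challenge
        return min(difficulties, key=lambda task: abs(difficulties[task] - target))

    print(sequence_tasks(pilot_estimates))            # easiest to hardest
    print(next_task(pilot_estimates, ability=0.2))    # -> 'opinion_text'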

6 DIAGNOSING THE WRITING PROCESS

Introduction

In this chapter, we focus on the two general understandings of the process of writing, namely the writing processes involved in the production of one text, typically in one writing session, and process writing, which concerns the production of several versions of a text. We build upon the cognitive theories of writing in Chapter 3. The writing process is somewhat difficult to relate to the diagnostic cycle, the overarching framework underlying the book. Understanding the writing process is relevant to the planning stage in the diagnostic cycle, as that stage involves construct definition. The assessment stage is also relevant since assessment can focus on the process rather than the product of writing. However, the feedback stage can differ depending on which meaning of the process is discussed. There may be no (external) feedback involved in producing a text in one session, whereas, in the process writing approach, feedback is crucial for revising the text.

Let us consider some typical writing tasks. We usually have a reason for writing, which may arise from our personal needs or from the requirements of our work or studies. Usually, we plan our writing in some way: we may simply think about the task or list the main points that we want to cover. While generating text, we may stop to consider what to write next, to read what we have drafted, and to change our text. Some of us can write very fast, particularly if the task is familiar. However, writing can be slow and cumbersome, particularly when we write in a language that we do not know well. Furthermore, writing requires practice. If you do not write much in any language, writing can be burdensome. Moreover, some writing tasks are simply so challenging that they cannot be completed without considerable time, effort, and guidance from others.

The outcome of these actions is a written text. However, in certain contexts, the written text is not the final step. Many texts are read and commented on by others, such as teachers, superiors, or colleagues, and the writer is often required to modify the text. In effect, the cycle of planning, writing, and modifying will start again, and the writer will eventually produce a new version of the text. This description illustrates the two basic meanings of the writing process. The first refers to the production of one full text (or part of it), which includes the act of generating text as well as planning and revision. The second way to interpret the writing process involves the production of several versions of the same text. This is often called process writing, and it is both a natural way many real-life texts are written and a specific approach to teaching writing. The writing process in the second meaning can involve multiple planning, text generation, and revision stages.

Figure 6.1 illustrates the two meanings of the writing process. The left side of the figure depicts the first meaning (writing one text). In fact, there are two ways in which text production can happen: a writer may write the text in one session (the lower path from Task Y to Draft 1 in the figure); this is typical of short texts. A writer can also draft the text incrementally over two or more sessions (the upper path from Task X to Draft 1); this is typical of longer texts. The right side (from Draft 1 to Draft 2) in Figure 6.1 illustrates the key aspects of process writing. Its main components are an evaluation of the text by the writers themselves or by external agents such as teachers or peers, and the incorporation of their feedback in a new version of the text.

FIGURE 6.1 Overview of the Main Elements in a Typical Writing Process.


Some implications for diagnosing writing can be deduced from Figure 6.1. Diagnosing processes requires focusing on one or more of the following aspects:

• Writer's planning strategies;
• Writer's text generation strategies;
• Writer's text revision strategies;
• Writer's responsiveness to feedback on a draft and uptake of feedback in the ensuing draft;
• Changes from Draft 1 to Draft 2.

Diagnosing different strategies requires a means to identify the strategies employed by a writer and to evaluate their effectiveness. When it comes to comparing different drafts (the right side of Figure 6.1), the diagnosis benefits from analyzing the drafts as products (see Chapters 7 and 8), together with whatever information (e.g., notes, observations) is available about the process that occurred between Draft 1 and 2. Here, the focus lies on evaluating the drafts against the task demands, the writer's plan, and other assessment criteria relevant to the teaching context. The strengths and weaknesses of Draft 1 that have been diagnosed are then fed back to the learner, who ideally responds to the feedback by revising Draft 1. Draft 2 can then be diagnosed as a product in light of Draft 1 and the feedback provided. In addition, the writer's responsiveness to feedback (i.e., uptake of feedback) during the revision of Draft 1 is worth diagnosing (see Chapter 9). Here, to link diagnosis to meaningful action, the diagnosis needs to consider the learner's proficiency level and the focus of learning and teaching.

Texts, the products of writing, have been investigated much more comprehensively than the underlying writing processes, which makes addressing the diagnosis of the writing processes challenging. How do the various writing processes differ? How might the writing processes at different levels of SFL proficiency differ? What information about the processes is useful diagnostically? How can we obtain such information? These questions are addressed in this chapter.

Characteristics of the writing task that impact processes

To comprehend the writing process better, we start by analyzing some key features of writing tasks that emerge from Chapter 3 and Figure 6.1. As Table 6.1 illustrates, the writing process differs depending on the nature and purpose of the task and the availability of external feedback. Clearly, typing a text message to a friend on a mobile phone, penning an essay in class, or writing a master’s thesis differ both as tasks and as processes. Therefore, diagnosis should consider these differences.


Table 6.1 presents task (and contextual) characteristics that can influence the writing process. The main organizing principle of the table is the type of writing task (the first column) because research shows that both the text type (e.g., narrative, expository, argumentative) and the topic affect writing processes (Deane et al., 2008; van den Bergh et al., 2016).

Complexity (i.e., the cognitive demands of a task due to the number of elements and the amount of processing required) is a key characteristic of writing tasks. As Chapter 5 discussed task complexity, suffice it to say here that in simple tasks, it is usually enough to consider only a few requirements, such as factual correctness and an appropriately informal style in a message about a meeting to a friend. In contrast, complex tasks require attention to several different requirements for their successful completion. Obviously, writing in one's SFL adds to the complexity of most tasks.

Length (i.e., the number of words and/or pages) is another key characteristic of written texts, as it makes certain activities more likely and may prevent others. The number of sessions the writing process typically requires, and whether external feedback and revision of the text are expected, all relate to text length. Text complexity and length usually go hand in hand, but there are exceptions: summaries and abstracts are short but typically complex and time-consuming to write. Many short texts (e.g., notes and letters) are written relatively quickly during one uninterrupted session. The brevity of the text production usually curtails the time available for planning and revision, and precludes significant feedback from others during the process. In contrast, longer texts cannot be written in one session. Therefore, the process takes longer and forms a series of stages, as depicted by the upper path on the left side of Figure 6.1. Composing longer texts often involves external feedback from supervisors, reviewers, and examiners, but also evaluation by the writers themselves. Revision of the text is usually expected.

It is also important to consider how complete the text is expected to be at the end of a writing session, or before external feedback is solicited. Obviously, relatively short, non-complex texts are usually finalized in a single writing session, as are compositions written in the classroom. However, if the process writing approach is used (see the "Reviewing and revising across drafts" section in this chapter), students first write a text that they are expected to revise after receiving feedback from the teacher or peers. Really long texts such as theses and dissertations are typically commented on and revised section by section.

Further characteristics of writing tasks important for the writing process include (1) the number of writers involved in producing the text, (2) whether writing is integrated with other language skills, and (3) whether writing is part of a multimodal composition. While these are relevant features of many writing tasks, they are outside the scope of this volume as we focus on the contexts in which individual SFL learners write texts without integration into reading, listening, or speaking, and where the texts are analyzed in terms of their linguistic and textual content (however, see our brief discussion of integrated tasks in Chapter 10). We direct the readers to Storch (2013) for an overview of collaborative writing and to the special issue of the Journal of Second Language Writing (volume 47, 2020) for research on multimodal writing. For example, Hafner and Ho (2020) introduce a process-based model for assessing digital multimodal texts, which involves repeated evaluations of learners' work during text production. Other contributions to the special issue focus on, for example, the metalanguage needed in multimodal writing (Shin et al., 2020), discipline-specific multimodal writing (J. Lim & Polio, 2020), learners' language skills (Unsworth & Mills, 2020), and learning content (Grapin & Llosa, 2020).

TABLE 6.1 Characteristics of the Task That Affect the Writing Process

Messages, letters (free time; working life). Task complexity: low. Text length: short. Number of writing sessions: one. External feedback: no (except possibly in educational contexts, by the teacher). Revision: not expected. Nature of the writing product: first version is the complete, final version.

Summaries, abstracts. Task complexity: high. Text length: short. Number of writing sessions: varies. External feedback: possible (teacher; colleague). Revision: possible. Nature of the writing product: first version is complete but not necessarily the final version.

Essays, compositions (type 1; educational contexts). Task complexity: intermediate/high. Text length: relatively short. Number of writing sessions: one. External feedback: yes; teacher (mark, possibly comments). Revision: not expected. Nature of the writing product: first version is the final version.

Essays, compositions (type 2), home essays (educational contexts). Task complexity: intermediate/high. Text length: relatively short. Number of writing sessions: two (or more). External feedback: yes; teacher (comments and mark; possibly peer comments). Revision: at least one revision expected. Nature of the writing product: first version is a complete text but only the second (etc.) version is final.

Routine reports (working life). Task complexity: low/intermediate. Text length: variable. Number of writing sessions: probably several. External feedback: possible; recipients of the report (possible co-authors). Revision: possible. Nature of the writing product: first version is the complete, final version.

Non-routine reports (working life). Task complexity: high. Text length: variable. Number of writing sessions: many. External feedback: likely; recipients of the report (possible co-authors). Revision: possible. Nature of the writing product: can vary (first version may or may not be a complete text).

Academic articles (working life). Task complexity: high. Text length: relatively long. Number of writing sessions: many. External feedback: yes; journal editors; peer reviewers (possible co-authors, colleagues). Revision: one or more revisions expected. Nature of the writing product: first version (submitted to a journal etc.) is already a complete text but usually not the final version.

Theses, dissertations (educational context). Task complexity: high. Text length: long (thousands to tens of thousands of words). Number of writing sessions: many. External feedback: yes; supervisor(s), examiners. Revision: revisions expected; possibly also after examiners' feedback. Nature of the writing product: for supervisor(s), first incomplete (often part by part) until completed; for examiner(s), a complete text (intended to be final).


The following characteristics may be more relevant for the overall design of diagnostic assessment of the writing process than for its detailed implementation. First, who is the diagnoser? For most short tasks, the writers themselves may be the only ones who can assess the process. For longer, more complex tasks, the teacher or a peer/colleague may review the draft, provide feedback, and possibly re-review how the writer has used the feedback. Second, what aspects of the writing process (and product) should be diagnosed? A longer writing process may enable the diagnoser to target different aspects of writing at different times, for example, first focusing on content and organization and later targeting language and style. Furthermore, a longer writing process allows for focusing first on the planning of the text and, once a draft text is ready, on revision. The diagnosis of simpler writing tasks will probably place more emphasis on content and language, while diagnosing complex tasks may also focus on overall text structure or how effectively the targeted readers are addressed.

Methods for diagnosing writing processes

Various methods are available for diagnosing the writing process, ranging from observations and interviews to think-aloud protocols and keystroke-logging. These approaches vary considerably in their practicality and the nature of the information they elicit. They also differ with respect to the stage of the writing process at which they can be meaningfully applied. Some approaches, such as the think-aloud protocol, can be used across planning, text generation, and revision, while others are more stage-specific: keystroke-logging or spell-checkers, for example, are mostly applied to investigate actual text generation.

Not all diagnostic methods are available to everybody and in all contexts. Table 6.2 distinguishes three types of diagnosers, namely learners, teachers, and researchers, and lists the methods that are typically available to them. We wish to emphasize that, in principle, learners themselves are in the best position to diagnose their writing process since only they have direct and constant access to what they do when they write. Therefore, developing SFL learners' awareness of their writing processes so that they can draw meaningful diagnostic inferences about them may be a particularly important aim of the diagnosis of SFL writing ability.

As Table 6.2 indicates, the same diagnostic methods of the writing process, such as checklists, diaries, and portfolios, are often available to both learners and teachers. Some approaches, such as interviews or discussions and think-aloud protocols, are probably more easily used by the teacher or a peer. Researchers and developers of diagnostic instruments may use the same methods as learners and teachers, but they also have access to more sophisticated procedures, such as keystroke-logging, eye-tracking, and even brain-imaging. One very commonly used research approach, the think-aloud protocol, can potentially be used by the teacher to gain insights into learners' writing processes. We now describe the typical diagnostic methods in Table 6.2 in more detail.

Observation. In this procedure, teachers or peers watch what learners do when they plan, write, and revise texts. This is probably the most common approach used in the classroom, and it is also commonly used in research (e.g., Hawe & Dickson, 2014; Llosa et al., 2011). The act of monitoring writing can also include self-observation (see Chapter 3).

Checklist. Checklists are used to ensure that specific points are systematically considered (Seow, 2002). This makes checklists very useful for self-diagnosis; going through the checklist will remind writers about what they should consider when planning, formulating, and revising their texts. Teachers (and peers) can also use checklists to structure diagnostic discussions with writers. Checklists need to be comprehensive, clear, and practical for the particular context (e.g., Chang, 2015; Struthers et al., 2013).

Diary. This method allows writers to record their activities, problems, and thoughts concerning writing (e.g., Klimova, 2015). As writers create the content, the diagnostic usefulness of diaries depends on the relevance of that content. Therefore, discussions between writers and other agents (e.g., teachers) can help the writers gain insights into their writing processes and how these could be improved.

Portfolio. Here, a portfolio refers to a physical or electronic 'container' that writers use to store (different versions of) their texts to document for themselves and their teachers what texts they can write or how they have revised texts (see the upper left part of Figure 6.1). The portfolio is particularly helpful for both learners and teachers in gaining insights into the editing and revision stages of the writing process (e.g., Romova & Andrew, 2011). The portfolio also enables participants to see how feedback has been utilized (e.g., Lam, 2016) and serves as a reference in teacher-learner diagnostic discussions.
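To make the checklist idea described above more concrete, the following Python fragment sketches how a self-diagnosis checklist could be stored and queried digitally; the stages, questions, and function names are our own illustrative inventions, not an instrument from the literature or from this book.

```python
# A minimal, hypothetical revision checklist for self-diagnosis of the
# writing process; stages and questions are illustrative only.
CHECKLIST = {
    "planning": [
        "Did I set a goal for the text and identify its readers?",
        "Did I list or outline the main points before writing?",
    ],
    "text generation": [
        "Did I check that each paragraph addresses one main point?",
    ],
    "revision": [
        "Did I reread the whole text before submitting it?",
        "Did I act on the feedback I received on the previous draft?",
    ],
}

def unanswered_items(responses: dict) -> list:
    """Return the checklist questions the writer has not yet ticked off."""
    return [
        (stage, question)
        for stage, questions in CHECKLIST.items()
        for question in questions
        if not responses.get(question, False)
    ]

# Example: the writer has so far only completed the planning questions.
responses = {question: True for question in CHECKLIST["planning"]}
for stage, question in unanswered_items(responses):
    print(f"[{stage}] {question}")
```

A teacher could maintain a similar structure to guide diagnostic discussions, swapping in questions that suit the particular context and proficiency level.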

TABLE 6.2 Overview of the Main Methods for Diagnosing the Writing Process (Including the Process Writing Approach)

PLANNING (goal-setting, idea generation, organizing). Diagnosis (self): (self-)observation; checklists; diaries; portfolio. Diagnosis (teacher, peer): observation; checklists; interviews/discussions; diaries; portfolio; think-aloud. Diagnosis (researcher): think-aloud; interview; keystroke-logging; eye-tracking; brain-imaging (but also the other methods listed under 'self' and 'teacher').

TEXT GENERATION. Diagnosis (self): (self-)observation; checklists; diaries; text analysis tools. Diagnosis (teacher, peer): observation; checklists; interviews/discussions; diaries; think-aloud. Diagnosis (researcher): think-aloud; interview; keystroke-logging; eye-tracking; brain-imaging (also other methods).

REVISION (reading and evaluating). Diagnosis (self): (self-)observation; text analysis tools. Diagnosis (teacher, peer): observation; interviews/discussions; text analysis tools. Diagnosis (researcher): keystroke-logging; eye-tracking; brain-imaging; think-aloud (also other methods).

REVISION (redrafting/rewriting). Diagnosis (self): (self-)observation; text analysis tools; checklists; diaries; portfolio. Diagnosis (teacher, peer): as above, plus portfolio and text analysis tools. Diagnosis (researcher): as above.


Interview and discussion. The teacher, or a peer or colleague, can use these approaches to find out how the writer produces a text. A diagnostic interview can resemble a research interview in that it needs to have some pre-planned content and structure that aims at eliciting the desired information from the interviewee. The text produced by the writer is available and can be referred to as an aid to memory. Diagnostic discussion can take place in the form of mediation (see Chapters 2, 4, and 9), aiming both to diagnose the learner's performance and to improve it.

Think-aloud and verbal protocol. Thinking aloud while planning, writing, and revising a text is a common way of eliciting information about processes (e.g., Barkaoui, 2010a, 2010c; Chapelle et al., 2015; Mahfoodh, 2017). Think-aloud and verbal protocols can be done in real time, or writers can record their thoughts for a later diagnosis. Their drawback is that they may interfere with the very processes one is interested in. Possible alternatives are retrospective interviews and stimulated recalls, where writers are asked to recall what they did and thought during particular stages or activities. During such interviews, it is helpful to use stimuli (e.g., the texts, screen capture videos, or eye-tracking recordings) to facilitate remembering the processes.

Text analysis tools. Modern technology provides writers with a wide range of tools for text analysis. Word processing tools come with built-in spell and grammar checkers that provide instantaneous feedback and advice, and several more sophisticated feedback tools have been developed for both L1 and SFL writing (see Chapter 8). Feedback from such tools can be addressed either when a text has been completed, or it can push the writer to revise the text as it is being written (see the discussion of Hayes & Flower's model in Chapter 3).

Keystroke-logging. In keystroke-logging, a computer programme captures all input from the keyboard, the mouse, or other input devices (e.g., Leijten & van Waes, 2013; van Waes et al., 2016; Wengelin, 2002; Wengelin et al., 2009, 2019). Keystroke-logging has almost exclusively been used to investigate writing speed and variation in it, the frequency and placement of pauses, writers' attention, and movements. This information is relevant for understanding text generation processes but also sheds light on planning and revision.

Eye-tracking. In eye-tracking, a special camera monitors and measures where one is looking at any given moment and how one's gaze moves from one point to another. Of special interest are fixations, points at which our eyes stop, and saccades, rapid movements of the eyes between fixations. Eye-tracking, too, is a research tool and has often been used together with keystroke-logging to provide a more comprehensive picture of the writing process (Wengelin et al., 2009).

Brain-imaging or neuroimaging. A range of techniques can be used to directly or indirectly image brain structures or how the brain functions during an activity such as writing. Some procedures detect the electrical signals produced by neurons (Electroencephalography or EEG), others measure oscillations in the magnetic fields produced by such signals (Magnetoencephalography or MEG), and still others analyze the magnetic properties of blood to track neural activity (Functional magnetic resonance imaging or fMRI). These tools are used for research purposes only (e.g., Nelson et al., 2009).

It is beyond the scope of this chapter to cover all possible diagnostic approaches; rather, we have outlined the most prominent ones. The different methods discussed above can be combined to gain the best possible diagnostic insights into the different stages of writing.

The writing process

We now turn to a detailed analysis of the main stages of the writing process and introduce two models that are particularly useful for diagnosing the planning (Flower et al., 1989) and revision/review (Hayes et al., 1987) stages. They have been extensively used in writing research, and they underlie some models focusing on SFL writing (i.e., Börner, 1989; Zimmermann, 2000; see Chapter 3).

Planning

Hayes and Flower (1980) and Hayes (1996) divide planning a text into goal-setting, idea generation, and organizing. Writers typically have one or more goals in their minds when they start writing (Hayes & Flower, 1980; Hayes, 1996), ranging from conveying information to creating positive impressions to earning good grades. How they address their goals depends on the writers' understanding of task demands (Flower et al., 1989) and the intended audience(s) (Hayes, 1996).

When planning, writers generate ideas. For this, they turn to their long-term memory and external resources such as the Internet, other texts, or other people. Idea generation depends on writers' topical knowledge, but possibly also on their general language ability (Glynn et al., 1982). During organizing, writers arrange their ideas into a logical structure.

The tripartition of planning into goal-setting, idea generation, and organizing is already useful for diagnosis, as it can help identify the overall stage at which a writer has weaknesses. For a more comprehensive understanding of planning, we turn to the theory of constructive planning by Flower et al. (1989). They identify three kinds of planning strategies: schema-driven, knowledge-driven, and constructive planning. These provide an overall framework for writing as they direct writers' goal-setting, search for knowledge, and activities at the local-strategy and text-production level (Flower et al., 1989, p. 5).

Script- or schema-driven planning is characteristic of familiar writing tasks that have a structure shared in the particular discourse community, such
as fairy tales or standard reports. For these tasks "the writer can call up a richly instantiated schema for the task complete with detailed prototypes …, conversational frames for narrative or argumentative discourse … and a script which suggests appropriate plans and goals …" (Flower et al., 1989, p. 4). Knowledge-driven planning is efficient for tasks on familiar topics for which writers can draw on an extensive memory storage (see also Beaufort's 2012 model). Bereiter and Scardamalia (1987) call this the knowledge-telling strategy (see Chapter 3).

The limitations of the two types of planning are obvious: not all tasks have ready-made schemas. The writer's knowledge may also be inadequate for successful knowledge-driven planning. Furthermore, knowledge-based planning does not suit all contexts, for example, when the writer fails to consider the needs of an audience that is less knowledgeable of the content matter than the writer (Flower et al., 1989).

Constructive planning is needed for tasks that "require adaptive use of knowledge or for tasks which are more complex than available scripts or schemata" (Flower et al., 1989, p. 5). Good examples of these are argumentative tasks that typically involve a complex set of considerations regarding the audience and context, and require reasoning and critical thinking (Deane et al., 2008). Constructive planning resembles Bereiter and Scardamalia's (1987) knowledge-transforming model (Chapter 3), and it should be understood as an overall approach to producing texts. It does not preclude using schema- and knowledge-driven strategies within the overall text, whenever they are appropriate. Since the available schema- and knowledge-based strategies vary across individuals, particularly inexperienced writers may be forced to use constructive planning even with texts for which well-established schemas exist. Instructing writers to utilize such schemas could contribute to successful diagnosis.

Furthermore, Flower et al. (1989, p. 16) describe five critical features in successful constructive planning: (1) building an initial task representation, (2) generating a network of working goals, (3) integrating plans, knowledge and goals, (4) instantiating abstract goals, and (5) resolving conflicts. Analyzing these features provides us with a detailed basis for diagnosing planning and for meaningful feedback (see Table 6.3 and Chapter 9).

(1) Building an initial task representation is a key part of successful writing, according to Flower et al. (1989), who argue that writers facing a complex task have to decide on their own global goals for the text, create a clear representation of the audience, and decide what knowledge is relevant. Bereiter and Scardamalia's (1987) knowledge-transforming model (Figure 3.3 in Chapter 3) depicts how this task representation is formed.

(2) Generating a network of working goals elaborates on the initial global goals. Flower et al. (1989) state that expert writers spend considerable
time on this before starting to write but also during the text generation and review stages. This chimes in with van den Bergh et al. (2016), who found that goal-setting improves text quality if writers do it also in the middle and final stages of writing. This also applies to writing across drafts (see the "Reviewing and revising across drafts" section in this chapter). Diagnosing working goals should focus on establishing whether and when the writer generates these goals; advice could include various examples of such goals, ranging from keywords and sentences to graphs.

(3) Integrating plans, knowledge, and goals can be achieved by several strategies (Flower et al., 1989). Simply reading the task and listing ideas can help writers develop a coherent text structure. A more systematic approach would be to create top-down representations of the text (e.g., hierarchical charts with (sub)goals). Experienced writers can achieve an integrated plan by monitoring how their emerging text compares with their goals and by modifying their plan or text accordingly. Intention setting is another strategy used by expert writers: here, writers can decide to wait for their text to unfold to see if a solution to a problem emerges. The final strategy is consolidation, defined by Flower et al. (1989, p. 29) as "the mental act of pulling selected plans, goals and ideas into attention as a freshly integrated whole". Flower et al. (1989) argue that a combination of intention-setting and consolidation strategies is the most effective way to achieve an integrated plan for the text. The first step in diagnosing this aspect of planning is to establish how writers try to achieve integration, perhaps by using checklists or interviews. Feedback can aim at increasing writers' awareness of the different integration strategies and encourage them to try out different ones.

(4) Instantiating abstract goals is the fourth major phase in planning, according to Flower et al. (1989). At this stage, writers attempt to bridge the gap between their goals and the production of the text. Flower et al. (1989) mention two common strategies. First, writers can use specific words or concepts as pointers to packages of knowledge in their long-term memory. For example, when planning to write a text about theories of communicative competence for university language majors, a writer might write down the names of researchers, such as "Hymes", "Canale", "Swain", "Savignon", and "Bachman", who have created such theories. All these names stand for specific bodies of information in the writer's mind, representing intermediate-level goals between the more abstract goal, increasing language majors' understanding of communicative competence, and the actual text, namely the description of the theories proposed by those researchers. Creating concrete how-to elaborations (Flower et al., 1989) is the second strategy to bridge the gap between abstract goals and the text. The writer working on the above-mentioned review might ask themselves such questions as "Should I start by describing the oldest theory and then move to the more recent views?" This
can be the first step in generating actual text, and it illustrates how planning and text production are often intertwined in practice. Diagnosing the problems writers have when turning their ideas into text should first involve an attempt to establish, for example, with checklists, interviews, or verbal protocols, whether the writer struggles with doing that and whether they attempt to use either of the two strategies described above. The obvious recommendation to the writer is to try out these strategies.

(5) Conflict resolution is the last major feature in Flower et al.'s (1989) theory of constructive planning. Conflicts typically arise from clashes between competing goals and the text, or from tensions between writers' aims and their (limited) topical or linguistic knowledge. More specifically, writers may realize that the intended readers may not understand something they produced because of its content, or that they have used incorrect or inappropriate expressions. The conflict is typically triggered by the text, often during the review stage, and this is therefore another example of how planning, text generation, and revision are intertwined. However, conflicts can arise purely at the planning level when, for example, writers struggle to address competing goals. For instance, writers may hesitate between a desire to present their expert knowledge accurately and the goal of creating an interesting text for lay audiences. This may require making compromises in how comprehensively one conveys one's knowledge.

How do writers try to resolve such conflicts? A simple solution is to select one of the conflicting goals and give up the others. Another strategy, at the word and sentence level, is to produce a new piece of text in the hope that it addresses the discrepancy. Making a new (local) plan that avoids the conflict is a further way to tackle a specific problem. The most ambitious approach to conflict resolution is to consolidate the discrepant goals in a new (global) plan. In the Flower et al. (1989) study, focusing on one of the goals was the most common strategy among all writers. In contrast, creating a new local plan was used almost exclusively by the experienced writers, as was designing a new global plan. Many novices, but also some accomplished writers, resorted to the straightforward production of new text to address local, text-based conflicts. Thus, diagnosing writers' approaches to conflict resolution must first try to ascertain the nature of the conflict, for example, whether the text does not match the plan. Feedback can then be based on the above-mentioned strategies that successful writers use to address these problems.

It is clear from the preceding discussion that the planning stage alone is a complex, multi-stage, and multidimensional part of the writing process. Furthermore, planning is often intertwined with text production and revision. The complexity of planning makes diagnosis anything but straightforward, and, therefore, useful diagnosis needs to have a solid basis in research. The planning stage mostly concerns the use of different strategies, and no production of SFL text may be needed during this phase. Therefore, learners'
limited SFL knowledge may not be a significant obstacle, as they can use their L1 for whatever lists, charts, or figures they produce when planning. Thus, the diagnosis of planning need not necessarily be language-specific. However, SFL knowledge does play a role in some sub-stages of planning, particularly among beginners. For example, when building an initial task representation, learners have to consider their topical and discourse knowledge, and their limited SFL ability can make the treatment of a particular topic challenging. Hence, the impact of SFL proficiency does need to be considered.

Flower et al.'s discussion of learners as active agents in setting their goals and relating their performance to these goals chimes with the socially situated theories of learner development (Chapter 2) and creates a strong argument for extending the diagnosis of learner writing to the whole writing process. Indeed, Lee (2017) noted that student-centred learning involves both teachers and learners having responsibility for goal-setting, monitoring, and addressing challenges in achieving the goals. Teachers, then, need to, in Jones' (2010, p. 176) words, help learners "develop a clear understanding of the learning goals and success criteria against which their writing will be evaluated" (Lee, 2017, p. 43). This is echoed by Cumming (2006), who studied learners' and teachers' goals in SFL writing and argued that understanding students' learning goals is imperative to understanding how to help them improve their writing. Elaborating on Cumming's (2006) comprehensive work on learner goals, however, is beyond the scope of the present book.

Research on planning in SFL writing

Theories of writing are usually based on L1 writing, so it is important to review research on planning in both L1 and SFL. We cover the studies by Manchón, Roca de Larios, Murphy, and Marín (2009) but also those by Breetvelt et al. (1994), van Weijen (2009) and van den Bergh et al. (2016), for example.

Finding 1: The amount and timing of planning in L1 and SFL writing tasks is quite similar. Manchón and Roca de Larios (2007) found that learners' planning of L1 and SFL writing was similar: in both languages, they planned mostly during the first third of the writing session. Van Weijen (2009), too, discovered similar planning patterns for L1 and SFL. These findings are in line with Sasaki and Hirose's (1996) conclusion that writers' composing competence may function similarly across languages. Novices' planning behaviour is similar in their L1 and SFL, at least if they are relatively young, as in the studies above. Experts, too, plan similarly in their L1 and SFL, though differently from novices (see Finding 3).

Finding 2: The timing of planning can affect the quality of the writing product. Breetvelt et al. (1994), Manchón et al. (2009), and van den Bergh et al.
(2016) report that the timing of planning can be important. Some processes (e.g., studying the task to understand it) improved text quality if writers engaged in them only at the start, while others (e.g., idea generation, re-reading the text) had a positive effect if carried out in the middle or final writing phases (van den Bergh et al., 2016, p. 58).

Finding 3: Learners' level of SFL proficiency (and age and/or experience in writing) affects planning. The studies by Manchón and Roca de Larios (2007) confound SFL proficiency and learners' age and/or experience as writers, so the effect of proficiency on planning is not clearly separable from these other factors. Nevertheless, their findings revealed that the youngest and least proficient group (secondary school students) planned only 2–4% of the time in either L1 or SFL, whereas the older and more proficient university students planned up to 12% of the time. When the two groups differing mainly in their English proficiency were compared, the more proficient group was found to spend more time on planning. Furthermore, the least proficient group reported focusing on the topic and text length, whereas the experienced/proficient groups paid more attention to text organization. Van Weijen (2009) found that variation in L1 writing processes was related more to the task demands, whereas the variation in SFL processes was associated more with the individual learners, as their SFL proficiency varied.

Finding 4: Individuals' planning styles vary. Manchón and Roca de Larios (2007) discovered at least two broad planning styles: some writers seem to plan in advance while others start writing quickly and plan while generating text. The latter style bears some resemblance to some integration and conflict resolution strategies in the Flower et al. (1989) model (see also Cumming, 1989, and Deane et al., 2008, on the top-down and bottom-up approaches to writing).

The implication of these findings for diagnosing planning is that learners' SFL level and their L1 writing experience need to be taken into account. Furthermore, as planning can also take place during text production and revision, it is important to establish both which planning strategies learners use and when they use them.

Text generation

Text generation concerns the actual act of writing; this stage of the writing process is referred to by various terms such as drafting, formulation, or translation (Chapter 3). Text generation is not necessarily a separate phase between planning and reviewing: it may even be the writer's strategy to explore a particular goal or to solve a conflict (Flower et al., 1989). More recently, Deane et al. (2008, p. 9) have called this interweaving of planning and text production a bottom-up planning-while-writing approach that highlights
the role of text production as a way to discover new ideas. The text is used to complement a plan rather than vice versa. Torrance (2016) refers to this as reactive planning to distinguish it from deliberate planning that happens before writing starts. Generating text also takes place when writers revise their texts; therefore, these two stages of the writing process also overlap.

Models of writing reviewed in this book do not depict text generation in as much detail as they describe planning and reviewing, or how all three aspects of the writing process interact. This is a challenge for determining how text generation should be diagnosed. It may be that the most useful diagnosis of text production needs to link it with planning or reviewing, or both. Our analysis of research summarized below suggests that at least the following aspects of text production have diagnostic potential (a simple computational sketch of the first four measures follows the list):

• time spent on generating text (in comparison with the other stages);
• pausing while writing (length and placement of pauses);
• speed of writing (number of letters, syllables or words per time unit);
• length of bursts (amount of text generated between pauses);
• focus of writer's attention;
• changes to text in the middle of production.
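The first four aspects above are, in principle, computable from a keystroke log (timestamped characters) of the kind produced by the logging tools discussed earlier. The following Python fragment is a minimal sketch under our own simplifying assumptions: the log format, the 2-second pause threshold, and the 10-second cut-off (echoing the Rational Band timescale discussed under Finding 1 below) are illustrative choices, not parameters taken from the studies cited in this chapter.

```python
# Minimal sketch: descriptive measures from a keystroke log.
# A log entry is (timestamp_in_seconds, character_typed); the format,
# thresholds, and variable names are illustrative assumptions.
log = [(0.0, "T"), (0.4, "h"), (0.7, "e"), (1.0, " "), (4.5, "c"),
       (4.9, "a"), (5.2, "t"), (5.5, "."), (17.0, " "), (17.4, "I"), (17.8, "t")]

PAUSE_THRESHOLD = 2.0    # seconds of inactivity counted as a pause
LONG_PAUSE = 10.0        # rough proxy for reasoning-level (Rational Band) pauses

pauses, bursts, current_burst = [], [], 1
for (t_prev, ch_prev), (t_next, ch_next) in zip(log, log[1:]):
    gap = t_next - t_prev
    if gap >= PAUSE_THRESHOLD:
        # Record where the pause occurred: within a word, between words, etc.
        location = "between sentences" if ch_prev in ".!?" else (
            "between words" if ch_prev == " " or ch_next == " " else "within a word")
        pauses.append((gap, location))
        bursts.append(current_burst)   # characters produced before this pause
        current_burst = 1
    else:
        current_burst += 1
bursts.append(current_burst)           # final burst

total_time = log[-1][0] - log[0][0]
print(f"writing speed: {len(log) / total_time:.2f} characters per second")
print(f"mean burst length: {sum(bursts) / len(bursts):.1f} characters")
for gap, location in pauses:
    flag = " (long pause)" if gap >= LONG_PAUSE else ""
    print(f"pause of {gap:.1f}s {location}{flag}")
```

On real data, mean burst length, the share of time spent pausing, and unusually frequent within-word pauses could then be compared across learners or drafts; the dyslexia-related pausing patterns reported later in this section suggest which of these indicators may merit closer attention.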

We next summarize research on L1 and SFL text generation. As Table 6.2 shows, this research relies on such methods as keystroke-logging, sometimes combined with eye-tracking (e.g., Leijten & van Waes, 2013; Spelman Miller, 2000; van Waes et al., 2016; Wengelin et al., 2009) or think-aloud protocols (e.g., Roca de Larios et al., 2001, 2006, 2008).

Research on text generation

Finding 1: Pause length indicates the nature of planning during text generation. Writers' pauses can reveal what kind of planning happens during text generation. Torrance (2016) argues that understanding how planning and text generation relate requires attention to the timescales within which the mind operates. The author refers to Newell's (1992) distinction between Cognitive Band operations that happen between 100 milliseconds and 10 seconds, for example, retrieving material from memory or performing one or a few simple actions, and Rational Band operations that exceed 10 seconds. The latter, longer timescale is thought to be typical of reasoning and problem solving, which are core elements in the advance planning of text. However, such activities can obviously happen during text generation, too. Research indicates that pauses are generally shortest between individual words, longer between sentences, and longer still between paragraphs (e.g., Spelman Miller, 2000; Torrance, 2016; Wengelin, 2006). This pattern seems to
hold for both L1 and SFL writing, even if pauses are longer and more varied in SFL (Spelman Miller, 2000).

Finding 2: Text generation dominates SFL writing at the expense of planning and reviewing. Roca de Larios et al. (2001) found that text generation dominated learners' writing; advanced learners spent about 60% of task time on formulating text and less proficient learners up to 80%. Furthermore, Roca de Larios et al. (2008) discovered that beginners spent an equal amount of time on drafting text across the entire writing session. In contrast, advanced learners spent more time on formulating in the middle stage of writing and less at the beginning or end of the session, which means that they planned and reviewed more in those stages. Roca de Larios et al. (2001) also detected that considerable time was spent on solving problems, particularly in SFL, which is consistent with Zimmermann's model. Furthermore, Roca de Larios et al. (2006) found that beginners spent ample time on attempts to compensate for their limited linguistic resources. In contrast, advanced learners focused on improving the lexis and style of their texts.

Finding 3: Pausing behaviour can indicate dyslexia or other learning difficulties. Keystroke-logging research on dyslexia in L1 shows how attention to text generation can help diagnose whether learning difficulties underlie some SFL learners' weaknesses. Wengelin (2002, 2007) and Wengelin et al. (2014) found that the writing process of Swedish L1 speakers with dyslexia resembled SFL learners' process: dyslexic writers wrote more slowly, paused more, and revised spelling more than writers without such difficulties. Particularly interesting was the finding that dyslexic L1 writers paused longer between the last letter of the sentence and the period than between the period and the next sentence (Wengelin, 2002, p. 270). Wengelin hypothesized that dyslexic writers, aware of their writing difficulties, wanted to read their sentences carefully before concluding them. Such pausing behaviour by an SFL learner may indicate learning difficulties such as dyslexia. Another indicator of a possible learning difficulty is pausing within words (Wengelin, 2002).

In summary, text generation is the stage that differs most between first and second/foreign language writing, particularly among novice writers. The implications of research for diagnosing SFL writing are varied. First, studies on pause lengths suggest that longer pauses are likely related to the planning or review of more than just the immediate clausal context. Therefore, never stopping in the middle of text production may indicate less than adequate planning and review. Second, diagnosis needs to address the balance between planning, generation, and revision. If writers spend almost all of their time on producing text, an obvious approach is to try to increase their awareness of the importance of planning and reviewing. Third, certain pausing behaviour appears to be a symptom of dyslexia in L1 and, probably, also in SFL. Finally, studies show it is possible to address text generation problems with
interventions: Snellings et al. (2002, 2004a) managed to improve SFL learners' writing by enhancing their speed of lexical retrieval with simple computerized exercises (see Chapter 3). This suggests that the speed of lexical retrieval can be a useful focus for diagnostic assessment and intervention.

Reviewing and revising across drafts

We now turn to the second meaning of the writing process, that is, the process of producing different versions of a text, often with the help of feedback from somebody else. Since reviewing and revising drafts are key features of process writing, we combine below the analysis of the review stage of the writing process with the discussion of process writing. While we also consider reviewing as part of the writing process in its first meaning (see "The writing process" section in this chapter), the focus here is on understanding how external agents, above all the teacher, can diagnose writing by taking into consideration what happens between the drafts.

Process writing as a pedagogical approach (e.g., Heald-Taylor, 1994; Larsen-Freeman, 2000; D. Jarvis, 2002; Seow, 2002) assumes that writing happens in cycles of planning, writing, and reviewing. Each cycle is directed by the teacher through individualized assistance. The writing process is often shaped in collaboration with peers through peer evaluation and feedback. As a result, novice writers learn about the different stages of writing and consider the text from different perspectives: the writer, the reader, and the evaluator/reviser. The pedagogical practice of process writing chimes with the notion of the diagnostic cycle. Different insights into learners' abilities emerge and shape feedback and instruction at different stages of process writing. However, the notion of process writing can stretch beyond the pedagogical approach. Hence, we refer to the whole process depicted in Figure 6.1 as process writing regardless of whether it is a pedagogical practice or the process of, for example, writing a doctoral dissertation.

In this section, we focus on what happens when Draft 1 is turned into Draft 2 (or Draft 2 into Draft 3), as this is the most relevant phase of process writing for diagnosis: the feedback provided at this stage should ideally be informed by a diagnosis of the writing process that led to the draft. Figure 6.2 zooms into the right side of Figure 6.1 and illustrates what is involved in moving from the first version of the text (Draft 1) to the next (Draft 2). In this scenario, we focus on an external agent (e.g., a teacher) who evaluates Draft 1, which is then revised by the author. In school contexts, the texts to be reviewed (Draft 1) are usually relatively short but complete pieces of writing. For longer, more complex texts such as dissertations and reports, parts of the text are often evaluated first, and the whole text only after it is complete (see Table 6.1).


FIGURE 6.2 Writing Process Across Drafts.

Diagnosis of the review stage of the writing process can focus on the following points:

• what teachers/diagnosers pay attention to in their review;
• how writers evaluate their own texts;
• how peers evaluate other writers' texts;
• how writers respond to feedback and revise their texts.

We illustrate process writing with an example of a typical classroom situation in which a learner is tasked with writing an application for a summer job. Upon its completion, the writer submits Draft 1 to the teacher or a peer, who then gives them feedback either immediately after reading the text or in the following lesson. The writer’s revision process differs depending on the place and time. When revising in the classroom, the writer can ask for assistance in interpreting and acting on the feedback. Revising the text at home is different, as the writer lacks immediate access to the teacher or a peer but can consult other sources of information. The time available for digesting feedback before starting revision also affects the process (e.g., Seow, 2002).


Responsiveness to feedback (Figure 6.2) is an important element of revision. When the external evaluator is the teacher, the learner is likely to accept all or most of the feedback because of the teacher's authoritative position (see also Chapter 9). However, in peer feedback, responsiveness may differ: depending on how much of an expert the writers consider the peer to be, they may disregard some or all of the feedback (MacArthur, 2016, p. 275).

A model for the review process

The Hayes and Flower (1980) and Hayes (1996) model that we use for organizing the discussion of the writing process divides the process into planning, formulation, and evaluation (review). For clarity, here we use the terms evaluation and review interchangeably for the process of evaluating a text to diagnose it and to give feedback to the writer. We use the term revision for the process of implementing the feedback to turn Draft 1 into Draft 2. Figure 6.2 presented an overall picture of the review stage involving external reviewers. However, the model equally applies to contexts where writers themselves evaluate and revise their texts (Hayes et al., 1987; see Figure 6.3). The revision model in Figure 6.3 was later modified (e.g., Hayes, 1996, p. 17), for example, by adding reference to memory structures. However, the earlier, simpler model is sufficient to conceptualize the revision process across drafts.

FIGURE 6.3 Process Model of Revision (Source: Hayes et al., 1987, p. 185).


Defining the task of revision

The "task definition" part in the Hayes et al. (1987) model refers to the writer's and the reviewer's understanding of what it means to improve a specific text. Task definition for revising is somewhat similar to building an initial task representation in the Flower et al. (1989) model of constructive planning discussed in "The writing process" section in this chapter, except that now the writer/reviewer has the text available. Both involve building an understanding of what the task requires and how to construct a text that is compatible with the goals and context.

Let us return to our earlier example of the learner who was given the task of writing an application for a summer job. Upon completing the first draft, the writer submits it for review by the teacher. The teacher then gives feedback to the writer, who is asked to revise the text. Of course, the writer can also review and evaluate their own text and revise it accordingly (Figure 6.3). Importantly, the learner and the teacher may understand review and revision differently, depending on how they understand what the task (i.e., a job application) should entail. The learner may associate the review with reading the text sentence by sentence to find surface-level problems. The teacher, however, may review the text by reading it in its entirety for comprehension, content, and style.

Differences in understanding what the review requires depend on the reviewers' knowledge of the goals, criteria, and constraints of plans and texts. This is where expert and novice reviewers/revisers are markedly different (Hayes et al., 1987). Experts have developed richer knowledge of how to set goals for review and attend to more global problems in texts. This difference can be explained with reference to the development of expertise discussed in Chapter 2. Once the knowledge in the domains (i.e., discourse community, writing process, and/or subject matter) becomes part of writers' repertoire, they gain access to strategies allowing them to evaluate the texts with reference to these domains (Beaufort, 2007).

For diagnosis, it is important to understand how the writer understands the review and revision process. The preceding example should not be interpreted as urging teachers always to review at a more global level. Rather, the revision task should be defined with reference to writer abilities and teaching goals.

Applying the revision model to SFL contexts calls for considering another reason for novice SFL writers focusing on surface-level problems, namely the threshold in SFL proficiency (see Chapter 2). Peer reviewers who are SFL writers may have difficulty in transferring their L1 writing knowledge to SFL until they have reached a certain level in their SFL proficiency. A diagnostically competent teacher should consider the writer's SFL proficiency as a potential constraint on planning and writing. Therefore, teachers should perhaps focus on surface problems in the review of novice SFL writers' texts since it improves
such learners' linguistic proficiency and helps their progress towards writing expertise. Furthermore, such features as contracted forms, indirect questions, and the passive voice can be discussed with reference to genre conventions on a more general level, thus serving as a bridge to developing writing expertise beyond linguistic proficiency.

To summarize, defining (i.e., understanding) the task of revision is important for diagnosing SFL writing, and it should happen with reference to writers' abilities. Hayes et al.'s (1987) findings about how novice vs expert writers define their revision task suggest that teachers should try to learn how learners define and plan the revision process and modify their own review and feedback so that these suit learners' needs.

Evaluation

The key part in Hayes et al.'s (1987) model for this section is the evaluation (review) stage depicted in Figure 6.3. Reading is the fundamental activity in evaluation, but this is not just reading for comprehension, as the reviewer's task is to improve the quality of the text or to help the writer do so. Hayes et al. (1987) argue that reading for evaluation and revision entails assessing whether the text meets certain criteria, identifying problems, and possibly diagnosing the reasons for the problems. However, they suggested that review need not necessarily involve diagnosis, and simple, straightforward rewriting is far less time-consuming and cognitively demanding for both an external reviewer and the writer: for example, the writer can simply accept the teacher's reformulation without considering why it was proposed. While this strategy can be efficient for completing the text, it is less useful for diagnosing and developing learners' writing ability. If writers do not understand why the reviewer proposed certain changes to the text, they are unlikely to apply this knowledge to a new text. Indeed, as Hayes et al. (1987) continued, expert reviewers can turn diagnosing problems into a powerful reviewing strategy. Thus, a way for teachers to increase their diagnostic competence would be to develop their expertise in reviewing. Furthermore, if peer review is used, writers as peers, too, need to develop skills in helping other writers (e.g., MacArthur, 2016).

The importance of reading for writing, and the similarity of reading strategies and text-reviewing strategies, are emphasized by many researchers. McCutchen et al. (1997), similar to Hayes et al. (1987), found that weaker writers read texts at the sentence level and often failed to identify higher-level problems. In contrast, better writers skimmed through the text, which enabled them to spot bigger issues. This links back to our previous discussion of how differently novice and expert writers define the task of revision. Reviews by Deane et al. (2008), Shanahan (2016), and van den Bergh et al. (2016) also underline the importance of reading both for reviewing texts and for developing writing
skills more generally. When SFL learners peer review, they can learn useful reading comprehension strategies, which can promote their own revision and, thus, writing abilities.

Hayes et al. (1987) discussed three major types of evaluation by the writers themselves or others: (1) evaluation at the surface level, (2) evaluation of whether the text expresses the intended meaning (i.e., evaluation against the plan), and (3) evaluation of the plan against the set criteria. The authors noted that an external reviewer does not really have to know the author's intended meaning to perform a surface-level evaluation. They also argued that the third type of evaluation is the most useful. Indeed, as Bereiter and Scardamalia's (1987) model suggests (Chapter 3), the first and probably the second type of evaluation fall under the 'knowledge-telling' model of writing, where less evaluation is required than in the knowledge-transforming writing process, where writers (and reviewers) also revise their understanding of the plan. However, SFL writers' linguistic proficiency can impede their ability to transfer their expertise from L1 and/or attend to their knowledge of the SFL discourse community, genre, and rhetorical conventions. Hayes et al. (1987), too, noted that expert writers attend to meaning more often than novices.

The above discussion supports our suggestion that teachers should consider writers' SFL proficiency in their reviews and suggested revisions. Teachers' feedback can be based on their knowledge of what learners at a specific proficiency level can typically do. Writers at CEFR levels A1 or A2 are less likely to benefit from feedback on coherence and cohesion beyond simple connectives such as "and" and "but". Furthermore, the extent to which writers can incorporate feedback on the first draft into their subsequent drafts is also meaningful diagnostic information, which allows the subsequent feedback and instruction to be adjusted (see Chapter 9).

Modifying the text

Text modification is most likely executed by the writers themselves, based on their self-generated feedback or on feedback from external reviewers and their evaluation of the relevance of that feedback (see Figure 6.2). The writers' strategies for addressing the review depend both on how well the problem has been diagnosed by the reviewer and on the writer's own expertise. When revising, novice writers tend to apply a limited number of rather general strategies, for example deleting problematic sections, to all problems, while expert writers, who have internalized many strategies, apply these selectively depending on the nature of the problem (see Chapter 2). Hayes et al. (1987) differentiated between four types of revision strategies: ignoring, delaying, searching, and revising/rewriting.


Ignoring a problem can happen if it is considered insignificant or if the writer/reviser is unsure how to address it. These two scenarios can also be explained with reference to individuals' zone of proximal development (ZPD; see Chapters 2 and 9). To be able to ignore a problem as insignificant, one has to possess enough expertise to identify the problem and know that it is minor. If a writer is unsure of how to address a problem, the feedback may be outside the writer's ZPD. Delaying as a revision strategy can be a conscious decision based on the writer's plan, or it can be an ad hoc modification of the revision process. An example of a deliberate delay is when the writer decides to first pay attention to the meaning and only afterwards to surface problems (see also Hayes et al., 1987, pp. 224–225). Searching refers to attempting to represent a problem better through memory search, building on one's experiences, and through text search. This strategy is often triggered when writers receive external feedback that only identifies a problem but does not explain it or suggest a revision.

When writers act on the evaluation, they can choose between two modification strategies: revising and rewriting (Hayes et al., 1987). Revising refers to minor changes that do not alter the author's plan. Deane et al. (2008) suggested that this kind of modification should be called copyediting or proofreading. Rewriting in Hayes et al.'s model involves more global changes to the original text and even to the plan. Hayes et al. (1987) separate the rewriting process into paraphrasing and redrafting. Paraphrasing involves rewriting sections, often on a sentence-by-sentence basis, while keeping their meaning. Redrafting entails attending to the text level, possibly abandoning large portions of text and modifying the plan. These text modification strategies relate to the three types of evaluation outlined earlier: surface evaluation, evaluation of the text against the plan, and evaluation of the plan against the criteria. As Hayes et al. (1987) found, expert writers/revisers tend to adopt the redrafting strategy quite extensively, for example when a text does not fit genre conventions or has a number of problems. Redrafting is challenging for novice writers and requires guidance on how to rewrite and how to relate the rewritten sections to the whole (see Hayes et al., 1987, pp. 228–229). It should be repeated that proposals for reformulation are often not useful diagnostically unless they are accompanied by explanations.

The text modification strategies that writers attempt can be a valuable source for diagnostic feedback. Insights into how writers modify their own texts can be gained through, for instance, learner diaries, retrospective interviews, and think-aloud protocols, as well as by comparing different drafts. Knowing when and why writers revise, paraphrase, and redraft their own or peers' texts can also help diagnosers see whether these strategies are effective.
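
Comparing drafts lends itself to simple automated support. The following is a minimal sketch, not taken from the studies discussed here, of how two drafts might be compared at the word level to tally additions, deletions, and substitutions; it only illustrates draft comparison as a rough diagnostic source and does not implement Hayes et al.'s (1987) taxonomy of revision strategies.

```python
# A minimal sketch (illustrative only): word-level comparison of two drafts
# using Python's standard difflib to tally additions, deletions, and
# substitutions. Real revision analysis would also need to consider
# meaning-level and organizational changes.
from difflib import SequenceMatcher

def tally_revisions(draft_one: str, draft_two: str) -> dict:
    """Count word-level additions, deletions, and substitutions between drafts."""
    words_one, words_two = draft_one.split(), draft_two.split()
    counts = {"additions": 0, "deletions": 0, "substitutions": 0}
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=words_one, b=words_two).get_opcodes():
        if tag == "insert":
            counts["additions"] += j2 - j1
        elif tag == "delete":
            counts["deletions"] += i2 - i1
        elif tag == "replace":
            counts["substitutions"] += max(i2 - i1, j2 - j1)
    return counts

first_draft = "Yesterday I go to the market and buyed some fruit."
second_draft = "Yesterday I went to the market and bought some fresh fruit."
print(tally_revisions(first_draft, second_draft))
```

Counts of this kind say nothing about why a change was made, so they would complement, rather than replace, diaries, interviews, or think-aloud data.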


Research on revision in L1 and SFL

We now summarize findings from research on revision in both L1 and SFL. We start by outlining the key results of Fitzgerald's (1987) extensive review of research on revision in L1 writing (equally extensive reviews of SFL revision are lacking, as are more recent reviews of revision in L1 writing). Fitzgerald's review (pp. 491–492) showed that:

• beginning L1 writers do not revise very much in the first place;
• particularly younger writers need teacher or peer support in order to revise;
• individuals, both novices and experts, vary in the amount of revision;
• more competent writers tend to revise more than less competent or younger writers;
• all writers tend to make more mechanical and surface-level revisions rather than global and meaning-related changes;
• older and/or more competent writers tend to revise meaning more and make sentence- and theme-level revisions more often than less competent or younger writers.

Cumming and So's (1996) small-scale review of studies on tutoring sessions indicated that surface-level revisions are very common also in SFL writing. Only about 10% of the revisions discussed in the meetings between learners and their tutors concerned global meaning. More recently, Stevenson (2005) investigated the revisions of Dutch lower secondary school students' L1 Dutch and SFL English writing with keystroke logging and think-aloud protocols (see also Stevenson et al., 2006). The students were found to revise mostly language and surface-level features, particularly in the SFL, and to make relatively few content changes. Most language revisions in both languages concerned punctuation and phrasing. Revision of spelling, grammar and vocabulary was also quite frequent but clearly more common in the SFL. Most revisions were made at the word and clause level, particularly in the SFL. The writers usually revised the text, especially typing errors, immediately after they had generated it. Substitutions were by far more common than deletions or additions, particularly in the SFL.

Stevenson (2005) also investigated whether the students' writing fluency and types of revisions were related to the quality of their L1 and SFL texts (see also Schoonen et al., 2009). The findings suggest that writers paid less attention to conceptualizing in the SFL and possibly did more local reading, which resulted in their SFL texts being rhetorically less well-developed. Schoonen et al. (2009, p. 88) concluded that language learners spend more time on addressing linguistic problems, which can hinder their attempts to create rhetorically more complex and effective structures in their texts (see also the "Text generation" section in this chapter).


Diagnosing the nature of revisions is rather straightforward if one has two or more versions of the text available for comparison, such as in a portfolio, although several other methods exist for tapping the revision process directly, such as observations, checklists and interviews. Research on revision suggests that a common issue in both L1 and SFL is writers' focus on the surface level of the text. Feedback aimed at drawing writers' attention to the global text level seems particularly important for novice and less proficient SFL writers. However, writers' limited SFL proficiency can hinder their use of organizational and textual-level feedback. Here, teachers' familiarity with their learners' needs can guide their decisions on what processes to focus on in diagnostic feedback.

Main implications for diagnosing SFL writing

We conclude by summarizing the key takeaways for diagnosing SFL writing processes. This includes suggestions for feedback and actions that aim at improving learners' writing processes.

Methods for diagnosis. Diagnosing writing processes requires that information about them can be obtained in the first place. In some cases, it may be possible to draw inferences about the processes by analyzing texts, the products of writing (see Chapter 7). However, many methods can tap various writing processes, and many of these are feasible for classroom use, such as checklists, observation, interviews, portfolios, and automated text analysis tools. It is important to remember that the writers themselves have the most direct access to what they do when writing, so training them to self-diagnose their writing processes can benefit them significantly.

Scope of diagnosis. Research shows that SFL writers tend to plan only to a limited extent and make mostly surface-level revisions. However, this does not mean that learners cannot plan and revise the content and organization at the more global text level. To do so, they need to become more aware of that level and have opportunities to practice systematically.

Integration of the stages of the writing process. For teaching, it is convenient to divide the writing process into planning, formulation, and revision. In practice, however, these stages overlap, and intertwining them is typical of experienced writers. Hence, diagnosis may also examine how far emergent writers integrate the different stages, and diagnostic feedback may provide integrative strategies such as checking how one's text fits the plan during and after actual writing.

Level of learners' SFL proficiency. Research indicates that learners' SFL proficiency impacts their writing processes. For example, beginners spend more time generating text and solving linguistic problems, and they process all SFL tasks in more or less the same way, even if the topic or genre might make different strategies more appropriate. Taking writers' proficiency into account in the diagnosis is, thus, important. We will discuss the implications of proficiency for diagnosing SFL writing further in Chapter 10.

We conclude this chapter by summarizing in Table 6.3 the types of activities or strategies that may be useful for writers diagnosed as having weaknesses at particular stages of the writing process.

TABLE 6.3 Examples of Strategies (Actions) at Different Stages of the Writing Process

PLANNING
Goal-setting (e.g., building an initial task representation; generating a network of working goals): Analyze the writing task for its purpose(s) and intended audience(s); analyze what you know about the topic; analyze what you know about the genre (type of text); decide on an overall strategy and main and sub-goals; use keywords, figures, images.
Idea generation: List what you know about the topic; consult external sources (e.g., the Internet, peers).
Organizing (e.g., integrating plans, knowledge and goals; instantiating abstract goals): Draw a (hierarchical) plan, chart or mind map with goals and sub-goals; compare your emerging text and your plan; summarize your plan in a few sentences; write more text and check if that helps you organize your text; list keywords/terms/names that you are likely to use; ask yourself concrete questions about what to write next.
Monitoring (e.g., conflict resolution): Compare your text and your plan (do they match?); do you have conflicting goals? (Can you focus on only one of them? Can you change the problematic part of your plan?); consider writing more text to see if a solution emerges as you write.

TEXT GENERATION
Revisit your plan as you write to check that you follow it; use keywords etc. in your plan as ideas for concrete text; use tools to check your spelling and grammar; use dictionaries and thesauruses for checking vocabulary and for varying expressions.

REVISION
Reading and evaluating: Compare your plan and text to see how they match; does the text match the intended goals, audience, and genre? Can you get feedback from somebody?
Redrafting/rewriting: Compare your new text and the plan to see if the new text fits the plan better.
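
To illustrate how such a summary might feed into feedback practice, here is a minimal sketch, not part of the original table, of storing an abbreviated stage-to-strategy mapping so that strategy suggestions can be retrieved for the stages in which a writer was diagnosed as weak; the stage labels and strategy texts are shortened examples rather than a full transcription of Table 6.3.

```python
# A minimal sketch (illustrative only): an abbreviated mapping from diagnosed
# weak stages of the writing process to suggested strategies, loosely based on
# Table 6.3. The keys and texts are examples, not an exhaustive transcription.
STRATEGY_BANK = {
    "planning: goal-setting": [
        "Analyze the writing task for its purpose(s) and intended audience(s).",
        "Decide on an overall strategy with main goals and sub-goals.",
    ],
    "planning: organizing": [
        "Draw a plan, chart, or mind map with goals and sub-goals.",
        "Compare your emerging text with your plan.",
    ],
    "text generation": [
        "Revisit your plan as you write to check that you follow it.",
        "Use dictionaries and thesauruses to check and vary vocabulary.",
    ],
    "revision: reading & evaluating": [
        "Check whether the text matches the intended goals, audience, and genre.",
        "Ask somebody for feedback on your draft.",
    ],
}

def suggest_strategies(weak_stages: list[str]) -> list[str]:
    """Collect strategy suggestions for the stages diagnosed as weak."""
    suggestions: list[str] = []
    for stage in weak_stages:
        suggestions.extend(STRATEGY_BANK.get(stage, []))
    return suggestions

for tip in suggest_strategies(["planning: organizing", "revision: reading & evaluating"]):
    print("-", tip)
```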

In the next two chapters, we cover two important approaches that are used in diagnosing SFL writing: diagnosis by rating writing performances and diagnosis by analyzing the performances automatically. Usually, both approaches concern the products of writing, but in some cases they can also yield insights into the writing processes.

7 ANALYZING WRITING PRODUCTS BY RATING THEM

In this chapter, we return to the product of writing and approach it from the perspective of ratings carried out by human raters, such as teachers or professional assessors. With regard to the diagnostic cycle introduced in the first chapter, this chapter is situated in the realm of assessing the elicited products by means of assessment criteria, with a specific focus on rating scales and other coding schemes that can yield diagnostically useful information about learners' strengths and weaknesses in their SFL writing. This latter aspect taps into the feedback phase of the diagnostic cycle, yet due to its importance for diagnosis, we have dedicated Chapter 9 to the feedback phase. The present chapter first outlines the properties of rating scales designed for providing diagnostic feedback (e.g., Knoch, 2009a, 2009b) and discusses some fundamental design principles. We present and analyze a number of scales available in the literature on writing assessment that have been (or could potentially be) used for diagnostic purposes. Examples of such scales include scales based on the CEFR that were designed and used for large-scale diagnostic assessment in Germany (Harsch & Martin, 2012, 2013), and scales specifically designed for diagnostic purposes in Australia. The chapter then discusses challenges that human raters might face when striving for diagnostic ratings, such as the halo effect and limits to raters' ability to make relevant and sufficiently detailed observations about the written products. We will also discuss the implications and challenges of designing and using rating scales and coding schemes in both large-scale and classroom-based assessment contexts. Within the latter, we will also look at using rating scales for self-assessment. The chapter is rounded off with an outlook on the limitations of diagnostic scales and possible ways to address the aforementioned challenges.


It is important to differentiate between scale purposes because different purposes require different scale features. Alderson (1991) distinguished user-oriented proficiency scales (such as the CEFR scales) from assessment-oriented rating scales, where the latter can be further differentiated for specific assessment purposes such as diagnostic ones (Pollitt & Murray, 1996). One scale type cannot directly be used for other purposes without appropriate modifications. In order to use, for example, the CEFR scales for diagnostic purposes, the context-independent, generic and abstract can-do CEFR descriptors first have to be adapted to the specific local assessment context in which they are to be used, as for instance Harsch and Martin (2012) have shown. Such diagnostic scales need to describe specific features relevant to and expected in this context. This chapter focuses on the properties and design principles of diagnostic rating scales, while referring to proficiency scales such as the CEFR scales where and when necessary.

Properties of diagnostic scales

When thinking of developing rating scales, there are a number of fundamental questions that should be considered right at the start. Knoch (2011b) lists the following five questions:

1. What type of rating scale is desired?
2. Who is going to use the rating scale? […]
3. What aspects of writing are most important and how will they be divided up? The scale developer needs to decide on what criteria to use as the basis for the ratings.
4. What will the descriptors look like and how many scoring levels will be used? There are limits to the number of distinctions raters can make. Many large-scale examinations use between six and nine scale steps. This is determined by the range of performances that can be expected and what the test results will be used for. Developers also have to make decisions regarding the way that band levels can be distinguished from each other and the types of descriptors to be used.
5. How will scores be reported? Scores from an analytic rating scale can either be reported separately or combined into a total score. This decision needs to be based on the use of the test scores. The scale developer also has to decide if certain categories on the rating scale are going to be weighted.

(p. 82, based on Weigle, 2002, pp. 122–125)

In what follows, we will address relevant issues pertinent to these questions, with particular attention to diagnostic assessment contexts.


Holistic and analytic approaches

If written performances are to serve diagnostic purposes, they need to be analyzed for their strengths and weaknesses with diagnostically oriented rating scales or performance checklists. These instruments need to be constructed in such a way that they allow a fine-grained view of relevant textual, linguistic and discourse properties, so that meaningful feedback can be provided to examinees. Doe (2014) and Knoch (2009a, 2009b) state that diagnostic rating scales should be analytic rather than holistic, as they need to provide detailed information. While there are mixed reports on the reliability and validity of the two approaches (see Barkaoui, 2011 or Harsch & Martin, 2013 for an overview; also Weigle, 2002), there are criticisms of holistic scoring based on the feasibility of using it for diagnostic purposes. For example, holistic approaches may mask disagreement between raters that analytic approaches can show (Harsch & Martin, 2013; Smith, 2000), and they may lead to halo effects and impressionistic scoring. Hence, in diagnostic assessment, holistic approaches could perhaps constitute a first step in providing a coarse estimation of the level (CEFR or other) of the text and/or the learner's proficiency, before a more fine-grained checklist or rating scale is used to assess more specific areas of strengths and weaknesses.

Analytic approaches, whether they take the form of a rating scale or a performance checklist, offer the advantage of guiding the raters by stating clearly what features are expected at a certain performance level. The expectations can be derived from learning outcomes as stated in curricula, educational standards, or other competency frameworks. This is an important step in aligning the assessment with relevant learning outcomes and teaching objectives, which is a prerequisite for valid assessment in educational contexts. Thus, it becomes possible to assess whether a written product shows the expected features and what the quality of those features is. This in turn allows specific feedback with respect to the targeted outcomes and objectives, the goal of diagnostic assessment, as will be discussed in more detail in Chapter 9.

CSWE Certificates in Spoken and Written English – a curricular approach

One example of such an alignment is found in the Certificates in Spoken and Written English (CSWE) (Brindley, 2000; Burrows, 2001), a curriculum framework of the Adult Migrant English Program in Australia. There, "(t)eachers design syllabuses to meet the needs of their students within the framework provided by the curriculum" (Burrows, 2001, p. 3). The CSWE specify competences at four successive levels, where each competence is described with detailed elements that are defined by "can-do" statements.

TABLE 7.1 Illustrative Example for a Level-Specific CSWE Competency

Elements and Performance Criteria (Discourse Structure):
1. can use appropriate temporal staging: uses appropriate staging
2. can use appropriate conjunctive links: uses some conjunctive links appropriately, e.g., 'first', 'then', 'and', 'but'
3. can use simple reference: uses simple reference appropriately, e.g., pronouns and articles

Range Statements:
• familiar and relevant topic
• approximately 100 words in length
• recourse to dictionary
• may include a few grammatical and spelling errors, but errors do not interfere with meaning

Evidence Guide / Sample Task: Learners write about a past event, e.g., excursion, workplace visit, holiday, picnic, experience story.

Source: Adapted from Smith, 2000, p. 167.

These statements form the basis for performance criteria statements that are used for directly assessing written and oral products. Teachers use the curriculum framework as a basis to develop their tasks, and they use the performance criteria statements as a criterion-oriented achievement checklist. The performance statements target a specific level and specific writing competences, such as those related to the discourse structure of the text (see Table 7.1). In addition to the discourse structure of the text, as shown in Table 7.1, the CSWE scale defines the corresponding elements and performance criteria for grammar and vocabulary. The key elements and performance criteria refer to topic-appropriate use of vocabulary and the ability to use the past tense and to construct sentences comprising two clauses (Smith, 2000, p. 167).

The diagnosis within the CSWE is level-specific; that is, learners attend courses that target one of the four competency levels, work on tasks that operationalize this specific level, and have their performance assessed against the performance criteria statements targeting this competency level (i.e., there are distinct statements for each of the four competency levels). The performance criteria statements are assessed using a three-part, also called ternary, approach: the teacher judges whether each statement is not achieved, achieved, or highly achieved (Brindley, 2001, p. 60). Thus, writers get fine-grained feedback on their achievements with regard to specific learning outcomes and on the areas they still need to work on.

Such a level- and outcome-specific approach, where detailed analytic feedback is provided for learning-relevant aspects, has huge potential for diagnosis. Specific, fine-grained learning outcomes for different attainment levels are operationalized into tasks that target these outcomes and levels, and performance criteria statements are transparently aligned to the learning outcomes.


Moreover, teachers and learners are familiar with these, so that feedback can be generated on a fine-grained level to assess the extent to which learners have achieved the goals, and where they need to improve. Brindley (2000) argues that the CSWE integrates instruction, assessment and reporting, and that the fact that competences are used as the units in both instruction and assessment "allows gains to be reported which might not be detectable using a general proficiency scale, thus giving a more accurate picture of individual achievement. In addition, the specification of explicit performance criteria enables teachers to give diagnostic feedback to learners following assessment" (p. 30).

In a study on rater behaviour when using these performance criteria, Smith (2000) found that the raters did not seem to refer to criteria external to the scale. Furthermore, the raters did not show a tendency to weigh some criteria at the expense of others but rather "adhered strictly to the performance criteria statements provided in making their assessments" (Smith, 2000, p. 178).

Performance criteria statements can be derived from curricular frameworks, as is the case for the CSWE. If the curricular framework describes competencies in the form of proficiency scales, these need to be "translated" and transformed into diagnosis-oriented assessment scales or checklists (see above). We will describe one such example (i.e., the checklists used in the VERA context in Germany) in the section on VERA8 in this chapter. Another way to develop performance criteria statements is to analyze actual learner products for relevant features; this can also serve to validate the performance criteria statements with empirical data. In the section on DELNA in this chapter, we will discuss an example of such an empirically derived diagnostic rating scale.
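
As a concrete illustration of how such criterion-oriented checklists can be handled, the following minimal sketch, which is not taken from the CSWE materials, records a ternary judgment for each performance criteria statement and turns the judgments into element-by-element feedback; the statements are abbreviated from Table 7.1 and the wording of the output is invented for illustration.

```python
# A minimal sketch (illustrative only): ternary judgments against performance
# criteria statements, abbreviated from Table 7.1, turned into element-level
# diagnostic feedback. The judgment labels follow the ternary approach
# described above ("not achieved" / "achieved" / "highly achieved").
TERNARY = ("not achieved", "achieved", "highly achieved")

CRITERIA = {
    "temporal staging": "uses appropriate staging",
    "conjunctive links": "uses some conjunctive links appropriately (e.g., 'first', 'then')",
    "simple reference": "uses simple reference appropriately (e.g., pronouns, articles)",
}

def checklist_feedback(judgments: dict[str, str]) -> list[str]:
    """Pair each criterion statement with the teacher's ternary judgment."""
    lines = []
    for element, statement in CRITERIA.items():
        verdict = judgments.get(element, "not judged")
        if verdict not in TERNARY and verdict != "not judged":
            raise ValueError(f"unexpected judgment: {verdict}")
        lines.append(f"{element} ({statement}): {verdict}")
    return lines

for line in checklist_feedback({"temporal staging": "achieved",
                                "conjunctive links": "not achieved",
                                "simple reference": "highly achieved"}):
    print(line)
```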

Levels

With the CSWE, we have seen an example of a level-specific, competency-based approach, where the targeted proficiency level is defined by performance criteria statements relevant and specific to this level. Level-specific approaches are feasible in contexts where the learner's proficiency level is approximately known, for example because the learners are attending a language course for which a certain entrance level (e.g., A2) is required and which targets specific learning outcomes (e.g., bringing learners to A2+); the learning outcomes can be "translated" into level-specific performance criteria statements (i.e., targeting only the level A2+). The performance criteria statements in level-specific approaches often take the form of checklists, which are assessed by giving judgments on two or three degrees of achievement ("not achieved" / "achieved", for ternary approaches also "highly achieved"). Such a binary or ternary judgment gives detailed information to the learners with regard to how well they have achieved certain criteria (see Turner and Upshur (2002) for examples of empirically derived binary-choice scales).


There are also assessment approaches where rating scales span a range of expected performance levels or bands, as seen in the DELNA scales that we will describe below. Such multi-level approaches (spanning several performance levels or bands) are feasible in contexts where the learners' proficiency levels may not be known prior to the diagnosis or where they are anticipated to be heterogeneous, as is the case for post-entry diagnosis such as DELNA. In such contexts, a multi-level rating scale allows for placing the written performances at a range of performance levels (e.g., ranging from B1+ to C1). With regard to the number of levels that a rating scale can span, more levels generally lead to greater precision and more fine-tuned assessment; yet in practice, raters find it difficult to reliably distinguish more than nine scale bands (Knoch, 2011b, p. 92; North, 2000, p. 38f; North & Schneider, 1998, p. 231; Pollitt, 1991). In contexts where a developmental, longitudinal diagnosis is the aim, the rating scale levels may be informed by or derived from developmental models of writing, as described in Chapter 2. If the diagnosis is proficiency-oriented, relevant performance levels can be derived from frameworks such as the CEFR proficiency scales.

Assessment criteria

In writing assessment in general, usually four or five generic criteria are specified, such as content, organization, grammar/vocabulary in relation to range and accuracy, and aspects such as communicative effectiveness. These criteria are then defined by one or several performance criteria statements (also called descriptors). In task-specific approaches, further aspects regarding the targeted genre, conventions and discourse features may be addressed, as well as the degree to which the task was accomplished and the quality of task completion.

When it comes to diagnostic assessment, rating scales or checklists need to differentiate a number of analytic assessment criteria, which are derived from the assessment construct (e.g., Knoch, 2011b). As indicated above for the CSWE, if diagnostic scales or checklists are to result in valid diagnosis and meaningful feedback, the assessment criteria and the defining descriptors need to be aligned to the learning outcomes and teaching goals in the writing curriculum. Further insights can be gained from relevant writing models, as outlined in Chapters 2 and 3. It is important for any assessment to define the targeted construct in the test specifications, as argued in Chapter 5. The writing construct is not only the basis for valid tasks, but also for valid rating scale / performance checklist criteria (e.g., Davies & Elder, 2005; Knoch, 2009a; Lumley, 2002, 2005). Given the numerous aspects that potentially contribute to writing ability, it is important to make an informed selection of the most relevant criteria that the diagnosis should focus on.


Again, the context and nature of the diagnosis and its purpose will influence what aspects should be selected. These aspects can vary at different points of diagnosis, as the focus may shift during a school year or over the course of a writing programme. In principle, the more detailed the criteria are (or, in the words of the aforementioned CSWE, the more the competence components are broken down into their constituent elements), the more fine-grained the feedback will be. Yet one has to strike a balance between breaking down the competence of writing into ever finer elements and keeping the performance criteria manageable. After all, cognitive resources are limited not only for writers; raters also have limited capacities when it comes to the number of criteria that can reliably and validly be handled. While written products can theoretically be read and re-read as often as deemed necessary, raters nevertheless seem to find it difficult to distinguish among more than eight or nine criteria (e.g., Weigle, 2007). Moreover, the criteria need to be defined as independently from each other as possible, to avoid overlap and thus a conflation of separate aspects, as will be discussed below. This also poses limitations on the number of criteria that can possibly be distinguished in one rating scale or checklist.

Task-specific assessment criteria

Besides the differentiation between holistic and analytic scales, the literature mentions task-specific scales (e.g., Hamp-Lyons, 1995), which are characterized by the inclusion of specific features that the task aims to elicit. Task-specific scales have to be developed separately for each task in an assessment; hence they are very resource-intensive. These scales have, however, the advantage of facilitating detailed feedback with regard to fulfilling the requirements of certain genres and discourse features. In order to keep costs at an optimum, the diagnostic scale could contain 'generic', task-independent criteria that are accompanied by task-specific features. The task-specific features could take the form of a list of content, register and discourse features that are assessed in a binary or ternary way (see above). In this respect, such a list resembles test items rather than a rating approach, as the rater does not need to form a qualitative judgment but rather checks whether certain features are realized in the text. This can be one way to include task-specific features in a diagnostic rating scale.

Design principles of diagnostic scales

The literature distinguishes different starting points for the development of rating scales or performance checklists (see e.g., Brindley, 1998; Council of Europe, 2001; Fulcher, 2003; Harsch & Martin, 2012; Hughes, 1989; Lumley, 2002; North & Schneider, 1998).


One can either take theoretical models (such as those outlined in the previous chapters of this volume), existing scales, checklists or frameworks (such as the CEFR or the above-described CSWE), or empirical analyses of learner texts as a starting point for the development of rating scales or performance checklists (Fulcher et al., 2011). For the ensuing steps of formulating and validating rating scale descriptors or performance criteria statements, a range of intuitive, qualitative and quantitative methods is available (for a detailed overview of scaling methods, see e.g., Council of Europe, 2001, Appendix B; North, 2000; North & Schneider, 1998), all of which can be used in combination:

• So-called intuitive methods encompass expert advice or teacher intuition; no empirical data collection is involved, but intuitive methods usually feed into qualitative and quantitative data collection and analysis.
• Qualitative methods use approaches such as formulating key concepts, analyzing performances for salient features, using comparative judgments or sorting performances, as well as sorting descriptor drafts (into criteria or levels). Undoubtedly, a certain level of intuitive expertise will also underlie the qualitative approaches, yet one characteristic is that data are collected and analyzed qualitatively; some of the data may also be analyzed quantitatively.
• Quantitative methods collect numeric data and make use of statistical analyses such as discriminant analysis (where qualitative key features are scrutinized by regression analysis for the most significant features). One frequently applied method is Rasch scaling, where the results of qualitative descriptor sorting are calibrated on an arithmetic scale, so that the scale levels can be validated.

These methods are usually used in combination, depending on available resources and data. Intuitive methods, for example, may use existing scales and experts' opinions to select and collate relevant descriptors, or to describe expected features at different proficiency levels (Knoch, 2009a). While this may be a first step in compiling draft descriptors or statements, these drafts need to be empirically validated for diagnostic purposes. Here, qualitative and quantitative methods are usually employed. For example, learner text productions can be analyzed qualitatively and quantitatively for the occurrence and frequency of relevant features (see Chapter 8 for the use of automated text processing). Another approach is to have raters try out the rating scale with learner texts and analyze rating consistency. Cases of rater disagreement can then be used to qualitatively improve the rating scale descriptors (Harsch & Martin, 2012). Yet another approach, which combines intuitive with quantitative aspects, is to validate scale descriptors by having experts sort descriptors into scales and levels, and to statistically examine whether the sorting exercise results in the intended scale criteria and levels (Harsch, 2010).


All these methods are suitable for designing scales with different purposes, such as proficiency or rating scales. Most of the existing scale systems, such as the CEFR proficiency scales, have been developed by a combination of these methods (North, 2000). For the development of diagnostic assessment scales, however, some methods seem more suitable than others. Knoch (2009a) argues for empirically derived diagnostic rating scales, because intuitively developed scales are known to suffer from imprecise terminology and relativistic differentiation between bands, and "often do not closely enough represent the features of candidate discourse" (p. 277). We will exemplify two such empirical diagnostic rating scale developments below.

We now provide a list of fundamental design principles for the development of diagnostic scales or checklists, based on the works of Alderson (1991), Brindley (1998), Jones (1993), Knoch (2009a, 2009b), Lumley (2002), North (2000), North and Schneider (1998), Smith (2000), and Weigle (2002):

• Descriptors/statements need to be concrete, clear, and short.
• They should enable binary yes/no or ternary (see above for the CSWE) decisions.
• They describe precise and concrete features; vague impressionistic wording that is open to differing interpretations should be avoided.
• They stand on their own: the wording of the descriptors should distinguish one independent, analytic criterion.
• They are based on relevant learning and teaching goals (existing scales or theoretical models) and validated by empirical methods.
• The terminology used should be accessible to raters; specialized technical terminology should be avoided.

In cases where a scale covers several bands or levels, the following principles apply:

• The descriptors need to stand on their own also with regard to the adjacent bands/levels by qualitatively describing distinct features relevant for a specific level; they should not contain relativistic terminology such as "little", "more", or "some", which would require interpretation in relation to adjacent bands.
• They should describe rather narrow bands to gain fine-grained diagnostic information.

The main purpose of diagnostic rating scale descriptors or checklist statements is to guide the rating process and to facilitate a common understanding between learners, teachers and diagnosers. Such instruments do not usually claim to describe learner development or learner language. Ultimately, such an instrument is "a set of negotiable principles" (Lumley, 2002, p. 286) used to guide the diagnosis of written text products.


We now illustrate the development of diagnostic scales with two projects that took different approaches to the development of rating scales: one taking existing proficiency scales as a basis for scale construction, the other taking learner texts as a starting point.

VERA8 – taking the CEFR as a basis for rating scale development

We first describe an example of a descriptor-based approach to the construction of an analytic scale with diagnostic potential, in which intuitive, qualitative and quantitative methods of scale development were combined. The scale was developed in a large-scale context, aiming to diagnose the English as a foreign language proficiency of pupils (aged 14 to 15) towards the end of lower secondary school in Germany (Rupp et al., 2008). As described in Chapter 4.4.5, the writing tasks and the rating scale were employed in the VERA8 diagnostic assessment in 2009 to provide diagnostic feedback to teachers and pupils with reference to the CEFR proficiency levels. The level-specific writing tasks were formally aligned to the CEFR levels A1 to C1 (Harsch et al., 2010), and the CEFR descriptors from levels A1 to C1 were taken as starting points for rating scale construction (Harsch, 2010; Harsch & Martin, 2012, 2013).

The rating scale encompasses four analytic criteria: task fulfilment, organization, grammar, and vocabulary, each of which is defined by level-specific descriptors for each of the targeted CEFR levels. In a level-specific approach (see above), students respond to a writing task that is aligned to a specific CEFR level. The resulting student writing sample is then assessed with level-specific descriptors that target the same CEFR level as the writing task. For each criterion, a ternary rating is given for each of the descriptors, similar to the approach described above for the CSWE assessment (see the "Holistic and analytic approaches" section in this chapter); that is, a student writing sample shows the expected features, shows more than expected, or shows less than expected. Based on the descriptor ratings, a criterion rating is given for each of the four criteria in turn. Finally, an overall judgment is formed as to whether the student writing sample meets the level-specific expectations, shows less than expected, or shows more than expected.
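
The source does not spell out how the descriptor-level judgments are combined, so the following is only a minimal sketch under the assumption of a simple frequency-based aggregation: ternary descriptor judgments are summarized into a criterion judgment, and criterion judgments into an overall judgment, with ties resolved towards the lower judgment so that weaknesses are not masked.

```python
# A minimal sketch (illustrative only) of aggregating ternary descriptor
# judgments into criterion-level and overall judgments. The aggregation rule
# (most frequent judgment, ties resolved towards the lower judgment) is an
# assumption, not the documented VERA8 procedure.
from collections import Counter

LEVELS = ["shows less than expected", "shows the expected features", "shows more than expected"]

def summarize(judgments: list[str]) -> str:
    """Return the most frequent judgment; on ties, prefer the lower one."""
    counts = Counter(judgments)
    return max(LEVELS, key=lambda level: (counts[level], -LEVELS.index(level)))

descriptor_judgments = {
    "task fulfilment": ["shows the expected features", "shows the expected features"],
    "organization":    ["shows less than expected", "shows the expected features"],
    "grammar":         ["shows the expected features", "shows more than expected"],
    "vocabulary":      ["shows the expected features", "shows the expected features"],
}

criterion_judgments = {criterion: summarize(ratings)
                       for criterion, ratings in descriptor_judgments.items()}
overall_judgment = summarize(list(criterion_judgments.values()))
print(criterion_judgments)
print("overall:", overall_judgment)
```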


The rating scale was developed in three phases. In Phase 1, relevant CEFR proficiency descriptors and descriptors from CEFR-based rating scales were taken as a starting point, and an initial draft was intuitively compiled, in which descriptors were allocated to the aforementioned four criteria (task fulfilment, organization, grammar, vocabulary). The criteria were based on commonly used writing assessment criteria in the German context. This draft was trialled on a small scale by teachers, and the resulting feedback was used to refine the initial scale. In Phase 2, the scale was trialled on a larger scale, and rater training was combined with scale revision: 13 raters worked on 19 tasks across CEFR levels A1 to C1 and rated an average of 90 written performances per task for a total of 16 days over a period of two months. Ratings were statistically monitored for agreement, systematic deviations in scores were discussed, and the descriptors were revised where necessary (Harsch & Martin, 2012). In Phase 3, the refined scale was subjected to a qualitative sorting exercise with 14 international writing assessment experts, whose task it was to sort the individual descriptors into their respective criteria and levels (Harsch, 2010). Results showed that descriptors were allocated to their intended criteria and levels in the vast majority of cases. This served to validate the scale from an external point of view, as none of the experts had been involved in the prior construction steps.

Table 7.2 gives an impression of the final list of performance descriptors, illustrated for level B1 and for the criteria organization, grammar, and vocabulary. Table 7.3 shows the descriptors for the criteria grammar and vocabulary at level B1 as they were used by teachers in VERA8 in 2009 with the aforementioned ternary approach. The diagnostic potential of this approach lies in the fact that, because each descriptor is analyzed, the diagnosis yields very fine-grained information on salient features. The level-specific tasks and descriptors allow fine-tuned feedback on where students stand (with respect to where they are expected to be by the educational standards, which are aligned to the CEFR) and on the areas in which students show particular strengths and weaknesses.

TABLE 7.2 Performance Descriptors for Level B1

Organization (structure and cohesion should have equal weighting)
Structure / Thematic development: Produces a straightforward connected text (narrative or descriptive) in a reasonably fluent manner, or links a series of shorter discrete simple elements into a linear sequence of points in a reasonably fluent manner. Thematic development shows a logical order and is rounded off. Longer texts might compensate for some jumpiness or a missing ending. Organization in paragraphs is not required, but might compensate for flaws in thematic development.
Language / Cohesion: Uses a number of common cohesive devices throughout the text, such as articles, pronouns, semantic fields, connectors, discourse markers (like 'so' (consecutive), 'in my opinion'). Shows reasonable control of common cohesive devices. The use of more elaborate cohesive devices may sometimes impede communication.

Grammar (accuracy should be treated in relation to range)
Range: Uses a range of frequently used structures [such as tenses, simple passives, modals, comparisons, complementation, adverbials, quantifiers, numerals, adverbs]. Sentence patterns show simple variations [e.g., subordinate and coordinate clauses often beginning with 'when', 'but'; relative clauses and if-clauses].
Accuracy: Structures and sentence patterns shown in the script are used reasonably accurately. Some local errors occur, but it is clear what he/she is trying to express. Few global errors may occur, especially when using more complex structures/sentence patterns (e.g., relative clauses, if-clauses, passive and indirect speech). Occasionally mother tongue influence may be noticeable.

Vocabulary (accuracy should be treated in relation to range)
Range: Shows a sufficient range of vocabulary (beyond basic) to express him/herself in familiar situations; some circumlocutions may occur in unfamiliar situations.
Accuracy: Shows good control (i.e., adequate and appropriate use) of basic vocabulary. Some non-impeding errors occur. Impeding errors may occur occasionally, especially when expressing more complex thoughts or handling unfamiliar topics and situations. Errors may occur in the field of collocations and complementation. Some spelling errors may occur. Occasionally mother tongue influence may be noticeable.

Source: www.iqb.hu-berlin.de/bista/aufbsp/vera8_2009/enfr/Kodierschema_komplett.pdf (accessed 24.10.2011; no longer available online).

TABLE 7.3 VERA8 Grammar/Vocabulary Descriptors for Level B1

Grammar (G)
G1 (Range): Student uses a range of frequently used structures (such as tenses, simple passives, modals, comparisons, complementation, adverbials, quantifiers, numerals, adverbs). Sentence patterns show simple variations (e.g., subordinate and coordinate clauses often beginning with 'when', 'but'; relative clauses and if-clauses).
G2 (Accuracy): Student uses structures and sentence patterns reasonably accurately. Some local errors occur, but it is clear what he/she is trying to express.

Vocabulary (V)
V1 (Range): Student shows a sufficient range of vocabulary (beyond basic) to express him/herself in familiar situations; some circumlocutions may occur in unfamiliar situations.
V2 (Accuracy): Student shows good control (i.e., adequate and appropriate use of basic vocabulary). Some non-impeding errors occur. Impeding errors may occur occasionally, especially when expressing more complex thoughts or handling unfamiliar topics and situations.

DELNA – analysis of discourse features

We now provide an example of an empirically derived diagnostic rating scale in the context of post-university-enrolment diagnosis. The Diagnostic English Language Needs Assessment (DELNA) is used at the University of Auckland to diagnose undergraduates' English language support needs (Elder, 2003; Elder & Von Randow, 2008). The results of the diagnosis are reported to students, faculty, and language support staff. The diagnosis involves an academic writing task (graph description) that is subjectively scored by human raters using a rating scale. A new diagnostic scale was developed in 2007; it contains the criteria and band levels shown in Table 7.4 below, where the number of band levels varies depending on the criterion. The new diagnostic scale was developed using empirical methods that encompassed automated analysis of written performances regarding the use of discourse features (see also Chapter 8 in this volume), quantitative rating data of the written performances, and qualitative analysis of interview data with the raters. The development took place in two phases (Knoch, 2007, 2009a).

TABLE 7.4 Criteria of the New DELNA Scale

Criterion (number of band levels): Accuracy (6); Lexical complexity (4); Data description (6); Data interpretation (5); Data – Part 3 (5); Hedging (6); Paragraphing (5); Coherence (5); Cohesion (4).

Source: Knoch, 2009a, p. 281.

In the first phase, 600 student writing samples elicited by the aforementioned task were empirically analyzed using the following language features (Knoch, 2009a, p. 279f):

• accuracy (percentage of error-free t-units);
• fluency (number of self-corrections as measured by cross-outs);
• complexity (number of words from the Academic Word List);
• style (number of hedging devices);
• paragraphing (number of logical paragraphs based on the five-paragraph model);
• content (number of ideas and supporting ideas);
• cohesion (types of linking devices; number of anaphoric pronominals 'this/these');
• coherence (based on topical structure analysis).



Based on these measures, a new scale was drafted, providing explicit, mainly countable features and avoiding adjectives such as 'severe' or 'adequate' to differentiate between the band levels. Each criterion focuses on one trait only to avoid conflation of, for example, vocabulary and spelling. The differing numbers of band levels per criterion (see Table 7.4) were based on the empirical findings (Knoch, 2007).

In the second phase, the new scale was trialled in comparison to the old scale (Knoch, 2009a): 10 raters assessed 100 student performances, first using the old scale and then the new scale; raters were afterwards interviewed about the perceived differences. Rating data from the old and new scales were analyzed using Multi-Faceted Rasch Measurement in order to examine the new rating scale for its ability to discriminate among different levels of examinee performance, to examine rater consistency and variance, and to compare the ratings from the old and the new scales. Factor analysis was then employed to establish the nature and number of the criteria in the new scale. Finally, the interview data were qualitatively analyzed for positive and negative comments on the new scale. Qualitative and quantitative findings of the second phase revealed that the new scale led to better candidate discrimination, higher reliability, and higher consistency of the raters. Table 7.5 provides an abridged version of the new scale.

The empirical scale-development approach allowed for the development of a diagnostic scale that depicts diagnostic features relevant for its specific context. It has to be noted, however, that this scale contains some relativistic terminology, such as "nearly error-free", "frequent", or "infrequent", while an attempt is made at quantifying and qualifying the distinctive features at the nine levels. Also, it is unclear how raters should distinguish between nine levels if there are no descriptors below level 4, and some of the descriptors for some of the levels overlap (see the note below Table 7.5). The scale is claimed to elicit consistent and reliable ratings that can be used to report meaningful outcomes to students, faculty and language support staff. Interview findings from phase 2, however, revealed that raters needed to be made aware of the purpose of a diagnostic assessment during rater training, so that they could focus on each criterion separately and avoid halo or central tendency effects (Knoch, 2009a, p. 297). We will now turn to further challenges when using rating scales.

TABLE 7.5 Abridged Version of the New DELNA Diagnostic Scale

Old scale criteria: Accuracy; Fluency; Complexity; Mechanics; Reader-Writer Interaction; Content; Coherence; Cohesion.
New scale criteria: Accuracy; Repair fluency; Complexity; Paragraphing; Hedges; Data description; Interpretation of data; Part 3 of task; Coherence; Cohesion.

Highest band levels (9, 8):
• Accuracy: error-free (9); nearly error-free (8)
• Repair fluency: no self-corrections (9); no more than 5 self-corrections (8)
• Complexity: large number of words (more than 20) / vocabulary extensive; makes use of a large number of sophisticated words
• Paragraphing: 5 paragraphs (9); 4 paragraphs (8)
• Hedges: more than 9 hedging devices (9); 7–8 hedging devices (8)
• Data description: all data described (all trends and relevant figures) (9); most data described (all trends, some figures; most trends, most figures) (8)
• Interpretation of data: five or more relevant reasons and/or supporting ideas
• Part 3 of task: four or more relevant ideas
• Coherence: writer makes regular use of superstructures, sequential progression and possibly indirect progression; few incidences of unrelated progression; no coherence breaks
• Cohesion: connectives used sparingly but skilfully (not mechanically) compared to text length, and often describe a relationship between ideas; writer might use this/these to refer to ideas more than four times

Lowest described band level (4):
• Accuracy: nearly no or no error-free sentences
• Repair fluency: more than 20 self-corrections
• Complexity: less than 5 words from the AWL / uses only very basic vocabulary
• Paragraphing: 1 paragraph
• Hedges: no hedging devices
• Data description: data description not attempted or incomprehensible
• Interpretation of data: no reasons provided
• Part 3 of task: no ideas provided
• Coherence: frequent unrelated progression, coherence breaks and some extended progression; infrequent sequential progression and superstructure
• Cohesion: writer uses few connectives; there is little cohesion; this/these not or very rarely used

Source: Knoch, 2009a, p. 304.
Note: The numbers in parentheses indicate the band levels of the rating scale. There are no descriptors below level 4. Some of the descriptors for levels 8 and 9 overlap, as do some for levels 4 and 5.
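
Several of the features behind this scale are countable and can, in principle, be extracted automatically (see also Chapter 8). The sketch below, which is not DELNA's actual feature extractor, counts a few of them: hedging devices, paragraphs, occurrences of 'this/these', and words from an academic word list; the word lists used here are tiny illustrative stand-ins, not the Academic Word List.

```python
# A minimal sketch (illustrative only) of counting a few of the countable
# features that informed the scale. HEDGES and ACADEMIC_WORDS are tiny
# stand-in word lists, not the Academic Word List or DELNA's actual feature
# set; error-free t-units would additionally require parsing and error
# annotation by a human or a dedicated tool.
import re

HEDGES = {"may", "might", "could", "perhaps", "possibly", "seems", "appears"}
ACADEMIC_WORDS = {"analyse", "data", "interpret", "significant", "trend"}

def count_features(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "hedging_devices": sum(word in HEDGES for word in words),
        "academic_words": sum(word in ACADEMIC_WORDS for word in words),
        "this_these": sum(word in {"this", "these"} for word in words),
        "paragraphs": len([block for block in text.split("\n\n") if block.strip()]),
    }

sample = ("The graph shows a clear upward trend.\n\n"
          "This may indicate that enrolments could rise further; "
          "these data seem to support such an interpretation.")
print(count_features(sample))
```

Counts of this kind underpin descriptors such as "more than 9 hedging devices", but judging whether a device is used appropriately still requires a human rater.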


Challenges for human raters

One of the well-known issues with human rating is the so-called halo effect, that is, when one characteristic of the writing overshadows the judgment of the whole text. This is a serious challenge with holistic ratings, but it can also happen in an analytic diagnostic approach, when, for example, a text written in poor handwriting receives a lower-than-deserved score across all diagnostic criteria. Another known issue is the order effect, where the order in which the writing products are rated affects the scores awarded. Think of just having read a very strong text, which is then followed by a 'mediocre' one: you may award lower scores to the mediocre one than you would have awarded had you just read a very poor text before it. The literature reports a whole range of such challenges for human raters that are mainly rooted in the inherently complex and indeterminate nature of the rating process (e.g., Cumming, Kantor & Powers, 2002; Douglas, 1994; Lumley, 2002, 2005; Milanovic, Saville & Shuhong, 1996; Pollitt, 1991).

The complexity of the rating process can lead to raters paying attention to different features in the texts and scales. A rater's focus may shift from text to text, or from task to task, leading to internal inconsistencies. Different raters may emphasize different text and scale features, leading to inconsistencies between raters. Moreover, raters may attach differing importance to different features depending on the language they assess; Kuiken and Vedder (2014), for example, found systematic differences between raters assessing L2 and L1 texts, L1 being the first language of the raters and the writers. Furthermore, the complexity of forming a judgment and coming to a score decision involves a range of strategies that raters employ, yet in differing ways, as for example Eckes (2008) found. Challenges can also arise from the writers' texts, tasks and scales, which may cause raters to come to differing interpretations of text features or rating scale wording. Hence, it is important to design clear and unambiguous writing tasks and rating scale descriptors, as outlined in Chapter 5 and earlier in this chapter.

Human causes of these challenges are attributed to limited cognitive resources, differing preferences as to what we value in a written text, the strategies we apply when rating or forming judgments, or our level of experience with rating procedures. All these facets can enhance or hinder the rating procedure and impact the reliability and validity of the resulting scores (e.g., Eckes, 2008; Kuiken & Vedder, 2014; Weigle, 2007). At this point, we would like to draw the reader's attention to the difference between reliability and validity. Reliability, as manifested in score agreement and consistency, does not necessarily equal rating validity, as manifested in a shared understanding of the rating scale and how the scale is to be applied. Hence, both score agreement/consistency and raters' interpretation and application of the rating scale have to be taken into account.


If raters come to the same scores but for different reasons, this may cause validity issues, even if it is not reflected in the final scores, particularly if a summative overall score is reported (see e.g., Harsch & Martin, 2013; Smith, 2000).

Rater variability has long been the focus of research, as it impedes score reliability (e.g., McNamara, 1996; Weigle, 2002). Such unwanted variability can be manifested in differing degrees of leniency or harshness. It can also be caused by raters not being consistent within themselves (e.g., across different rating sessions), or it can be attributed to raters giving differing importance to different characteristics of the writing. In order to examine the cognitive causes underlying such variability, think-aloud or verbal protocol studies on rater cognition are conducted, in which raters are asked to verbalize their judgment-finding processes, decision-making procedures and scoring strategies. Such studies have shown that raters differ in their rating styles and strategies, and in their use of the scale (e.g., Barkaoui, 2010a; Cumming, 1990; Cumming et al., 2002; Eckes, 2008; Knoch, 2007, 2009a; Lumley, 2002, 2005; Vaughan, 1991). Eckes (2008), for example, found distinct groups of raters who showed distinctively different approaches to using the rating scale. Some of these studies also indicated that certain rater characteristics may not be affected by rater training.

Rater background is a further characteristic that may influence rating validity and scoring reliability. Experienced raters may bring the approaches and strategies that worked well in previous contexts to new rating contexts, even though these strategies may not be appropriate. For example, raters who have worked with holistic scales may struggle with the analytic scales needed in a diagnostic context, in that they find it difficult to pay equal attention to separate criteria rather than forming one holistic judgment. Novice raters may focus on irrelevant features of performance or may add criteria based on their own values and expectations that are not part of the actual rating scale (e.g., Douglas, 1994; Schoonen, 2005). An obvious solution to such background issues would be rater training, yet the literature reports inconclusive findings on the effectiveness of training. Knoch (2011a), for example, found that experienced raters did not change their behaviours when given individual feedback. We will discuss rater training in more detail below.

Furthermore, background variables that can influence the rating processes and outcomes involve raters' expectations of what constitutes a "good" text, as well as their reading experiences. Such expectations and experiences can lead to raters focusing on different linguistic and communicative aspects in the texts (e.g., Douglas & Selinker, 1992, 1993; Weigle, 2007). Differing expectations can also be the cause of disagreement between raters on the quality or effectiveness of a written product. Another influencing factor may be raters' familiarity with the language being assessed or familiarity with writers' other languages.


other languages. Such familiarity may facilitate the resolution of ambiguities in the rated texts that are caused by interferences of other languages with the target language. For instance, if I teach Portuguese to learners whose L1 is English, then I am familiar with typical errors English L1 speakers produce in Portuguese texts. If I am, however, unfamiliar with the learners' L1 and the resulting interferences, I will not be able to resolve such errors and hence may come to a different diagnosis of the learner's text. While familiarity may be less relevant in classroom diagnosis, where teachers are supposedly familiar with the language of schooling, other languages students might bring, and the SFL being taught, it may play a role in large-scale diagnosis, where common knowledge of the learners' L1 cannot be taken for granted. Given the above challenges, it is of particular interest in diagnostic contexts to understand how a comparable focus on relevant linguistic features can be established among raters as diagnosers. A study by Kuiken and Vedder (2014) examined the "relationship in L2 [and L1] writing between raters' judgments of communicative adequacy and linguistic complexity by means of six-point Likert scales, and general measures of linguistic performance" (2014, p. 329). They found that the strategies used by the raters differed when they assessed weak and strong L2 writers or native-speaker writers; the raters "seemed to attach more importance to textual features connected to communicative adequacy than to linguistic complexity and accuracy" (ibid.). These findings indicate an intricate interwovenness of diagnostic features, proficiency levels and writing in one's L2 or L1, which has the following implications for diagnostic rating scale design and for rater training: the weighting and importance of different criteria and features need to be transparently described in the rating scale, and rater training needs to emphasize the importance that raters are to give to different criteria and features. We will now discuss these two aspects in more detail.

Main implications for diagnosis

Implications for diagnostic rating scale design

The challenges listed above point towards the need for raters to apply rating scales consistently and comparably, in order to achieve reliable and valid scores (e.g., Elder, 1993; McNamara, 1996). To this end, the wording of the scale is of utmost importance, that is, the wording used to define the features characteristic of the different criteria and band levels, or the wording of the descriptors in a checklist. It is, after all, the process of relating the scale wording to textual features that determines rating validity (e.g., Harsch & Martin, 2013; Lumley, 2005). The scale wording expresses the "values" (Hamp-Lyons, 1991, p. 252) of what constitutes good writing in a given assessment context – hence, the


values need to be transported in an unambiguous way to the raters, so that they can compare their own values with those demanded in the assessment context. This is of prime importance in contexts where raters external to the assessment context serve as diagnosers. In this respect, the rating scales or checklists function as an "operationalization of the values" (Harsch & Martin, 2013, p. 286) promoted by the diagnosing institution. These values, of course, should be written down in the assessment specifications at the beginning of the diagnostic cycle, and they should be operationalized in every single step along the cycle, from defining the construct, to designing tasks and diagnostic approaches, to rating the written products, and to finally giving feedback. Only if the intended values are reflected in the final diagnostic feedback can writers see where they adhere to these goals and where they still need to improve. Needless to say, the sources of these values lie in the institutional context of the diagnosis, whether it is reflective of curricula, educational standards or conventions of a community of practice. With regard to diagnostic feedback, Knoch (2011b) recommends providing a "(d)etailed description of a test taker's writing behaviour in the different categories of the rating scale (…). Beyond that, pointers to improvement and possible sample sentences/paragraphs could be provided to help test takers improve their writing in future performances" (p. 94). Particularly in diagnostic contexts, monitoring how raters interpret and apply the "smallest" units of a rating scale, namely the descriptors, is of utmost importance, because otherwise we do not know whether the awarded scores and the resulting feedback validly reflect the intended values. Research shows that while raters may agree on an overall score for a criterion (or a script), they may have different reasons for awarding this score, and they may actually differ on how they apply the different descriptors that define a criterion (see, e.g., Harsch & Martin, 2013; Smith, 2000). Hence, in diagnostic settings, we recommend defining the descriptors carefully, as they constitute the most fine-grained units of a rating scale or checklist. We further suggest collecting raters' judgments on this fine-grained level. The more fine-grained the diagnostic scale or checklist, the richer the resulting diagnostic feedback will be. The wordings of the descriptors need to reflect the valued features of the traits, whether that be lexical richness, appropriacy of expressions, or some index of word frequency. This ensures that raters know what to focus on in a given diagnosis. In cases where different features of the trait carry different (or equal) weights, guidelines are needed to clarify what features to weigh in which ways. For instance, in cases where several descriptors constitute a criterion, such as vocabulary, and a criterion score is required, guidelines need to clearly specify the relation between the different descriptors and how raters


are supposed to form the criterion score. One example can be found in the VERA8 checklists, which state that vocabulary accuracy should be considered in relation to vocabulary range; this is further explained by the fact that the accuracy descriptor for B1 (see Table 7.3) refers to "basic" vocabulary and explicitly allows certain kinds of errors beyond this basic range. Another way to clarify to the raters the meaning of descriptors, criteria, and band levels is to use benchmark texts. Benchmarks are pre-scored written products that are analyzed in relation to the descriptors and band levels, and that are ideally accompanied by justifications that explain why a certain score was awarded, how the text was interpreted in relation to certain descriptors, and how the weighting rules were applied. In the VERA8 diagnosis, a booklet with benchmarks was published for the assessment taking place in 2009 to illustrate to teachers the targeted values and the intended descriptor interpretations. Figure 7.1 shows one example from this booklet.
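To make the descriptor-level recording and weighting discussed above more concrete, the sketch below shows one possible way of storing descriptor codes (using the VERA8-style 0–3 coding illustrated in Tables 7.6a and 7.6b) and combining two vocabulary descriptors into a criterion score. The descriptor names and the combination rule are our own illustrative assumptions, not part of the VERA8 instrument itself.

```python
# A minimal, hypothetical sketch of collecting judgments at descriptor level
# and forming a criterion score. Codes follow the VERA8-style convention:
# 0 = insufficient evidence, 1 = below B1, 2 = at B1, 3 = above B1.
# The descriptor names and the combination rule are invented for illustration.

def vocabulary_criterion_score(range_code: int, accuracy_code: int) -> int:
    """Combine two descriptor codes into one criterion code.

    Illustrative rule: accuracy is interpreted relative to range, so the
    criterion code cannot exceed the range code (accurate control cannot be
    credited for vocabulary that is not actually used).
    """
    if 0 in (range_code, accuracy_code):
        return 0  # insufficient evidence on either descriptor
    return min(range_code, accuracy_code)

# Descriptor-level records are kept, so feedback can report the fine-grained
# profile as well as the aggregated criterion score.
ratings = {"V1_range": 1, "V2_accuracy": 1}
criterion = vocabulary_criterion_score(ratings["V1_range"], ratings["V2_accuracy"])
print(ratings, "-> Vocabulary criterion code:", criterion)
```

Keeping the descriptor-level codes alongside the aggregated criterion score is what makes the fine-grained feedback recommended above possible.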

Implications for diagnostic rater training

While not all research on the effects of rater training reports positive findings (e.g., Knoch, 2011a), some practices that seemingly lead to effective training have been proposed. As Harsch and Martin (2012) state:

(T)he main aims of rater training are to develop a common understanding of the assessment criteria and an interpretation of the descriptors as intended by the test constructors, to establish reliable rating procedures, and as far as possible to control undesirable effects of rater characteristics such as raters' backgrounds, experiences, expectations, or their preferred rating styles (see, e.g., Eckes, 2008; Lumley, 2005; Shale, 1986; Weigle, 2002). Rater training according to Cumming (1990), Shohamy, Gordon, and Kraemer (1992) or Weigle (1994), for example, plays an important role with respect to aspects such as clarifying assessment criteria and terminology issues, practising rating strategies, or reaching consensus about how to interpret scripts with reference to scale descriptors. It also facilitates novice raters to approximate expert raters' behaviour (e.g., Cohen, 1994). (p. 233)

Rater training usually encompasses the following aspects:

• Clarifying the assessment context and purpose (i.e., diagnostic assessment in the classroom, or in a large-scale monitoring study).
• Clarifying the construct, the targeted values, and the writers.
• Familiarizing raters with the tasks and rating instruments, including benchmarks.


FIGURE 7.1 Benchmark Text from VERA8.

Source: www.iqb.hu-berlin.de/bista/aufbsp/vera8_2009/Schuelerleistungen_und_Kodierungen_Englisch.pdf, accessed 24.10.2011, no longer available online.


TABLE 7.6a Justifications for the Benchmark Text from VERA8 (see Figure 7.1) for Task Fulfilment and Organization

Target for Sports Accident: Level B1

Task Fulfilment (TF)

TF1: Content points
Target (B1): Student describes most of the following content points: accident embedded in a situation; cause of accident; type of injury and/or part(s) of the body hurt; reaction to the accident.
Rater's comments: fulfilled, all bullets addressed
Code*: 2

TF2: Ideas relevant to the task
Target (B1): Student makes reference to school environment and an accident during a sports event (sports injury / car accident / fight with resulting injury or accident).
Rater's comments: all points mentioned, relevant

TF3: Register / Tone
Target (B1): Student writes consistently in a neutral / non-informal tone, since it is a report for the school.
Rater's comments: neutral tone

TF4: Text type requirements
Target (B1): Student writes a factual description (in most parts of the text); there should not be too many narrative parts.
Rater's comments: partly achieved (report and narrative parts)

TF5: Communicative effect
Target (B1): Student shows a clear picture of the sports accident because enough information is provided.
Rater's comments: communicative effect not achieved, more than occasional difficulties

Organization (O)

O1: Structure / Thematic development
Target (B1): Student produces a straightforward connected text (narrative or descriptive) in a reasonably fluent manner or links a series of shorter discrete simple elements into a linear sequence of points in a reasonably fluent manner. Thematic development shows a logical order and is rounded off.
Rater's comments: no connected text, incoherent (code 0)
Code*: 0

O2: Language / Cohesion
Target (B1): Student uses a number of common cohesive devices throughout the text, such as articles, pronouns, semantic fields, connectors, discourse markers (like 'so' (consecutive), 'in my opinion'). He/she shows reasonable control of common cohesive devices.
Rater's comments: range of cohesive devices not sufficient; cannot show sufficient control due to limited range (code 1)
Code*: 1

* Code definitions: 0 – insufficient evidence, 1 – below level B1, 2 – at level B1, 3 – above level B1.


TABLE 7.6b Justifications for the Benchmark Text from VERA8 (see Figure 7.1) for Grammar and Vocabulary

Target for Sports Accident: Level B1

Grammar (G)

G1: Range
Target (B1): Student uses a range of frequently used structures (such as tenses, simple passives, modals, comparisons, complementation, adverbials, quantifiers, numerals, adverbs). Sentence pattern shows simple variations (e.g., subordinate and coordinate clauses often beginning with 'when', 'but'; relative clauses and if-clauses).
Rater's comments: narrow range (should, past tense, negation); nearly no sentence variation (mainly subject-predicate-object) (code 1)
Code*: 1

G2: Accuracy
Target (B1): Student uses structures and sentence patterns reasonably accurately. Some local errors (1) occur, but it is clear what he/she is trying to express. Few global (2) errors may occur, especially when using more complex structures / sentence patterns (e.g., relative clauses, if-clauses, passives and indirect speech).
Rater's comments: control not sufficient, more than some local errors (e.g., tenses not controlled); global errors (last sentence, which …) (code 1)
Code*: 1

Vocabulary (V)

V1: Range
Target (B1): Student shows a sufficient range of vocabulary (beyond basic) to express him/herself in familiar situations; some circumlocutions may occur in unfamiliar situations.
Rater's comments: limited range (code 1)
Code*: 1

V2: Accuracy
Target (B1): Student shows good control (i.e., adequate and appropriate use) of basic vocabulary. Some non-impeding (3) errors occur. Impeding (4) errors may occur occasionally, especially when expressing more complex thoughts or handling unfamiliar topics and situations.
Rater's comments: some control only; some spelling errors, frequent non-impeding errors; some lexical errors are impeding (e.g., wond, food, make me warm) (code 1)
Code*: 1

Overall
Code*: 1

* Code definitions: 0 – insufficient evidence, 1 – below level B1, 2 – at level B1, 3 – above level B1.
(1) Local errors are grammatical errors within one sentence which do not hinder understanding (e.g., mixing up of tenses, forgetting to mark agreement, problems with subordinate clauses, errors in word order). It is usually clear what the writer wants to express.
(2) Global errors are those grammatical errors which hinder understanding at the sentence level.
(3) Non-impeding errors are those lexical / spelling errors which can be solved spontaneously.
(4) Impeding errors are those lexical / spelling errors which are irresolvable or take a great deal of effort to resolve.


• Familiarizing raters with the rating approach, for example, by providing decision-making models (e.g., Lumley, 2002; Milanovic, Saville & Shuhong, 1996) and by demonstrating helpful strategies (e.g., Lumley, 2005).
• Giving raters time to practice, and giving data-driven, timely and transparent feedback (e.g., Harsch & Martin, 2012; Knoch, 2011a), if possible tailored to raters' profiles (Eckes, 2008).
• Giving raters time and space to discuss their ratings and the feedback (Harsch & Martin, 2012; Knoch, 2011a; Weigle, 2007, who reports discussions as particularly helpful for teachers).
• Monitoring rating quality at the most fine-grained scale level, that is, the descriptors or performance criteria statements, over time, to facilitate concrete feedback and guidance for inconsistent raters (see the procedures outlined in Harsch & Martin, 2012, p. 233).

There are two distinct ways in which rater training has generally been conducted. One is called "schema-driven or top-down rater training", where "raters are taught to use a mental schema or cognitive prototype that represents effective performance at a particular level of competence. That is, they are encouraged to 'scan' the performance for schema-relevant incidents and to form online evaluations, as exemplified by training in the use of holistic scoring rubrics" (Eckes, 2008, p. 179, who cites Lievens, 2001; Pulakos, 1986). More relevant for diagnostic contexts is what Eckes (2008, p. 179) describes as "behavior-driven (or bottom-up) rater training":

Behavior-driven training divides the rating process into three phases that are strictly distinguished from each other: behavioral observation, classification, and evaluation. Raters are taught to proceed from one phase to the next only when the previous one is finished. In other words, raters are trained to classify pieces of factual information into discrete categories to form a judgment. Use of an analytic rubric fits into this approach to rater training. (p. 179)

This latter approach is also feasible in diagnostic contexts to train raters to focus on observing the targeted behaviour or features in the text, classifying these by matching them to the features defined in the scale descriptors, and finally forming their evaluation or judgment. A further distinction in rater training can be made with regard to whether the raters are "hierarchically" (Harsch & Martin, 2012, p. 233) trained by a ready-made model that they have to internalize, or whether a "discussion-based" (p. 233) approach is taken, where raters can discuss ratings, scripts and scale interpretation and application. While the hierarchical approach may be suitable for contexts where an existing and well-working scheme is passed on


to new raters, the discussion-based approach is recommended by East (2009) and Harsch and Martin (2012), given its potential for "developing a common understanding and application of the descriptors" (p. 233). This approach is particularly feasible in contexts where a new scheme is developed, revised or validated, because raters who are to apply the scale are involved in shaping it so that the wording becomes transparent and clear (e.g., Barkaoui, 2010b; Harsch & Martin, 2012). Even with the best possible training, we acknowledge rater effects that may prove to be resistant to training. Eckes (2008), for instance, concludes that raters' preferences for certain strategies or their individual level of leniency or harshness may always remain. Knoch (2011a, p. 180) raises the concern that "a great deal of rating behavior [may be] fixed, depending on the raters' background or their individual rating styles". The effects of such individual differences need to be acknowledged and accounted for. As long as a rater's internal consistency is given, and as long as all raters have come to a shared agreement on how the rating scale is to be understood, interpreted, and applied, the remaining inter-rater differences can be accounted for. In large-scale assessment, statistical methods can be used to adjust the resulting scores for harshness/leniency effects. In classroom contexts, double marking helps alleviate such rater differences, as will be discussed in the "Differences large-scale vs classroom-based diagnosis" section in this chapter. Another angle that may need to be considered is the ratings produced by novice vs expert raters. While some raters are experts in the context that they are familiar with, this expertise may result in unwanted rater variability when it comes to new contexts. In these contexts, experienced raters need to be re-trained, just as novice raters have to be trained. Knoch (2011a) even goes so far as to question whether it is useful to employ experienced raters in new contexts, as they may transfer their rating experience and may find it hard to adapt to the new context.
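To illustrate the kind of statistical adjustment mentioned above, the sketch below uses a deliberately simplified severity estimate – each rater's mean deviation from the overall mean, removed from that rater's scores – on invented data. Operational large-scale programmes would typically use many-facet Rasch measurement or comparable models rather than this naive correction.

```python
# A deliberately simplified illustration of adjusting scores for rater
# severity/leniency. Each rater's severity is estimated as the mean deviation
# of their awarded scores from the grand mean, and that offset is removed.
# The ratings below are invented.

from statistics import mean

# (rater, script, awarded score on a 0-5 scale)
ratings = [
    ("rater_A", "script_1", 4), ("rater_A", "script_2", 3), ("rater_A", "script_3", 5),
    ("rater_B", "script_1", 3), ("rater_B", "script_2", 2), ("rater_B", "script_3", 4),
]

grand_mean = mean(score for _, _, score in ratings)
raters = {r for r, _, _ in ratings}
severity = {
    r: mean(score for rater, _, score in ratings if rater == r) - grand_mean
    for r in raters
}

# Subtracting the offset pulls lenient raters' scores down and harsh raters' up.
for rater, script, score in ratings:
    adjusted = score - severity[rater]
    print(f"{rater} {script}: adjusted score = {adjusted:.2f} (severity offset {severity[rater]:+.2f})")
```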

Differences large-scale vs classroom-based diagnosis

As indicated in previous chapters, conditions of diagnostic assessment differ depending on whether the diagnosis takes place in a classroom, in a large-scale monitoring study, or in a research context. In the latter two contexts, enough resources are usually available to design and validate rating instruments and to train raters in scale interpretation and application. Furthermore, such contexts generally monitor rater consistency, be it with regard to internal consistency, between-rater consistency, or consistency of ratings over time. It can also be expected that in large-scale and research contexts, rater effects will be accounted for by statistical means. In typical diagnostic assessments in the classroom, however, the resources needed to carry out these analyses and the technical know-how for carrying


them out should not be presumed. What can, however, be presumed in classroom contexts is that teachers ‒ unlike raters in large-scale assessments such as DELNA ‒ know their students, and are familiar with their backgrounds, linguistic and otherwise. Also, teachers and students can be assumed to share a common understanding of what constitutes good writing, and students should be familiar with the formats used for eliciting writing skills. This familiarity and shared knowledge creates a setting in which a shared understanding can be assumed to exist between teachers within one institution, and also between teachers and students within one class. Nevertheless, as teachers' experiences and competences vary, such shared knowledge may not necessarily lead to useful diagnosis. Teachers, like raters, need to develop diagnostic competence (Edelenbos & Kubanek-German, 2004). The classroom is also a context in which diagnostic self-assessment may be a feasible way to involve learners actively in diagnosis and in planning their learning journey. Just like teachers, learners too need to be trained in self-diagnosis. One means to achieve diagnostic competences among teachers and learners is to involve them in the construction of diagnostic instruments for the classroom. Teams of teachers, or teachers along with their students, can co-construct checklists or scales. The relevant performance criteria and checklist statements can be derived from learning outcomes, teaching goals or educational standards, which should be known to students and teachers alike. The diagnosis can be conducted by teachers using double marking, where the results can be discussed and negotiated, as recommended by Weigle (2007). Alternatively, self- and peer-diagnosis can be employed in addition to the teacher ratings, and the results can be discussed in class. Feedback can then be tailored to the needs of the writer group, aligned to the learning and teaching goals, indicating particular strengths and weaknesses. Based on the feedback, teachers and learners can plan further steps in cooperation. Such collaborative assessment between teachers and learners, be it in the design, execution or feedback phase, not only raises awareness of the values and learning outcomes, but also trains students to identify their own and their peers' strengths and weaknesses, and familiarizes them with giving and receiving constructive feedback to and from peers. Furthermore, we assume that collaborative assessment also has a positive effect on teachers' diagnostic competence: it is likely that by getting a better understanding of what students do when they are assessing their own writing (e.g., which aspects they find difficult to diagnose, and which ones easier), teachers' skills in diagnosis improve. In other words, the more aware teachers become of what metacognitive skills different kinds of students are likely to have, and what students' limitations are in that regard, the better they will be able to provide students with useful feedback and guidance. The importance of feedback resulting from diagnostic assessment, and the washback of diagnosis on learning, cannot be stressed often enough. Be it in the


classroom or in large-scale assessments, means should be put in place that facilitate the take-up of feedback. As a first step, the descriptors or checklists employed in the diagnosis can be used for reporting, and they can be accompanied by qualitative comments on particular areas that need improvement. The written products and the ratings can also be used as a basis for consultancies, as is the case in the aforementioned DELNA large-scale diagnosis. Such consultancies could be framed in the sociocultural paradigm, making use of the principles of dynamic assessment and mediation. We will explore this aspect in more depth in Chapter 9.

Limitations of conventional rating scales for diagnosis

A diagnostic approach that focuses on a written end-product is of course limited to observing textual features displayed in the final product. Diagnosing processes, as described in Chapter 6, is beyond the capacities of a rating scale approach. What is, however, possible is to assess a series of written drafts as they are edited and revised over a period of time. This way, it may be possible to capture and analyze certain editing and revision processes. Furthermore, a series of different texts elicited by increasingly complex tasks, perhaps documented in a portfolio, may be assessed by the same rating scale, thus tracing and diagnosing writer development over time. When it comes to diagnosing cognitive processes, these can be observed by, for example, keystroke logging, eye-tracking, or intro- and retrospective approaches, as discussed in Chapter 6. All these methods yield data that have to be analyzed diagnostically. Here, diagnostic grids and checklists that focus on relevant indicators of cognitive processes can be developed following the principles outlined above in this chapter. We have to concede that such procedures are most often employed for research purposes, less so for the purpose of giving diagnostic feedback to writers. Another limitation of the rating approach is its resource intensity. In any context, only limited resources are available for diagnostic assessment, while the design and validation of diagnostic rating scales, and rater training, are very resource intensive. It may therefore be beyond the capacities of individual teachers to follow all the steps outlined in this chapter. Nevertheless, as we indicated in the "Differences large-scale vs classroom-based diagnosis" section in this chapter, there are low-cost alternatives that classroom teachers can employ with colleagues or in collaboration with their learners. Finally, we would like to indicate a few alternative or complementary approaches that we consider worth exploring:

• Some assessment studies have coded certain atomistic features (for a review of early studies, see Miller & Crocker, 1990). One example in a large-scale study is the assessment of German L1 proficiency in the German National


Language Assessment Study DESI (e.g., Neumann, 2012). For assessing writing skills, a combination of holistic and analytic criteria was used, with 22 formal and 25 content-related analytic features being assessed in a dichotomous way (0 – not present; 1 – present), whereas the holistic ratings were based on a five-point scale. All the features were benchmarked to help the raters. The 22 formal features were grouped into categories such as addressee, recipient, date, subject matter, start and ending of the letter, description of actions, and a complimentary close, and the 25 content-related features into source of the information, description of the situation, deficits/problems, claims/suggestions, and consequences. In addition, there were three language features (orthography, morphology, sentence construction) and three pragmatic features (text organization, style, word choice), all rated on a five-point scale (see Neumann, 2012, p. 44). The dichotomously scored features were then treated like reading or listening items: the more features were present in a text, the higher the resulting score (see the sketch after this list). This approach has diagnostic potential if the features defined in the atomistic criteria capture diagnosis-relevant aspects. Then, the atomistic criteria along with their dichotomous coding can show the learners what they can already do and where they need to improve. Nevertheless, since such atomistic criteria specify task-specific expectations (e.g., whether a certain linguistic feature is realized), such an approach only generates task-specific feedback that may be difficult to generalize beyond the very expectations of one task. Here, it is helpful to have accompanying criteria in the form of a traditional rating scale, as was the case in the German writing assessment of the DESI study. Neumann (ibid.) outlines how formative feedback can be based on such a combined approach.
• In order to capture specific subskills of writing, we refer to Chapter 5 on writing tasks and the possibility of complementing rating approaches with discrete tests on certain subskills, such as vocabulary tests or tests on discourse knowledge. Diagnostic feedback can then report both the rating results and the discrete test results, providing a richer picture.
• Furthermore, rating approaches can be complemented by automated scoring of discrete features, as is often done in SLA research, where syntactic aspects such as t-units or lexical complexity measures are captured. We will discuss all aspects related to automated scoring in Chapter 8; suffice it to state here that such discrete features can enrich diagnostic assessment and feedback, similar to the atomistic features or the discrete tests just mentioned.
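The dichotomous, DESI-style coding of atomistic features described in the first bullet can be sketched as follows. The feature checklist, the toy keyword "detector" and the example letter are invented for illustration and are far simpler than the 22 formal and 25 content-related features used in DESI, where trained raters did the coding against benchmarks.

```python
# Sketch of dichotomous coding of atomistic features (0 = not present,
# 1 = present), summed into a task-specific score, DESI-style. The checklist
# below is an invented, much-reduced example for a formal letter task.

formal_features = [
    "addressee stated",
    "date given",
    "subject line present",
    "complimentary close used",
]
content_features = [
    "source of information identified",
    "problem described",
    "suggestion made",
]

def code_features(text, features, detector):
    """Code each feature 0/1 using a detector function.

    In DESI the coding was done by trained human raters; 'detector' here
    stands in for that judgment (or for an automated check).
    """
    return {feature: int(detector(text, feature)) for feature in features}

def toy_detector(text, feature):
    # Pretend judgment based on simple keyword presence (illustration only).
    keyword = feature.split()[0]
    return keyword.lower() in text.lower()

letter = "Dear Ms Smith, ... Date: 3 May ... My suggestion is to ..."
codes = code_features(letter, formal_features + content_features, toy_detector)
print(codes, "-> task-specific score:", sum(codes.values()))
```

As the accompanying bullet notes, such a feature profile shows learners which task expectations they met, but it needs a traditional rating scale alongside it to support feedback that generalizes beyond the one task.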

Conclusions

Summing up the main points of this chapter, rating written performances for diagnostic purposes is best done by using an analytic approach, where


relevant linguistic, textual or discourse features are described in several distinct assessment criteria. The wording of these descriptions, which can be called rating scale descriptors or performance criteria statements, is of utmost importance. Not only do the descriptions have to convey the understanding of what constitutes good writing in a given diagnostic context, they also should be aligned to the teaching goals, learning outcomes, curricula or competency frameworks relevant for the context. Diagnostic rating can be done either by using a checklist that describes one specific performance level, or by employing a rating scale that spans multiple performance levels or bands. The decision to use one or the other depends on the degree of heterogeneity of the learners' proficiency levels. For raters to be able to understand, interpret and apply the rating scale or checklist consistently, reliably and validly, training is needed during which a shared understanding of the expected writing features can be developed; this is best supported by benchmark texts which illustrate the relation between text features and rating scores. Finally, it is worth considering whether diagnosis takes place in a large-scale context where raters do not know the learners and hence may need more training, or whether it takes place in classroom settings where teachers as diagnosers can be presumed to know their students, and teachers and students can be assumed to have developed a shared understanding of what constitutes good writing. In these contexts, students can be involved as agents in diagnosing text products, be they their own or those of their peers; this should further contribute to fostering students' writing skills.

8 AUTOMATED ANALYSIS OF WRITING

Introduction

Automated analysis of writing involves the analysis of a written text in terms of different production features of the text by a computer program. Although such activities may seem to be limited to very special situations, automatic text analysis and feedback protocols are, in fact, ubiquitous in our daily lives in the form of spell checks, grammar checks, and other online resource tools. Automated analyses and feedback are available in a wide range of languages for a number of different devices. For instance, MS Word 2016 offers a total of 16 different English language dictionaries, ranging from US and UK English to the English used in Belize and Zimbabwe. For Spanish, 22 dictionaries are available to the writer. Depending on our choices, MS Word can also highlight grammatical errors or repeated words, and can even give a readability estimate for the text. Word is also capable of offering stylistic advice on punctuation, lexical choice, and sentence structure. Besides analysis, these text editors can provide the writer, or somebody else, such as a teacher, with immediate feedback on a range of production features. The program can also suggest corrections and even perform them automatically. Many of the same automated analyses and autocorrection functions are also incorporated into other common programs and devices such as smartphones. The feedback described above from one well-known text editor provides information on both the writing product and the writing process. In other words, the feedback is provided to writers on their actual text – the product of writing – and



is offered to writers while they are generating text – the process of writing (see Chapter 6). As automated analysis and feedback tools are part of everyday life, it is not surprising that these technologies have also been used in assessment contexts. In this chapter, we describe what is involved in automated analysis of writing, how it is used for assessment, and how it contributes to diagnosing SFL (second or foreign language) writing ability. We cover computer programs implemented both in high-stakes language examinations and in tools that provide feedback and guidance to learners. An important part of the chapter is a discussion of the constructs that automated assessment programs can, or cannot, measure. We also consider the types of feedback that these tools deliver and conclude by reviewing some diagnostically promising directions in this field. Since automated analysis is carried out while the writer produces text, or immediately after the text is completed, the matters discussed here belong not only to the assessment stage of the diagnostic cycle (see Figure 1.1 in Chapter 1), but also to the feedback stage, as these computer systems provide users with information about their writing. However, given the importance of what automated assessment can or cannot analyze, our discussion also concerns the first stage of the cycle, which focuses on construct definition.

Need for automated analysis of writing

One of the main reasons for the interest in automated analysis of writing is the amount of time and effort it takes to read, evaluate, score and give feedback to writers. Automated analysis is potentially very useful for individual writers using a text editor, for teachers who want to increase the amount of feedback given to their students, or for examination boards that have to score large numbers of examinees. Language examination providers in particular have been interested in investigating automated scoring of writing. Valid and reliable assessment of writing is a time-consuming, laborious, and expensive operation. The quality of rating written performance cannot be taken for granted: it requires human raters with extensive training. The time it takes for raters to be trained to the levels required by high-stakes examinations is one reason, but obviously not the only one, why automated scoring is attractive. As discussed in Chapter 7, human raters face many challenges when assessing writing, and automated rating systems can address some of these challenges. The strengths of automated systems are rooted in the fact that the evaluation is carried out by computers, which are fast and consistent, unlike the slow and often inconsistent human raters. Also, computers do not get tired, suffer from lack of motivation, or have lapses of attention. Neither do they forget to consider something that they ought to consider. They are not bothered


by the halo effect or the order effect. Rather, they can evaluate all the intended criteria separately without crossover, and they are not affected by the order of the texts that they judge. Furthermore, they are not overwhelmed by the number of performance features they have to rate. Instead, each text can be analyzed along even hundreds of criteria, whereas human raters can meaningfully attend to only a handful of features. Nor are automated systems overwhelmed by the number of texts they have to analyze. Finally, the 'training' (or perhaps 'reprogramming') given to computers can be ensured to have the intended effect, whereas human raters are very difficult to train, particularly as concerns their leniency vs severity (see Chapter 7). More accurately, however, automated systems undergo procedures that resemble the training of human raters in that both try to learn from previously (human-)rated texts. In the case of computer systems, they gradually adjust their algorithms to match the ratings given by humans to the texts in the training materials (see the "Examples of automated essay scoring systems" section in this chapter). In short, the key advantage of automated systems is that they save resources, since they process large numbers of texts fast and are cheaper than human raters (at least when operational; the start-up costs of such systems can be substantial). Therefore, large-scale examination providers have been very interested in developing automated scoring systems. The fact that automated systems are so efficient and save resources also means that diagnostic/formative assessments can become more feasible. Teachers almost never have enough time to diagnose their students' writing in detail and provide them with extensive feedback, but automated evaluation tools can now do much of that for them. As will become clear later in this chapter, these systems cannot, and are not intended to, replace the teacher; rather, the two can complement each other's strengths and weaknesses.

Automated scoring vs automated evaluation

In the Encyclopedia of Applied Linguistics, Burstein (2013) distinguished automated essay scoring systems, which provide their users with only a score, from automated essay evaluation systems, which give feedback. Interestingly, Burstein (2013, p. 1) referred to the feedback provided by the latter type of system as diagnostic feedback. She does not define diagnosis but appears to refer to the analytical nature of the automated feedback and, more specifically, its wide coverage of different linguistic, textual/discoursal, and stylistic features (collectively, "production" features). Since the present chapter concerns automated analyses that provide potentially diagnostic feedback rather than scores, we focus on what Burstein calls automated evaluation systems. As many others have done, we call them automated writing evaluation (AWE) systems because of our focus on writing.


However, we also give an overview of the development and current state of automated scoring systems, because automated scoring and evaluation engines scrutinize rather similar features of texts. Furthermore, some systems provide both scores and detailed feedback (e.g., the Pearson Test of English Academic). In others, separate systems exist for scoring and feedback, but they use the same engine; this is the case with Educational Testing Service's e-rater. On the one hand, e-rater is used for scoring purposes in the TOEFL iBT, for example; on the other hand, it is implemented in Criterion, which is a separate system providing English language learners with formative feedback.

Automated scoring systems

Automated scoring started in the 1960s with Project Essay Grade (PEG; see Burstein, 2013; Page, 1966; Shermis et al., 2013). Because computer technology was then limited to bulky mainframe computers and entering data was laborious, PEG was not a practical, commercial application like the more modern scoring engines, but rather a research tool for exploring what features of writing could be analyzed and whether they could predict human ratings. The project investigated a rather wide range of textual features, including measures of the length of words, sentences and paragraphs; counts of punctuation marks and of specific types of words including prepositions, connectors, relative pronouns and subordinating conjunctions; counts of misspelled words; and a measure of word frequency. In the Page and Paulus (1968) study, the best predictors of human ratings were average word length (and variation in it), word frequency, essay length, the number of commas, dashes and hyphens, and spelling. On the whole, PEG relied only on the surface features of texts (Shermis et al., 2013, p. 9). Automated essay analysis and scoring started to develop more rapidly in the 1980s and 1990s with the coming together of three developments (Shermis et al., 2013). The first was the emergence of microcomputers and different word processors. These technologies made it easy to produce electronic (rather than handwritten) texts that could be analyzed automatically. The second key development was the Internet, which created a convenient platform for administering automated analyses (see, for example, Coh-Metrix discussed in this chapter). The third development involved progress in natural language processing (NLP), of which the functionalities applied in word processors are just one example. NLP covers computational technologies that carry out language analyses, typically addressing morphological, syntactic, and semantic aspects of texts (Shermis et al., 2013, p. 7; for more on NLP, see Cahill & Evanini, 2020). The above developments contributed to the introduction of several automated essay scoring systems in the 1990s. One of them was in fact an online version of PEG (Shermis et al., 2001) – the concept that Page (1966) had proposed 35 years


earlier. The newcomers in this field included e-rater by Educational Testing Service (ETS), IntelliMetric by Vantage Learning, and Intelligent Essay Assessor by Pearson Educational Technologies (Shermis et al., 2013; for an overview of scoring systems, see also Dikli, 2006).
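The basic logic that PEG introduced and that later engines refine – extracting countable text features and fitting them to human ratings – can be sketched in a few lines. The features below echo some of those Page used, but the tiny training set, the feature list and the least-squares fit are purely illustrative assumptions, not any operational engine's method.

```python
# PEG-style illustration: extract simple surface features from essays and fit
# a linear model to human holistic ratings. With only three invented training
# essays the fit is underdetermined and purely illustrative; operational
# engines use far richer, NLP-derived feature sets and much larger samples.

import re
import numpy as np

def surface_features(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words),                           # essay length
        np.mean([len(w) for w in words]),     # average word length
        text.count(","),                      # number of commas
        len(words) / max(len(sentences), 1),  # words per sentence
    ]

# Tiny invented training set: (essay text, human holistic score)
training = [
    ("Short text. It is plain.", 2),
    ("This essay, although brief, develops its argument with some care.", 3),
    ("The argument is developed carefully, with examples, qualifications, and a conclusion that follows logically.", 5),
]

X = np.array([surface_features(text) for text, _ in training])
y = np.array([score for _, score in training])

# Ordinary least-squares fit, with an intercept column added.
X1 = np.hstack([np.ones((len(X), 1)), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)

new_essay = "A new essay, somewhat longer, with several clauses and a clear point."
prediction = float(np.dot(np.hstack([[1.0], surface_features(new_essay)]), coefs))
print("Predicted holistic score:", round(prediction, 2))
```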

Examples of automated essay scoring systems

The e-rater system is used by ETS in several examinations for both L1 and SFL speakers of English, including the GRE (Graduate Record Examination) and the TOEFL iBT, where it is used together with a human rater to score essays. E-rater uses NLP to extract production features from essays to predict holistic human ratings (Enright & Quinlan, 2010). The program analyzes dozens of features that form several macro-level aspects: grammar, usage, mechanics, style, organization, development, preposition usage, collocation density, lexical complexity, and, in the prompt-specific scoring models, also topic-specific vocabulary usage (Deane, 2013a; Enright & Quinlan, 2010). The system focuses on linguistic accuracy and range, as well as textual structure. It does not analyze content (apart from topic-specific vocabulary), argumentation, or coherence (H. Lim & Kahng, 2012). E-rater is also used in the instructional application called Criterion, which supports writing development by providing feedback on a number of the same features analyzed by the e-rater engine (see the "Examples of automated writing evaluation systems" section in this chapter).

IntelliMetric is a system developed by Vantage Learning (Schultz, 2013; for online demos in several languages, see www.intellimetric.com/direct/#sp-overview). The system combines artificial intelligence, NLP technologies, and statistical methods to simulate human ratings (Dikli, 2006; Schultz, 2013). The program gives both a holistic score and diagnostic feedback related to five dimensions of writing: (1) focus and meaning, (2) content and development, (3) organization, (4) language use, voice, and style, and (5) mechanics and conventions (Schultz, 2013, p. 90). The dimension scores are based on an analysis of over 300 syntactic, semantic, and discoursal features (Dikli, 2006, p. 16). IntelliMetric's scoring system is trained with a large number of human-rated essays. An instructional application called MY Access! uses the IntelliMetric scoring engine to provide feedback in several different languages (Dikli, 2006).

Intelligent Essay Assessor (IEA) is an automated scoring system owned by Pearson Knowledge Technologies (Dikli, 2006; Folz et al., 2013). For example, the Pearson Test of English Academic uses it as the scoring engine for writing tasks. The system focuses on analyzing the content and meaning of texts by using Latent Semantic Analysis (LSA; see the "What happens in automated writing analysis?" section in this chapter), but it also covers certain other aspects of writing. According to Folz et al. (2013, p. 78), IEA scoring involves content (using LSA), lexical sophistication (e.g., word maturity, variety), style,


organization and development (overall and between-sentence coherence, topic development), grammar (e.g., n-gram features – i.e., combinations of two, three, etc., words – grammatical errors and error types), and mechanics (e.g., spelling, capitalization, punctuation). IEA is trained to associate these features with the human ratings of the essays in the training material. The IEA engine also underlies other Pearson programs that aim at helping students to develop their writing (and reading) skills, such as WriteToLearn (Pearson, 2010).

Implications for diagnosis. Automated scoring systems are potentially very useful diagnostically because they analyze a wide range of features. For certification or admission purposes, only an overall score or sub-skill scores are needed, so little of this diagnostic information is made use of in such contexts. However, the fact that many scoring engines are implemented in programmes that specifically intend to support writing development indicates that the designers of such systems recognize their pedagogic usefulness.
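Although IEA's implementation is proprietary, the general idea of LSA-based content scoring – placing essays in a reduced semantic space and letting a new essay inherit the scores of similar, previously human-rated essays – can be sketched with standard open-source tools. The mini-corpus, the number of dimensions and the similarity-weighted scoring rule below are all simplifying assumptions, not IEA's actual procedure.

```python
# Rough sketch of LSA-style content scoring: essays are represented in a
# reduced 'semantic' space (TF-IDF followed by truncated SVD) and a new essay
# inherits a similarity-weighted average of the scores of pre-scored essays.
# The corpus and scores are invented; an operational engine would use a large
# training corpus and a far more elaborate model.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

scored_essays = [
    ("Regular exercise improves concentration and reduces stress for students.", 5),
    ("Exercise helps students concentrate and feel less stressed at school.", 4),
    ("My favourite food is pizza because it tastes good.", 1),
]
texts = [text for text, _ in scored_essays]
scores = np.array([score for _, score in scored_essays], dtype=float)

vectorizer = TfidfVectorizer()
svd = TruncatedSVD(n_components=2, random_state=0)
reference_space = svd.fit_transform(vectorizer.fit_transform(texts))

new_essay = "Doing sport regularly makes it easier for pupils to concentrate in class."
new_vector = svd.transform(vectorizer.transform([new_essay]))

similarities = cosine_similarity(new_vector, reference_space)[0]
weights = np.clip(similarities, 0, None)          # ignore negative similarities
predicted = float(np.dot(weights, scores) / max(weights.sum(), 1e-9))
print("Similarities:", np.round(similarities, 2), "-> predicted content score:", round(predicted, 1))
```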

Examples of automated writing evaluation systems

Criterion is an online tool that supports the development and instruction of writing in English across several genres. It provides its users with instantaneous holistic scores and detailed feedback on their texts. However, the automated analysis of writing is only part of Criterion, as it includes tools and activities that can be used in all the stages of the writing process, such as templates for planning the text, peer editing, online writing conferences, and the submission of revised versions of the text. The teacher can modify the writing tasks, complement those offered by the system, and select which feedback Criterion presents to the learners. There are also several versions of an online writer's handbook, which explain the feedback and provide examples and instructions about improving writing. The core of Criterion is the e-rater scoring engine developed by Educational Testing Service (Burstein et al., 1998; Burstein, 2013). The e-rater system uses NLP to extract features from essays to predict holistic human ratings (Enright & Quinlan, 2010). Criterion extracts the same information but uses it as the basis for its feedback, which covers linguistic accuracy and range, as well as textual structure. However, some aspects of writing, such as content (apart from topic-specific vocabulary), argumentation, and coherence, are not analyzed but are left for the teachers using the system with their students to deal with. Criterion can be used for analyzing and revising multiple drafts of the text; it has a planning tool that contains eight planning templates for different text types, for example, compare-and-contrast or persuasive texts, and a writer's handbook that the students can consult both during the writing process and when studying the feedback (Chapelle et al., 2015).


Intelligent Academic Discourse Evaluator (IADE) was developed at Iowa State University to complement the writing instruction of non-native English-speaking students who were learning to write research papers in their specific disciplines (Chapelle et al., 2015; Cotos, 2014). IADE focuses on the introductions of research papers and provides feedback to students about how their draft introductory sections conform to the expectations of research articles in their field with respect to three conventional moves (see Swales & Feak, 2004): establishing an area of disciplinary knowledge on the chosen topic, identifying a niche in that area, and showing how the writer's study addresses that niche. The tool uses colour coding to indicate which parts of the student's introductory text correspond to each move and reports how long the introductions in the discipline generally are, and what proportion (percentage) of them falls into each move. IADE also gives ranges, that is, minimum and maximum percentages, for each move in typical introductions. Based on the comparison of the length and distribution of text in the student's introductory moves, IADE then recommends that the student decrease or increase the size of each move, and possibly of the introduction as a whole.

Writing Pal (W-Pal) differs from systems such as Criterion or MY Access! (see IntelliMetric above) in that it did not start as an automated essay scoring system, but was specifically designed to serve writing instruction and to cover the entire writing process from prewriting (i.e., planning) and drafting to revising (Roscoe et al., 2014, pp. 40–41). For this reason, we describe the system in more detail and include a description of how it was modified based on research on its use in writing instruction. Writing Pal is an intelligent tutoring system (ITS) that aims to improve US high school students' essay writing. Thus, it was designed for L1 English speakers, but the system is probably suitable also for those SFL learners whose English is strong enough to understand the information provided by the system, possibly with the teacher's help. Similar to some other systems reviewed earlier, Writing Pal combines automated analysis of learners' texts with other types of activities where both the learners and teachers often have important roles. Writing Pal specializes in scaffolded strategy instruction, which takes place through interactive lessons, games, and essay writing tasks (Dai et al., 2011). Learners interact with Writing Pal mostly by engaging in the activities provided by the system; the teacher's role is to guide the process by selecting and sequencing activities, monitoring learners' progress, and providing any support needed. The writing strategy instruction in Writing Pal is divided into modules that correspond to the three main stages of the writing process: planning, text generation, and revising (see Chapter 6). In the planning or prewriting phase, the students are taught to use their prior knowledge, generate content for the essay, and design a coherent plan for the text. The text generation or drafting


modules instruct how to start the essay, develop an argument, and conclude the text effectively. The final modules guide the students in reviewing their texts and in improving their clarity and coherence. All modules include lessons carried out by three computer-simulated characters – a "teacher" and two "students" – and practice tasks called Challenges that take place in a game environment. The Challenges can ask students to identify and classify examples, such as conventional elements of an essay, or to organize given pieces of information into an outline for the essay. Students may also be required to write a whole text or a part of it (see Dai et al., 2011; Roscoe & McNamara, 2013; Roscoe et al., 2014, for details). Mnemonics have been used in Writing Pal to support students' strategy use. For example, the RECAP mnemonic can help students build a conclusion to their essays (RECAP = "Restate the thesis; Explain how arguments supported the thesis; Close the essay; Avoid adding new evidence; Present the conclusion in an interesting way"; Roscoe et al., 2014, p. 42). In Writing Pal, automated textual analyses are conducted on the Challenges tasks based on NLP algorithms (Dai et al., 2011, pp. 6–9). The analyses first focus on text length, relevance, and paragraph structure, followed by requests to the students to improve their texts based on the feedback. Next, the system evaluates the revised essays holistically, presents the result to the students and then draws their attention to the parts or aspects of the text that could be further improved, as well as suggesting appropriate strategies. We next describe how Writing Pal was modified after research showed that the users were not benefiting from some of its functions. Few published accounts exist of how automated analysis systems change and why. Obviously, all major providers of automated scoring engines continuously fine-tune their systems used for both scoring and formative evaluation purposes. However, details of the modifications to these proprietary software systems are not usually disclosed. Therefore, it is instructive to examine a major revision of Writing Pal, particularly as it was designed with pedagogical considerations in mind. Roscoe et al. (2014) investigated over 100 mostly L1 English-speaking 10th graders who used Writing Pal for six months and found a statistically significant 0.5-point increase on a 6-point scale between pre- and post-essays. They also found that most students were satisfied with what the first version of Writing Pal offered. However, almost 40% of the students found some aspects of the system difficult, not useful, or uninteresting. Therefore, the developers set about modifying the system to better meet the users' needs. These were the most significant changes: the originally long video lessons on strategy use were divided into shorter, 5-minute lessons. Each module now started with an overview of the rationale and a preview of the module, followed by 3–4 videos focusing on relevant strategies. Instead of three characters, the revised videos show just one figure who presents the topic; the discussion among the different characters had been


considered distracting. The games were modified by creating different versions in terms of difficulty (easy, medium, hard) to cater for varying student abilities and preferences. More interactivity and more opportunities for generating text, rather than just recognizing examples of essay elements, were added to the games to increase engagement (Roscoe et al., 2014, p. 53). The improvements in Writing Pal feedback aimed at making negative feedback less threatening by attempting to increase students' confidence in their ability to improve their writing (Roscoe et al., 2014). To achieve this, feedback on problems was made less personal – for example, your essay was changed into this essay in feedback on errors and weaknesses – whereas proposals for using strategies to overcome the identified problems were changed in the opposite direction: they were made personal and more specific. Users were also given more control over the amount of feedback to avoid overloading them.
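Underlying both IADE's move-length comparisons and Writing Pal's length and structure checks is the same simple mechanism: measure something in the learner's text, compare it with a target range, and phrase a recommendation. A minimal sketch of that mechanism – with invented move labels, target ranges and feedback wording, not those of either system – might look as follows.

```python
# Minimal sketch of range-based formative feedback of the kind IADE and
# Writing Pal produce: measure properties of the learner's text, compare them
# with target ranges, and phrase a recommendation. The move labels, target
# ranges and messages below are invented for illustration.

def move_proportions(move_lengths):
    """Convert word counts per move into proportions of the whole introduction."""
    total = sum(move_lengths.values())
    return {move: length / total for move, length in move_lengths.items()}

# Words the student devoted to each introductory move (hypothetical).
student_moves = {"establish_territory": 180, "identify_niche": 20, "occupy_niche": 40}

# Hypothetical discipline norms: (minimum, maximum) share of the introduction.
target_ranges = {
    "establish_territory": (0.40, 0.60),
    "identify_niche": (0.15, 0.30),
    "occupy_niche": (0.20, 0.35),
}

for move, share in move_proportions(student_moves).items():
    low, high = target_ranges[move]
    if share < low:
        advice = f"consider expanding this move (currently {share:.0%}, typical range {low:.0%}-{high:.0%})"
    elif share > high:
        advice = f"consider shortening this move (currently {share:.0%}, typical range {low:.0%}-{high:.0%})"
    else:
        advice = f"within the typical range ({share:.0%})"
    print(f"{move}: {advice}")
```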

Automated writing evaluation systems: Implications for diagnosis

Automated writing evaluation systems are tools whose purpose is to support writing development; therefore, they are extremely useful for diagnosis. Some of them (e.g., Criterion) are based on scoring engines that were originally developed for high-stakes testing purposes, while others (e.g., Writing Pal) have been specifically designed with formative pedagogical aims in mind. The writing evaluation systems reviewed here have significant diagnostic potential, particularly when used in educational contexts with teacher support. Criterion, IADE, and Writing Pal cover all the stages of the diagnostic cycle (Figure 1.1) by not only diagnosing weaknesses and providing feedback but also by giving advice on further action (see also Chapter 9). All three also support process writing, although whether the learners revise their first text version and submit new drafts for further analysis depends on their choice. We will report on studies on the use of some AWE systems, including Criterion, later in this chapter to examine the advantages of and issues with these tools. Space does not allow for an analysis of the many other, potentially useful automated writing evaluation systems. A recent review of over 40 such programs by Strobl et al. (2019) sheds light on the main trends and issues. The review found that the tools typically focus on one genre and one language (academic essays and English), while other genres and languages are clearly less common. They also discovered that most systems focus on the linguistic aspects of language at the word and sentence level, while programs for developing writing strategies and macro-level text structure are rare. Three points seem to affect the diagnostic usefulness of AWE systems. The first concerns the constructs that they measure – in other words, which aspects of SFL writing they cover and which they don't. Obviously, the more aspects of writing they can analyze, the better, at least in principle. The second


concerns the accuracy of the analyses. The more accurate (reliable, systematic) the information they can extract, the better. Thirdly, they should cover the entire diagnostic cycle, from a conceptually sound analysis of learners' writing ability to intelligible feedback and appropriate action. AWE systems are often used in educational contexts where teacher support is available. Some of the activities in Writing Pal, for example, actually assume teacher involvement. We will discuss what automated systems cover in more detail in the section on constructs in automated assessment of writing in this chapter. At this point, we just note that many systems are likely to be limited in what they assess, as the review by Strobl et al. (2019) revealed.

Automated text analysis tools for research purposes

Many tools exist that can extract detailed information from texts written by both L1 and SFL speakers (for an overview, see Pirnay-Dummer, 2016). Many of these are only available for English, but programs have also been developed for other languages. The systems outlined next are mostly used in research – sometimes for validating automated evaluation systems (e.g., Dai et al., 2011; McNamara et al., 2015).

Coh-Metrix (www.cohmetrix.com) is a free online program (Figure 8.1) developed at the University of Memphis, USA, that analyzes English texts in terms of over 100 linguistic and discoursal indices (Graesser et al., 2004; McNamara et al., 2010, 2014). It calculates descriptive indices of word and sentence length, and variation in these, as well as features that contribute to text coherence (e.g., referential expressions, connectives, noun and verb overlap). It performs Latent Semantic Analysis and analyzes lexical and syntactic complexity and diversity. Coh-Metrix has been applied in L1 and SFL research

FIGURE 8.1 Interface of the Coh-Metrix Web Tool (available at http://tool.cohmetrix.com).


on text cohesion (e.g., Graesser et al., 2004) and in various comparative studies, for example, texts written in test vs academic study contexts (Riazi, 2016), electronic vs handwritten texts (Barkaoui & Knouzi, 2018; Kim et al., 2018), texts written for different prompts (F. Liu & Stapleton, 2018), and texts written in response to independent vs integrated writing tasks (Guo et al., 2013). Coh-​Metrix research has also covered such specific topics as the relationship between linguistic characteristics of texts and holistic ratings (e.g., Aryadoust & Liu, 2015; Crossley & McNamara, 2011; McNamara et al., 2010; Khushik & Huhta, 2020, 2022; Vögelin et al., 2019), differences between L1 and SFL speakers (Crossley & McNamara, 2009), characteristics of inconsistently rated essays ( J. Lim, 2019), rating scale revision (Banerjee et al., 2015), effect of the task and learners’ L1 background on texts (Allaw, 2019), and English language learners’ syntactic development (Crossley & McNamara, 2014; Crossley et al., 2016; on syntactic complexity, see also Lu, 2017). A version of Coh-​Metrix has also been designed for analyzing texts written in Spanish (Quispersaravia et al., 2016). L2 Syntactic Complexity Analyzer (L2SCA) is a free tool that analyzes syntactic complexity in written English. It was developed at the Pennsylvania State University, USA, by Xiaofei Lu (see Lu, 2010). L2SCA calculates 14 measures of syntactic complexity relating to the length of production units, coordination, subordination, and phrasal sophistication (Figure 8.2). The tool has been used in studies of syntactic complexity (e.g., Ai & Lu, 2013; Khushik & Huhta, 2020, 2022; Lu, 2011; Lu & Ai, 2015). Ai and Lu (2010) have also designed L2 Lexical Complexity Analyzer that focuses on vocabulary and computes 25 indices of lexical density, sophistication, and variation (Figure 8.2). Both tools are available as online and downloadable versions. Users can input just one text, or two if they are interested in comparing texts in terms of their linguistic features. To illustrate how the L2SCA reports on syntactic complexity, Figure 8.3 displays the visual output for extracts from the current book: Text 1 is the final paragraph in the section on Hattie and Timperley’s model of feedback in Chapter 9 and Text 2 is the second paragraph of that section. For the abbreviations shown in Figure 8.3, see the right side of Figure 8.2. The first index of syntactic complexity in Figure 8.3 is S, which refers to the number of sentences in the two texts. We can see that the line for Text 1 is shorter indicating that it has fewer sentences (5) and that Text 2, with a longer line, has 7 sentences. In addition to visual output, the L2SCA produces a file containing numerical values for the indices that can be used, for example, in statistical analyses. Several other tools have been developed for analyzing various textual features by researchers who were involved in Coh-​Metrix (e.g., Scott Crossley, Danielle McNamara) or whose research contributed to these tools (Kristopher Kyle; see


FIGURE 8.2 List of Linguistic Measures Analyzed by the L2 Lexical Complexity Analyzer (left) and L2 Syntactic Complexity Analyzer (right), available at https://aihaiyang.com.

Kyle, 2016). These programs are freely available at the site NLP Tools for the Social Sciences at www.linguisticanalysistools.org/ and include the following:

• TAACO – Tool for the Automatic Analysis of Cohesion (Crossley et al., 2016; Crossley et al., 2017, 2019)
• TAALES – Tool for the Automatic Analysis of Lexical Sophistication (Kyle & Crossley, 2015; Crossley et al., 2017; Kyle & Crossley, 2016; Kyle et al., 2018; Monteiro et al., 2018)
• TAALED – Tool for the Automatic Analysis of Lexical Diversity
• TAASSC – Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (Crossley et al., 2017; Kyle, 2016)
• SEANCE – Sentiment Analysis and Cognition Engine (Crossley et al., 2017; sentiment analysis involves computationally identifying and categorizing opinions expressed in text as positive, negative, or neutral)


FIGURE 8.3 Visual Output Sample from the L2 Syntactic Complexity Analyzer.
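
To give a concrete sense of what lies behind output such as that shown in Figures 8.2 and 8.3, the following minimal Python sketch computes two very simple indices – mean sentence length and type-token ratio – for two short invented texts. It is an illustration only: L2SCA, the L2 Lexical Complexity Analyzer, and Coh-Metrix derive their indices from full syntactic parses and curated lexical databases rather than from the crude regular expressions used here.

```python
import re

def simple_indices(text):
    """Crude illustrative counterparts of two common indices:
    mean length of sentence (words per sentence) and type-token ratio
    (distinct words divided by all words). Real analyzers use parsers
    and lemmatizers rather than these regular expressions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "sentences": len(sentences),
        "words": len(words),
        "mean_length_of_sentence": round(len(words) / max(len(sentences), 1), 2),
        "type_token_ratio": round(len(set(words)) / max(len(words), 1), 2),
    }

# Two short invented texts, mimicking the two-text comparison that the
# online L2SCA interface allows.
text_1 = "Feedback should answer three questions. It should also be timely."
text_2 = ("Writers who plan, draft, and revise tend to produce longer texts, "
          "because revising frees attention that can be devoted to content.")
for label, text in [("Text 1", text_1), ("Text 2", text_2)]:
    print(label, simple_indices(text))
```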

Implications for diagnosis. The above tools are mostly research instruments but sometimes have been used in AWE development. They provide users with information about the characteristics of texts, which has obvious diagnostic potential. Some research exists on their use for formative assessment purposes (e.g., Wilson et al., 2017, who used Coh-​Metrix). Diagnostic use of these tools requires support by teachers because the output from them consists of a large number of opaque numerical indices. Extracting formative meaning from such indices is challenging, but worth exploring in future research. As was noted at the beginning of this section, the text analysis tools presented here are mostly used for research purposes and their diagnostic value lies mainly in that some of them have been used in the development and validation of automated evaluation tools used for diagnosing writing. Using text analysis tools as diagnostic tools would be difficult as many of the indices they calculate are not readily understandable. For example, the value of 1.16 for the index of syntactic complexity called ‘complex nominals per T-​unit’ is totally opaque as such and would not be diagnostically useful for a language learner. While some other indices are more transparent (e.g., mean sentence length as number of words), they, too, are probably of limited diagnostic usefulness. Only when the numerical values are turned into pedagogically meaningful feedback can they help learners. For example, an analysis of a learner’s text based on certain indices might show that the text is syntactically more (or less) complex than what is typical of the particular type of text. Such feedback could then be followed by an analysis of the text, probably with a teacher’s assistance, to see
what in the learner's use of syntactic structures causes that difference and how the text could be modified, if the aim is to be able to produce texts that follow the typical syntactic patterns of the target texts.
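
To sketch what such a translation from numerical indices to verbal feedback might look like, the following hypothetical example compares a learner's mean sentence length with an invented reference band for the text type and generates a message. Both the threshold values and the feedback wording are assumptions made for illustration, not recommendations from any existing system.

```python
# A minimal sketch of turning an opaque numerical index into verbal feedback.
# The reference band for "typical" values is invented for illustration,
# not taken from any real corpus or AWE system.

REFERENCE_BANDS = {
    # hypothetical range observed in comparable argumentative essays
    "mean_length_of_sentence": (14.0, 22.0),
}

def index_feedback(index_name, value):
    low, high = REFERENCE_BANDS[index_name]
    if value < low:
        return (f"Your sentences are shorter than is typical for this text type "
                f"({value:.1f} words on average). Try combining some related ideas "
                f"into one sentence, for example with relative clauses.")
    if value > high:
        return (f"Your sentences are longer than is typical for this text type "
                f"({value:.1f} words on average). Consider splitting some of them.")
    return "Your sentence length is within the typical range for this text type."

print(index_feedback("mean_length_of_sentence", 11.3))
```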

What happens in automated writing analysis?

Automated writing analysis systems typically employ natural language processing (NLP) technologies that draw on computer science or artificial intelligence. For example, Writing Pal uses NLP resources such as lemmatizers, syntax parsers, latent semantic analysis, rhetorical analyzers, and different lexical databases (for example, Princeton University’s WordNet, CELEX by the Max Planck Institute for Psycholinguistics, and the MRC Psycholinguistic Database; for details, see Roscoe et al., 2014, p. 40). The development of the analytic algorithms in Writing Pal was based on a large collection of essays and on correlating the essay ratings with the Coh-​Metrix analysis of the various production features in the essays. The results of such linking of human ratings and production features were then used when feedback to students was designed (Dai et al., 2011, p. 9). NLP tools are usually combined with statistical modeling and machine learning technologies. Statistical modeling involves, for example, building various types of regression models to predict human ratings from the production features (see Yan & Bridgeman, 2020, for details). The field of machine learning, as explained by Folz et al. (2020, p. 5) “focuses on algorithms that can process large amounts of data to analyze the patterns of across and within the features and infer relationships to criteria of interest”. For example, the approach used by e-​rater and IntelliMetric (and, hence, in Criterion and MY Access!) combines these and is based on essay corpora annotated for various features. Statistical analyses are used to identify those features that predict human ratings, and algorithms are then developed to implement the predictions. The Pearson Intelligent Essay Assessor (IEA) is also based on comparisons of test takers’ essays with a corpus of essays whose human ratings are known. However, IEA uses Latent Semantic Analysis (LSA), which focuses on the semantic similarity between new essays and the essays in the reference corpus. More precisely, LSA focuses on word meanings and their relationships at different levels of context (sentence, paragraph, whole text) and calculates the distances between the texts to estimate the quality of new essays (see Landauer et al., 2003). IntelliMetric, too, uses LSA in its analyses but less extensively than IEA (see the section “Examples of automated writing evaluation systems” in this chapter). Overall, essay evaluation systems work better when analyzing texts that are similar in prompt to those in the reference corpora on which the algorithms underlying the systems are based (Wohlpart et al., 2008).
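
The statistical-modeling step described above can be sketched in a few lines of Python. The features and scores below are invented toy data, and the simple linear regression stands in for the far more elaborate, proprietary models used by engines such as e-rater, IntelliMetric, or IEA.

```python
# Toy illustration of predicting human essay ratings from automatically
# extracted production features with a regression model.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [text length in words, mean sentence length, type-token ratio]
features = np.array([
    [180, 12.0, 0.55],
    [240, 14.5, 0.60],
    [320, 16.0, 0.62],
    [410, 18.5, 0.68],
    [520, 21.0, 0.71],
])
human_ratings = np.array([2.0, 3.0, 3.0, 4.0, 5.0])  # holistic scores by trained raters

model = LinearRegression().fit(features, human_ratings)
new_essay = np.array([[300, 15.0, 0.61]])  # features of an unseen essay
print("Predicted score:", round(float(model.predict(new_essay)[0]), 2))
```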


Constructs measured by automated writing assessment systems

Deane (2013a) argued that in order to understand the constructs measured by AWE systems, it is useful to distinguish text quality from writing skill. Text quality refers to the features of the text, viz. the product of writing, whereas writing skill addresses writer characteristics including how writers control the writing process (which can be accessed by using, for example, keystroke-​ logging; see Chapter 6). The text quality approach to writing assessment, Deane (2013a) maintains, dominates standardized testing and most automated scoring systems. The typical writing task in high-​stakes assessment contexts is a holistically rated, timed essay, which is assessed with reference to features typical of many rating scales (see Chapter 7 for examples). Deane (2013a) used the Six-​Trait model of writing developed at ETS (Spandel, 2012) to illustrate features in text quality. The model specifies the following characteristics which, in essence, define the construct of writing as seen from the perspective of text quality: • Content /​ideas –​meaningfulness and completeness of thoughts; existence of a theme; • Organization –​logical order and presentation of ideas, paragraphs, transitions between paragraphs; • Voice –​tone, sincerity, appropriateness for the audience; • Word choice –​precision, naturalness, variety; • Sentence fluency –​variation in sentence structures, flow and rhythm; • Conventions –​grammar, punctuation, spelling, capitalization. Deane (2013b) argued that from the socio-​cognitive perspective, text quality is an unstable construct because writers may not be able to produce texts of equal quality for different purposes and in different genres. Conversely, one and the same text may be judged differently depending on the context and purpose. The other view of the assessment focuses on writing skill. Deane (2013a) used the Framework for Success in Postsecondary Education in the USA to illustrate the writing skill approach. The Framework for Success in Postsecondary Education (Council of Writing Program Administrators (CWPA) et al., 2011; O’Neill et al., 2012) defines eight general habits of mind that refer to “ways of approaching learning that are both intellectual and practical and that will support students’ success in a variety of fields and disciplines” (CWPA et al., 2011, p. 1). One of the habits of mind worth highlighting is metacognition, which is defined as “the ability to reflect on one’s own thinking as well as on the individual and cultural processes used to structure knowledge” (CWPA et al.,
2011, p. 1). Other habits of mind include: openness, curiosity, engagement, persistence, creativity, responsibility, and flexibility. The Framework for Success also defines five areas of knowledge or skill related to writing (CWPA et al., 2011, p. 1): • Rhetorical knowledge –​the ability to analyze and act on understandings of audiences, purposes, and contexts in creating and comprehending texts; • Critical thinking –​the ability to analyze a situation or text and make thoughtful decisions based on that analysis, through writing, reading, and research; • Writing processes –​multiple strategies to approach and undertake writing and research; • Knowledge of conventions –​the formal and informal guidelines that define what is considered to be correct and appropriate, or incorrect and inappropriate; and • Abilities to compose in multiple environments –​from using traditional pen and paper to electronic technologies. Writing ability in the above framework is understood rather broadly as it is seen to be closely related to learning, thinking and the other language skills such as reading. Furthermore, writing ability comprises both skills that can be seen in the products (i.e., texts) of writing and skills that relate to the writing process as discussed in Chapter 6. To understand the possibilities and limitations of automated evaluation in more detail, Deane (2013a, 2013b) refers to an analytical framework developed by Educational Testing Service for their CBAL (Cognitively Based Assessment of, for, and as Learning) approach, which links assessment and teaching (Bennett, 2011; Bennett & Gitomer, 2009). Similar to the above Framework for Success, the CBAL approach emphasizes the close connections between reading, writing, and critical thinking, and defines the components of writing ability (Deane, 2013a, 2013b). CBAL is socio-​cognitive in nature “since it pays attention not only to the cognitive elements –​skills, knowledge, and strategies –​ that enter into writing, but also to the social contexts and purposes for which it is deployed” (Deane, 2013b, p. 299). The writing framework in the CBAL approach covers several levels ranging from the print level (e.g., decoding and transcribing) to the discourse level (e.g., planning and structuring) to the conceptual level (e.g., conceptualizing and rethinking) to the social level (e.g., situating, reflecting and engaging; for details, see Deane, 2013b, p. 301). Deane (2013a, p. 17) summarized what automated systems can do: “The features of state-​of-​the-​art AES [Automated Essay Scoring] systems almost exclusively deal with textual evidence … which address the text production abilities that enable writers to organize text according to some outline and
elaborate upon the points it contains, using appropriate, clear, concise and unambiguous language, in conventional orthography and grammar.” Deane (2013b, pp. 302–​303) exemplified the three types of abilities that automated scoring systems typically cover: Structure: paragraphs and transitions, thesis and topic sentences, transitional discourse markers, and cohesion across sentences; Phrase: vocabulary, sentence length and complexity, and stylistic choices; Transcribe: adherence to written conventions that govern grammar, usage, spelling, and mechanics. Deane (2013a) acknowledged that the current automated assessment systems can only indirectly capture many of the dimensions in the framework, particularly those related to the conceptual and social aspects of writing. He also acknowledged that the challenges are obviously greater with those features that are attributes of the writer (skill) than characteristics of the text (quality). Therefore, it is more appropriate to consider the features that existing systems evaluate as proxies of the dimensions of writing ability that we are interested in evaluating, or diagnosing, than as direct measures of those dimensions of ability. Furthermore, Deane (2013a) argued that we should not claim that automated systems measure the same traits as do human raters, even if their assessments correlate strongly. However, the same can be said about human raters: high inter-​rater agreement is no guarantee that the raters’ judgments are based on the same criteria (e.g., Harsch & Hartig, 2015). A reason why human and automated ratings often agree is that the development of a basic text production fluency and an ability to employ cognitively and socially more complex text generation strategies often happen in tandem (Deane 2013a, p. 18). Automatization of text production frees resources for planning, evaluating, and revising that increases a text’s appropriacy for a given purpose. Therefore, human raters focusing on these broader and challenging criteria may end up awarding the writers very similar scores as the automated systems that analyze only some of the features. The co-​development of the more basic and advanced writing abilities may also underlie the common finding that text length correlates with human ratings and computer scores. Weigle (2013, p. 43) hypothesizes that writers who manage to support main points with details are rewarded by raters for the good development of arguments. At the same time, such elaborations result in longer texts rewarded by scoring engines that measure the length of discourse units. The overall result would be a high correlation between human and automated evaluations even if the two were largely based on different analyses. Another contributor to high levels of human-​computer agreement may relate to issues in human ratings (Weigle, 2013). Holistic ratings may mask qualitative differences across different dimensions of writing as the raters have to balance the strengths and weaknesses of the text when deciding on one
overall score (see Chapter 7). Even if analytic rating rubrics are used, raters may fail to assess each dimension separately. It should be noted, however, that not all studies have found high correlations between automated scoring systems and human raters (see S. Liu & Kunnan, 2016, for a review). The way the algorithms in the automated scoring systems are trained may affect how closely they match human ratings. Assuming that the criteria used by human raters are not radically different from the features that the automated system can analyze, it makes sense to expect that the more extensive the training materials (i.e., human-rated texts) are and the more consistent the human raters are, the higher the match between human and machine scores is likely to be.

Implications for diagnosing SFL writing. The current automated evaluation systems address the text, the product of writing, and focus on how writers organize their texts and discuss their content by using appropriate language and conventional orthography and grammar. Writers' skills related to metacognition, critical thinking, and the writing process in general can be evaluated only indirectly (see Deane, 2013a). This may seem somewhat disappointing, but we should bear in mind that the same limitation applies to human raters, too: they cannot assess the writing process simply by investigating completed texts. Diagnosing the writing process requires that the rater – human or computer – can observe those processes as they unfold or as they are reported by the writer afterwards. Another insight from distinguishing text quality and writing skill concerns the generalizability of the information obtained from analyzing texts. The information and, thus, the diagnosis may need to differ across different genres and purposes of writing. One of the concerns in high-stakes assessment, namely the correlation between automated and human ratings, may have to be viewed differently when the focus is on diagnosing writing ability rather than awarding holistic grades. In diagnostic assessment, it is particularly important to know which criteria the rater – human or non-human – uses, so that meaningful feedback can be given.

What tasks are suitable for automated writing evaluation?

Writing tasks used in automated evaluation need to be designed by following the same careful principles as tasks assessed by human raters (see Chapter 5), including ensuring their appropriateness for learners’ proficiency level, cognitive maturity, background knowledge, and social context. Taxonomies such as the Triadic Task Taxonomy (see Robinson, 2007), which elaborates on the cognitive, interactive, and learner factors involved in tasks, frameworks such as the CEFR and task design principles from the language testing literature are helpful also in designing diagnostic writing tasks.


In Chapter 5, we discussed the usefulness of direct vs indirect writing tasks as diagnostic measures and made the point that if the focus of diagnosis is the learners' ability to perform real-world writing tasks, then direct writing tasks should be used. Automated systems allow for the use of direct writing tasks, thus avoiding indirect multiple-choice or gap-fill tasks that are not optimal for diagnosis in all contexts (see Chapter 5 and the study by Heck & Crislip, 2001). However, if the interest lies in obtaining information about the component skills (e.g., linguistic features) of writing, then indirect tasks may also be useful. The development of automatically scored direct writing tasks in high-stakes examinations is rather complex (see the "What happens in automated writing analysis" section in this chapter). Typically, the scoring algorithms are trained on human-rated texts that are responses to exactly the same or a very similar task prompt. When it comes to automated writing evaluation systems that provide learners with feedback, more varied tasks can be used. However, with AWE tools, too, the range and type of tasks is often limited (see the "Examples of automated writing evaluation systems" section in this chapter), presumably because the accuracy of feedback is higher if the system 'knows' about typical responses to specific task prompts. For example, Warschauer and Grimes (2008) noted that MY Access! only supported essays written for certain prompts, and if learners wanted to write about something else, they could not use the tool.

Direct writing tasks such as an essay or a story and indirect multiple-choice tasks represent the two ends of a continuum in the directness of writing tasks. Between these extreme points, there are different semi-direct tasks that require the writer to input only short pieces of text, ranging from single words to whole sentences or sets of related sentences. The scoring of such tasks may just involve simple pattern matching. This is the case with DIALANG, in which written responses to gap-fill writing items are scored against a list of accepted responses. However, tasks requiring clause- or sentence-length responses need to be analyzed by resorting to more complex NLP/artificial intelligence-based analyses. Quixal and Meurers (2016) describe in detail how the development of such NLP-based, limited-response writing tasks can take place. Their task was a set of specific questions that learners responded to in writing. The questions related to a common context (e.g., a course the learner wanted to attend) and required the learner to describe, explain, and provide greetings and introductions in their texts. The system gave feedback on communicative (e.g., the appropriateness of the responses) and linguistic aspects of writing (see also work on the automated analysis of short-answer or constructed-response items by, e.g., Attali et al., 2008; Hermet et al., 2006; Leacock and Chodorow, 2003).
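
The simple pattern matching used for scoring such short constructed responses can be illustrated as follows; the item and the list of accepted answers are invented for the example rather than taken from DIALANG.

```python
def score_gap_fill(response, accepted_answers):
    """Score a short constructed response against a key list,
    ignoring case and surrounding whitespace."""
    return response.strip().lower() in {a.lower() for a in accepted_answers}

# Hypothetical item: "She has lived here ___ 2010."
accepted = ["since"]
print(score_gap_fill("Since ", accepted))  # True
print(score_gap_fill("for", accepted))     # False
```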

We should bear in mind that even if current AWE systems can work with direct writing tasks, automated analyses may only indirectly measure writing skills. In other words, what they quantify in the texts are often just proxies of the qualitative aspects of writing that we are usually interested in evaluating. Deane (2013a) mentioned argumentation and quality of content as examples of more abstract qualities that are often measured by rather simple quantitative indices reflecting, for instance, the writer's choice of vocabulary and variation in it. It should also be noted that there is another level in the degree of directness of writing tasks: the divide between writing products and processes. Most current AWE systems analyze only completed texts and, therefore, inferences about the writing processes must be indirect. At present, we know rather little about which of these inferences might be warranted. The takeaways for task design for diagnosing SFL writing through automated analysis are that the writing tasks used in automated systems need not be different from those that human raters evaluate – both can work with direct writing tasks such as essays. However, the accuracy of automated diagnosis and feedback is likely to be higher if the new writing task used in diagnosis is very similar to those that the AWE system was originally trained on. AWE approaches can also be applied to the analysis of writing tasks that require responses limited to a few words or sentences. Such tasks may be useful if the diagnosis targets specific features of writing, which are often surface features of the text but can sometimes relate to social aspects of writing, such as when the writer has to provide an appropriate salutation for a letter.

Automated feedback

Chapter 9 addresses diagnostic feedback in general, but here we focus on the characteristics of automatically generated feedback, drawing on Hattie and Timperley's (2007) framework of feedback levels and on the aspects of agency, delivery, focus, timing, and requested response. Hattie and Timperley (2007) argued that effective feedback should answer three questions:

• What are my goals?
• What is my current performance in relation to those goals?
• What should I do next to move towards the goals?

Hattie and Timperley divide feedback into four partly overlapping levels: task, process, self-regulation (metacognition), and self. Task feedback informs learners about task completion; it is usually about the correctness of a response, or how a performance matches set criteria. Both process- and self-regulation-level feedback aim at increasing learners' deeper understanding of the task and their performance on it and, thus, at improving their ability to perform other, different and possibly more demanding tasks. Both process and self-regulation are about (writing) strategies. Process-level feedback concerns
cognitive processes related to memory, learning, and processing of information. Self-​regulatory (or metacognitive) feedback has to do with how learners control their behaviour (e.g., writing), and how committed and confident they are about their abilities. The final level of feedback targets the learner as a person (self ) and may contain positive or negative evaluation of the learner and/​or their performance (e.g., Well done!). How does the feedback from AWE systems line up with the above levels? AWE tools clearly focus on the task (level). As a result, their feedback concerns learners’ performance in the task at hand. Typically, these tools analyze completed texts and very rarely the underlying processes, although the latter are, in principle, analyzable with keystroke-​ logging, for example. Indeed, as Deane (2013a, 2013b) concluded, current AWE applications evaluate text quality rather than writing ability, and the inferences about the latter are made almost entirely based on evidence on the former. Diagnostic feedback should go beyond the task at hand and include the other feedback levels, particularly those that concern the cognitive and metacognitive processes and strategies since that helps learners to engage with new tasks. Such feedback addresses the questions What next? and How to get there? and helps learners to decide what they could do, given their current profile of strengths and weaknesses, to move forward. Seen from the perspective of the diagnostic cycle introduced in Chapter 1 of this book, such advice focuses on the actions that should follow feedback so that diagnosis as a whole would have the intended positive effect on learning. Advice to learners about which strategies they should use cannot be directly extracted by the AWE system from its analysis of learner texts. Such cognitive and metacognitive feedback is instead based on the interpretations by test designers of the meaning of the textual (or process) analyses carried out by AWE algorithms. Feedback that is more motivational in nature and targets the learners as persons (self level) can be added to automated feedback as a separate element. A simple example of such encouragement could be the use of a smiley shown to the learner as a recognition of a correct response. It should be added that advice on action and strategy use can also be given by systems that do not involve automated analyses of texts but use indirect writing measures. For example, DIALANG comes with a variety of metacognitive advice and information. Finally, it should be noted that turning automated analyses of learner performances into pedagogically useful feedback is no simple task, regardless of the feedback level. For example, Roscoe et al. (2014, p. 40) admitted that developing feedback based on the automated analysis of texts is “challenging as there is little empirical research on how to design and implement formative feedback based on automated assessments”. However, advances in automated performance analyses also entail advances in the types and usefulness of the
feedback that can be given to the learner (see Burstein et al., 2020, for a review of current developments in automated feedback). We next turn to the key aspects of feedback that need to be considered in the design and evaluation of (automated) diagnostic feedback: agency, delivery, focus, timing, and requested response. Agency. Ostensibly, the AWE system is the provider of feedback and, thus, is the main agent. However, the situation is complex because the system is obviously designed by human agents who are, in fact, indirect agents in this matter (see, e.g., Huhta & Boivin, 2023). Furthermore, in educational contexts, the teacher may get involved in mediating or interpreting feedback for the learners and assume some agency in that process. In fact, one of the issues with computerized feedback is that it may not be actionable without instructor support (e.g., Huhta, 2010). Furthermore, some AWE systems are designed to combine automated and teacher feedback, such as Writing Pal. Learners, too, can have some agency in using AWE feedback, if the context allows them to decide when and how to use the AWE system. Obviously, the learners themselves decide if and to what extent they study and act on the feedback they receive. Delivery. Automated feedback is delivered in writing, although in principle the system could give the same information to the learner orally. It is also possible to provide feedback as pictures or graphics, or by using different colours to draw learners’ attention to specific points in their texts, as is done in the IADE and Roxify systems, for example. For feedback to be memorable and understandable, it may be best to combine different delivery formats so that learners can choose which one(s) to use, which, thus, also increases their agency. Some research on diagnosing SFL reading suggests that verbal feedback alone is not as effective as a combination of formats and that feedback given through graphics is remembered longer than verbal feedback (Dunlop, 2017). Focus. AWE feedback can focus on a variety of aspects of writing, similar to feedback based on human ratings. Using the Hattie and Timperley (2007) model, feedback can concern learners’ goals, present states, or future actions, and target the task, process, metacognition, or self levels. Typically, automated analyses and feedback focus on the task level. Depending on the level of interest, feedback can be direct and focus on error correction, as happens in task-​level feedback, or it can be indirect, guiding the learners’ use of strategies, as in metacognitive feedback. For diagnosis, a wider range of feedback is usually preferable to a narrow focus. Timing. The instantaneous nature of AWE feedback is its main strength compared to human feedback. Immediate feedback makes its utilization more likely than information given after a considerable time lag. Immediacy can mean more than one thing. Probably the most typical scenario is that automated feedback is delivered to learners after they have completed the text. However, computer programs can give feedback in the middle of text generation while
the writing process is unfolding. In fact, this is what the spelling and grammar checkers implemented in text editors do. Research on automated writing feedback indicates that learners often greatly value the immediacy of the information (e.g., Liao, 2016; Zhang & Hyland, 2018). Expected response. This aspect of feedback refers to whether learners are expected to react to the information they receive by modifying their text in some way. Whether this is required or not depends on the approach to writing. In the pedagogical approach called process writing, learners produce one or more new versions of their texts based on the feedback (Chapter 6). In other cases, revising one's writing is voluntary and depends on the learners' interests and approaches to study. Some AWE systems discussed in this chapter allow and even encourage the submission of revised versions of the original text. From the diagnostic assessment perspective, acting on feedback by revising one's text should be more effective, as it completes the diagnostic cycle.

Usefulness of automated writing evaluation

The effects of automated evaluation and feedback on SFL writing are moderated by several factors. Here, we focus on the validity and reliability of the evaluation and feedback learners receive, and their impact on learning. A key factor in the validity of automated evaluation of learners’ writing abilities is their familiarity with computers or other devices with which they write. This requires computer and typing (or similar) skills, and weaknesses in these skills will affect the speed at which learners can generate text. Also, their higher-​order processing may be affected by their computer skills (see Chapter 6). Unfamiliarity with the required input device may not only affect the scoring of learners’ writing, but may also yield insufficient, biased, or distorted diagnostic information about the learners’ writing abilities. While there are plenty of studies on the role of computer skills in computerized testing and comparisons of test-​taker performance on paper-​based vs computer-​based writing tasks based on holistic ratings of performances (e.g., Chan et al., 2018), studies on the effect of computer skills on the diagnostic information obtained from learners’ performances seem to be lacking. Computer-​based diagnosis of SFL writing might be enhanced if information about learners’ familiarity with using the particular input device could be collected as part of the diagnostic process. The second validity issue with AWE concerns the scope of writing that the systems can measure in the first place. Most AWE programs focus on the linguistic word, sentence, and paragraph level features, possibly covering some text-​level discourse aspects like logical connectors, but rarely going beyond that (Deane, 2013a; Strobl et al., 2019). Obviously, AWE feedback is likely to impact only those aspects of writing that the system can analyze. Current AWE
applications are, therefore, somewhat limited in what they can hope to impact in the first place. The main reliability issue with AWE systems concerns the accuracy of error detection and other analyses they perform. First, SFL performances pose serious challenges to automated systems because of the errors and unexpected formulations that learners produce. This makes the basic linguistic analyses such as word recognition and parsing challenging. Hempelmann et al. (2006) compared four part-​of-​speech taggers/​parsers and showed that the best of them at the time, the Charniak parser, which underlies many Coh-​Metrix analyses, achieved 89% accuracy on L1 English expository and narrative texts. Crossley and McNamara (2014) estimate that the parsers are less accurate with learner English. There appears to be no research comparing the accuracy of automated analyses of L1 and SFL writing, but several studies investigating computerized and teacher feedback indicate that automated evaluation systems are far from perfect. Dikli and Bleyle (2014) compared teacher feedback with feedback from Criterion and found teacher feedback on grammar to be clearly more accurate than Criterion feedback: the automated system identified a total of 94 errors compared with 570 by the teachers, with an accuracy rate of 63% vs 99%, respectively. For mechanics, Criterion achieved 50% accuracy against over 99% of the teachers. For usage errors (e.g., articles, prepositions, word forms), the corresponding accuracies were 43% and 100%. Furthermore, Criterion failed to identify some quite common error types that the teachers spotted (e.g., singular/​plural, verb tense; see also Chapelle et al., 2015; Link et al., 2014; Ranalli et al., 2017). Liu and Kunnan (2016) studied the use of the WriteToLearn system by Chinese university students majoring in English. They discovered that the system was more consistent, but also more severe in its scoring than the four trained human raters involved in the study, and that there were some essays that it could not score. The precision of error feedback was only 49%, which failed to reach the recommended threshold of 90% set by Burstein et al. (2004). The system failed to recognize many word choice, preposition, and article errors typical of English language learners (S. Liu & Kunnan, 2016, p. 86). In a study of another AWE program, MY Access!, Hoang and Kunnan (2016) obtained somewhat similar results as the system achieved only 73% precision in error feedback and only moderate correlations existed between the automated scores and human ratings. Despite the issues with automated evaluation systems, they have been reported to have beneficial effects on both learners and teachers. This research has addressed a variety of topics, particularly how the systems are used, but also how useful learners perceive them to be, whether and how learners revise their texts, and how using the systems impact text quality and/​or writing ability.


A good example of research focusing on how AWE affects text revision is the study of Criterion by Chapelle et al. (2015), who analyzed revised texts written by learners of English (see the “Examples of automated writing evaluation systems” section in this chapter). They found that about half of the Criterion feedback led to revision, which indicates that the feedback affected learners’ writing process and possibly their texts, too. The researchers could not study the students’ reasons for not acting on certain feedback, but they believed the main explanation was its inaccuracy. Most revisions concerned adding, deleting or changing a word or part of a phrase. Sentence level changes were quite rare. About two-​thirds of the errors Criterion identified correctly resulted in revisions which almost always resulted in the correction of the problem. The researchers did not examine the impact of the feedback on text quality or learners’ writing skills, but the number of successful revisions described above suggests that the quality of at least some of the texts improved. The study by Dikli and Bleyle (2014) on Criterion which demonstrated that its error feedback is often inaccurate (see above) also addressed learners’ perceptions of its feedback. The findings showed that the learners found Criterion feedback on grammar, usage, and mechanics helpful. Interestingly, the students also consulted other sources of information frequently, particularly their grammar and writing books, but also Criterion model essays and their teachers. Another study with Criterion and MY Access! (Warschauer and Grimes, 2008) found that both teachers and English-​speaking teenaged students considered their feedback useful. However, the study also demonstrated issues in the general use of the systems: the tools were used rather infrequently, and most students did not submit a revised version of their text to the system, or when they did, their revisions almost always focused on mechanics rather than content or style. Another study on Criterion (Li et al., 2015) addressed the use and perceptions of the system and its impact on text quality. The researchers studied both teachers and English language learners at university level. The results showed that the teachers’ feedback strategies changed as they used Criterion in their classes: over time, they started focusing on giving feedback on content development and textual organization and left most of the feedback on grammar and mechanics to Criterion. The teachers mentioned the inaccuracy in some error feedback to be somewhat problematic, but they also reported occasional discrepancies between the Criterion overall score and the teacher’s score on the same texts to cause frustration for both the learners and teachers. Li et al. (2015) also found that the learners were satisfied with the AWE corrective feedback and that it also motivated them to use other resources to make the corrections. Criterion feedback on organization and content received mixed evaluations; the less experienced learner-​w riters found even the somewhat limited feedback on organization by Criterion rather useful whereas the more advanced learners did
not. The researchers also discovered that, depending on the paper the students had to write (4 papers were required), between 44% and 72% of them submitted a revised text to Criterion, which may have been due to these adult ESL learners being rather motivated to improve their writing in English. They contrasted this finding with that by Attali (2004), whose teenaged English L1-​speaking informants rarely revised their texts with Criterion. Finally, the researchers reported a significant decrease in errors in the students’ writing but stated that this was likely to be at least partly a result of an extra motivational factor of the educational setting: the students had to achieve a particular overall Criterion score for their texts before they could submit them to the teacher for grading and comments. We conclude this overview of the impact of AWE systems with a study of a tool that analyzes textual organization and functions. Chapelle et al. (2015) investigated the impact of the Intelligent Academic Discourse Evaluator (IADE) with international graduate students taking a writing course at a US university. IADE (see the “Examples of automated writing evaluation systems” section in this chapter) gives feedback on the structure of the introductory sections in research reports. The feedback comes in the form of colour codes applied to the writer’s text that indicate the amount and placement of three conversational moves of typical introductions so that the writers can see if they have paid enough attention to each move, that is, establishing the territory, identifying a niche, and addressing the niche. All students reported that the colour-​coded feedback helped them focus on the discourse /​functional meaning at least to some extent, and most said it helped them improve their writing. Those who were unsure or thought the feedback did not help them revise reported not being able to act on the feedback that only indicated where they had failed to communicate the intended functional meaning without providing recommendations as to how to do that. Chapelle et al. (2015, p. 402) summarize a key effect of IADE feedback by stating that “noticing a mismatch between intended and expressed meaning appeared to lead to reflection on functional meaning, and that making connections between functional meaning and lexical choice led to the construction of new discourse meaning”. Understanding how lexical choices relate to how one conveys meanings is obviously an important aspect of the writing ability. However, the researchers also noted that some students seemed to think that revising certain vocabulary was all they needed, thus adopting a very limited revision strategy. Implications for diagnosis. AWE analyses and feedback (particularly on errors) have often proved inaccurate, meaning that the use of these tools for diagnosis has to be considered carefully in each context. Most AWE tools focus on the grammatical, lexical, and mechanical features of texts at the phrase, clause, and sentence levels, and when the diagnosis focuses on these, AWE can be quite useful, as studies on learners’ revisions indicate. Many learners
regard automated tools positively, which is obviously good, as it may motivate them to use the tools. However, AWE systems need to be used regularly and, importantly, as part of the text revision process, which is likely to improve both learners' texts and their writing skills.

Future developments in automated writing evaluation systems

Future AWE systems are likely to add new (sub)constructs and new sources of evidence to the models, and to develop new measurement methods (Deane, 2013b). These developments will obviously broaden the scope of automated diagnostic assessments and feedback. New constructs may relate to reasoning at the conceptual level and use of source materials at the discourse level of writing. As Deane (2013a) noted, the current systems do not cover writing abilities at the conceptual level, and the discourse-​level analyses focus on the structure and coherence of the text but do not tackle the writers’ use of sources such as reading or listening materials. The increasing use of integrated reading/​listening-​into-​w riting tasks in language assessments obviously makes the development of such capabilities ever more important. Developing approaches to analyzing abilities that relate to the social level of writing is even more challenging. Deane (2013a, p. 307) states that the analysis of these new constructs requires significant advances in NLP and AI technologies. A challenge in the evaluation of argumentation, for example, is that effective argumentation tends to be genre specific (Deane, 2013a, p. 308), which means that the analyzed features need to be weighted differently depending on the genre. The second direction that Deane (2013b) predicts for future systems is to expand the sources of evidence about writers’ abilities. Until now, this evidence has been based on the completed texts, meaning that the automated analyses inform us about the quality of the texts rather than learners’ writing ability. Deane proposes that future systems could also tap the writing process, which is more informative of the writers’ abilities and would extend the range of writing skills measured considerably. He further proposes that keystroke-​logging be used for capturing the writing process. Recent research has indeed investigated if some features of the writing process that keystroke-​logging can capture might be useful for assessment (Deane & Zhang, 2015; Guo et al., 2018). These studies have explored the relationship between the process features and human ratings as well as their stability vs variation across different testing occasions, topics, and genres. The third road forward concerns developing new measurements of the writing ability through combining improvements in NLP and theoretical advances in defining the constructs of interest, as exemplified, for instance by the CBAL framework (Deane, 2013b). This may simply mean a more accurate
and comprehensive measurement of the skills already covered by the automated systems. However, this may also mean extending the scope of measured constructs to those that belong even to the social level of writing such as the writer’s stance and the tone of the text. This requires the development of valid and reliable NLP analyses of the new constructs. Research by Zhao (2013, 2017), Yoon (2017) and J. Lim (2019) on authorial voice paves way for exactly this kind of extension of constructs. Zhao (2013) first developed an analytic voice rubric from Hyland’s (2008) voice model. Zhao (2017) investigated and established a relationship between analytical voice strength ratings and holistic ratings of text quality in argumentative essays. Yoon (2017) used the automated Authorial Voice Analyzer to quantify the lexico-​g rammatical features considered important for authorial voice based on Zhao’s research and investigated if they relate to holistic human ratings of voice strength in argumentative essays. Authorial Voice Analyzer (AVA) calculates the occurrences per 1,000 words of expressions that represent the interactional metadiscourse categories proposed by Hyland (2005), viz. attitude markers, boosters, directives, hedges, reader pronouns, and self-​mentions, and combines them into indicators of stance and engagement. Yoon found that the quantity of self-​ mentions combined with a variety of boosters (e.g., without doubt) and attitude markers (positive or negative opinion words, words expressing emotions) explained 26% of variance in the ratings of voice strength. Yoon proposed that the role of linguistic and discourse-​level features (e.g., lexical and syntactic sophistication, coherence, idea development) in voice should be investigated further, for example, with Coh-​Metrix and TAALES. In a different study, J. Lim (2019) used Coh-​Metrix and Authorial Voice Analyzer to compare essays that had received discrepant human ratings and found that authorial voice was one of the features that explained the divergent scores, both when measured quantitatively by AVA and evaluated qualitatively through rater interviews. In Chapter 7, we discussed how teachers who know their students’ linguistic backgrounds are often able to diagnose certain errors as being caused by the students’ first language. None of the AWE systems we know of can take learners’ L1 background into account in its analyses. One future direction that these systems could explore, then, is how learners’ L1, and other background information, might be used to fine-​t une the automated diagnosis. One further direction for the development of AWE systems concerns increasing the users’ –​teachers’ and learners’ –​agency by allowing them to decide what exactly their tool should analyze and report as feedback. By default, the current systems provide users with a fixed set of information. Although learners can choose what feedback to attend to, the amount of information can be overwhelming –​which may be one of the reasons why many learners require the teacher to help them interpret, and prioritize, the feedback.


There are many reasons why AWE systems should be configurable by users. First, feedback should be rationed into manageable portions. Secondly, being able to turn on or off specific analyses and feedback would contribute to the individualization of diagnosis, the lack of which is regularly complained about by learners in research on diagnostic assessment. Such flexibility would allow users to decide which focus and level of granularity of the feedback would be most helpful for particular learners at particular stages of learning. It would also allow the teacher to link the constructs assessed by the AWE tool more closely with the curriculum, course or teaching materials. Even if we conceive diagnostic assessment building on SLA and other theories rather than on specific curricula, diagnosis in educational contexts is always influenced by the educational program of the institution (see Chapter 7 for an example of diagnostic rating scales based on a curriculum). Therefore, adjusting AWE for a particular curriculum may improve its diagnostic effectiveness in practice. The third benefit of more flexible AWE systems would be affective. Being able to modify the system may increase the feeling of being in control of one’s learning (and teaching) and increase motivation. Configurable AWE comes with some challenges, however. Configuration decisions should ideally be well informed. It may be, for example, that tradition and a narrow view of language learning may lead to the user enabling only task-​ level feedback focused on spelling and grammatical accuracy while disabling the functionalities that relate to metacognitive strategies. For specific writing tasks and phases of the writing process, such a decision may be entirely justifiable, but as an overall approach that would not be advisable, given what we know about writing development and useful diagnosis. Thus, meaningful use of AWE requires sound pedagogical knowledge from the teacher. As far as the learners are concerned, the better aware they are of the goals of learning and criteria for success (see Sadler, 1989) the more likely they are to benefit from using an AWE system. Designers of automated writing analysis tools may also try to provide their users with more research-​based advice on the use of their systems, for example, by providing examples of different options on how to sequence the provision of different kinds of feedback across different writing tasks or different versions of the same text. What are the characteristics of automated systems that are diagnostically useful for developing SFL writing ability? Roscoe et al. (2014, pp. 39–​40) provide us with some pointers. First, interdisciplinary collaboration is needed for designing pedagogically meaningful systems, including contributions from, for example, education, psychology, writing research, linguistics, and computer science (see also Quixal and Meurers, 2016). More specifically, the account by Roscoe et al. (2014) of the development of Writing Pal suggests that pedagogically useful systems consider writing as a complex process, and they cover the whole writing process from planning to revising. Such systems offer
different strategies for the learners to address the issues they face during those stages. Secondly, the systems should support sustained practice: improving writing requires engagement in various writing activities frequently enough. As an activity, writing takes time and it can become tiring and boring, particularly if the activities are mechanical and repetitive. The solution that Writing Pal uses is gamification, embedding writing tasks and strategy practice in educational games. The third factor that Roscoe et al. (2014) consider important for useful tutoring systems is individualized formative feedback. We conclude by reminding that AWE systems have significant limitations when they are used as the sole provider of feedback and guidance. Both weak and strong SFL learners would benefit from using AWE with other sources of support. Research reported in this book indicates that some learners are likely to need somebody to interpret the feedback and assist in deciding what to do next. The studies reviewed here reported that learners often use other sources besides the AWE tool to help them revise their texts, including different reference materials such as writing guides, as well as the teacher. The studies also point out that the way teachers integrate AWE systems into their courses matters, particularly when it comes to how often the tools are used. Particularly important for the benefits of AWE tools to materialize seems to be that learners work through two or more versions of their texts, revising them with the help of automated feedback. Therefore, the implementation of automated writing evaluation systems seems particularly promising when the process writing approach is used to teach writing.

9 THE ROLE OF FEEDBACK IN DIAGNOSING SFL WRITING ABILITY

Introduction

In this chapter, we focus on the feedback part of the diagnostic cycle, as well as cover the action stage of the cycle. We bring together different threads, most of which were discussed elsewhere in this book, but from other angles, including the who of diagnosing, that is, the agents involved in giving and receiving feedback (see Chapter 4), the how, namely the feedback on the products and the process of writing (see Chapters 4 and 6), and the directness and the timing of the feedback. We draw upon Hattie and Timperley's (2007) seminal work and sociocultural theory to conceptualize diagnostic feedback for learners. Feedback is an essential part of all teaching, learning, and assessment. The relationship between feedback and assessment depends on the purpose of the assessment (e.g., Kiely, 2018). There are assessments, such as a summative test at the end of schooling, where qualitative feedback is not very common. However, in all contexts where assessment aims to promote learning, including diagnostic assessment, the distinction between teaching and assessing becomes somewhat blurred. While learning from and by assessment can happen in a variety of ways (see, e.g., Purpura, 2021), it most often happens via feedback (Davison & Leung, 2009), which either follows assessment or becomes part of it. This is particularly important for diagnostic assessment, which does not end with identifying learners' areas of struggle, but aims at informing teachers and/or learners about these problems in an actionable way. Useful feedback needs to incorporate advice on how to address these identified areas of struggle in order to link diagnosis to action. Therefore, feedback takes centre stage in the diagnostic cycle. Across the previous chapters, we have touched upon

why feedback is important in diagnostic assessment, what factors should be considered when giving feedback (including learners’ inter-​and intrapersonal characteristics), and who the feedback is directed to (teachers or learners). We also discussed feedback provided by some diagnostic instruments and procedures, including examples of automated assessment systems. What we have not addressed yet is what makes feedback effective in diagnostic settings. A common setting when thinking of feedback on writing is a classroom assignment written by a learner and evaluated by the teacher. In that context, the learner receives feedback from the teacher, often as corrections and comments, and possibly a grade. The learner may then attempt to improve the quality of the text by taking the teacher’s feedback into account. This feedback can take many forms depending on the reviewer/​reviser’s understanding of the writing task, the focus of the review/​revision, and the writer’s expertise (see Chapter 6). The reviewer’s focus can have two broad goals: (1) it can aim to help learners to improve the quality of their texts, or (2) it can aim to develop learners’ writing skills. For the former, text features will be the focus of evaluation, followed by corrective feedback, which could entail underlining mistakes, directing learners’ attention to areas of difficulties, correcting errors, or proposing to rewrite pieces of the text. This kind of corrective feedback, however, is not what we understand as diagnostic feedback, although it can be part of such feedback. For a true diagnosis, the diagnoser needs to establish the learner’s areas of struggle, understand the underlying reasons for the problems, and find ways forward to overcome these problems (see also Alderson, Brunfaut et al., 2015; Harding, et al., 2015; Jang & Wagner, 2014). Feedback becomes diagnostic when these aspects are communicated to learners in a way that they can act upon the feedback and take remedial action. We first review the literature to establish what is known about effective feedback in writing. Here, we draw on research on feedback more generally and apply relevant findings to diagnostic feedback. We begin with an influential feedback framework proposed by Hattie & Timperley (2007) to establish a sound conceptual basis for feedback in the classroom. We then summarize the literature on effective feedback for the aspects of agency, delivery, focus, timing, and requested response, before we turn our attention to technologically supported and automated feedback. Next, we discuss the contributions that sociocultural approaches can make to diagnostic feedback. This is followed by an analysis of feedback provided by some of the diagnostic instruments and approaches that we discussed so far (see Chapters 4, 5, and 8). Here, we also discuss the role of the CEFR scales for feedback in relation to instruments such as DIALANG, VERA8, and Roxify. In the concluding part of the chapter, we summarize our view of the desirable characteristics of diagnostic feedback on writing.

Conceptualizations of feedback

Definitions of feedback vary across fields, but what is common is the understanding of feedback as information. In SFL education, the predominant view of feedback as information on incorrect language use seems to have been sparked by Corder's (1967, reprinted in 1981) seminal paper on error correction. A similar understanding of feedback as "any indication to the learners that their use of the target language is incorrect" is expressed by Lightbown and Spada (1999, p. 171). Besides the error-corrective function of feedback, there are at least two further important functions, namely (1) reinforcement and (2) incentive for improvement (e.g., Kulhavy & Wager, 1993). In a similar vein, Butler and Winne (1995) describe the functions of feedback as "information with which a learner can confirm, add to, overwrite, or restructure information in memory, whether that information is domain knowledge, metacognitive knowledge, beliefs about self and tasks, or cognitive tactics and strategies" (p. 275). We now turn to one of the most comprehensive models of feedback, viz. that by Hattie and Timperley (2007). It was based on a synthesis of meta-analyses of more than 7,000 studies on the effectiveness of different forms of feedback. We draw on Hattie and Timperley's model in the remainder of this chapter when we develop our understanding of effective diagnostic feedback on writing.

Hattie and Timperley's model of feedback

In their feedback model, Hattie and Timperley (2007) defined useful/​effective feedback as providing learners with answers to three questions: (1) “where am I going?” (goals), (2) “how am I doing?” (performance in relation to these goals), and (3) “where to next?” (action to make progress; see Figure 9.1). These three questions are related to what Hattie and Timperley call “feed up”, “feed back”, and “feed forward”, respectively. They chime well with our diagnostic cycle of establishing goals, diagnosing performance, and pointing out ways forward to close learning gaps. In addition to these three questions, Hattie and Timperley differentiated four levels of feedback, namely feedback on (a) the task, (b) processes underlying the task, (c) self-​regulation, and (d) the learners’ selves. Their feedback model is depicted in Figure 9.1. Feedback on the task performance is the most common type of feedback in the SFL classroom. It gives information on how well a learner has completed a particular task, for example, in terms of linguistic performance on a writing task, or how well a communicative goal has been achieved, and is largely associated

FIGURE 9.1  Hattie and Timperley Feedback Model (Hattie & Timperley, 2007, p. 87).

with corrective feedback. Feedback on the task performance can also include instructions if there is a lack of knowledge that led to the incorrect response. Hattie and Timperley (2007) reported that written feedback on the task performance is more effective than grades or marks alone. While marks serve an important function for accountability, their effectiveness can be enhanced by complementing them with qualitative feedback on task performance. At the process level, Hattie and Timperley (2007) subsumed feedback that aims at enhancing deeper-​level understanding, in order for learners to understand the processes required to complete the task and to enhance learners’ abilities to transfer these insights to other tasks. They (p. 93) define this level as feedback that “is more specific to the processes underlying tasks or relating and extending tasks”. Thus, cognitive processes, that is, processes related to memory, learning, information, and actions involving them, appear to be at the heart of what Hattie and Timperley mean by the process level. An example of such feedback can be feedback directed on processes involved in discourse synthesis as writers learn about the topic they are writing on (see Gebril & Plakans, 2013). Process-​level feedback is most effective when combined with goal-​setting, and when it provides cues aimed at mediating underlying concepts, processes, and strategies needed to understand and perform a task, including strategies for

error detection. Not surprisingly, such feedback is more effective for learners' writing development, as it is argued to transfer to "other more difficult or untried tasks" (p. 93). This supports our view of feedback not just as a single action, or a series of actions, directed by an expert to a novice, but as a cyclical process aimed at creating a reciprocal link between what a learner currently thinks or knows and the knowledge, skills, and abilities actually needed to perform certain tasks successfully. The self-regulation feedback – or metacognitive feedback, as Hattie and Timperley also call it – targets learners' commitment, confidence, and control. This level of feedback is directed at how students can best monitor themselves in their movement towards achieving a goal. This feedback entails six elements: internal feedback (for example, self-comments), self-assessment, willingness to put in effort, confidence, attributions about success/failure, and seeking help. Hattie and Timperley (2007) stress that feedback at this level needs to clearly attribute reasons for success or failure to performance, and not to learner personalities. Feedback directed at self-regulation, enabling learners to reflect on themselves and encouraging them to become agents of their own learning, has the potential to promote learner development. The process and self-regulation feedback levels in Hattie and Timperley's framework may not be entirely distinguishable. They both involve the use of, and feedback focusing on, strategies, and they both target the writing process more than the finished product, the text. However, certain features can distinguish them: process-level feedback appears to be more direct (Hattie & Timperley, p. 96), in which respect it resembles task feedback, whereas self-regulation feedback is more indirect and often consists of hints rather than direct answers or advice. Furthermore, process feedback appears to relate to cognitive processes whereas self-regulation feedback clearly focuses on metacognitive processes – Hattie and Timperley, in fact, use self-regulation and metacognition as synonyms. Because both process and self-regulation feedback aim at strategies for improving performance, Alderson, Haapakangas, et al. (2015) merged them into what they called strategy feedback. Strategy feedback, according to Alderson, Haapakangas, et al. (2015), targets cognitive, metacognitive, and social strategies (see, e.g., O'Malley & Chamot, 1990; Oxford, 1993; Purpura, 2014, for a detailed discussion of strategies). Briefly, cognitive strategies refer to manipulating the material in the task at hand, for example, asking learners to think about the meaning of a particular word in context. Metacognitive strategies involve thinking about the processes more generally, for example, referring to planning, monitoring, or evaluating a text. Finally, social and affective strategies refer to interaction with others, such as seeking help, and regulating one's own feelings. In practice, Alderson, Haapakangas, et al. (2015)

focused on cognitive and metacognitive feedback and, thus, maintained the original distinction between the two levels. In this book, we proceed in a similar way but retain Hattie and Timperley's original terminology: for clarity's sake, we refer to cognition and cognitive processes whenever discussing process-level feedback and to metacognition when focusing on the self-regulation level. The final level in Hattie and Timperley's framework is feedback towards the learner's self, encompassing such aspects as praise, for example, "Great effort!" after the learner wrote an essay, or comments targeting the learner as a person, for example, "You can do better!" Hattie and Timperley found that while self feedback is commonly used in the classroom, in the meta-analyses it turned out to be the least effective. Indeed, self feedback carries the least diagnostic information for the learner, as it does not identify what is strong vs weak about the performance, what could be done better, and how the performance can be improved. It has, however, as we will see below, motivational potential. It is important to note that the levels and questions in Hattie and Timperley's framework are not meant to suggest that feedback can or should only address one level or one question at a time. Rather, this framework can stimulate reflection and help address all relevant aspects in feedback. It can also be used when analyzing feedback, for example, as a coding frame in research; here, multiple coding is possible since one and the same piece of feedback, if effective, can and should address several levels or questions in an integrative way. With this framework as background, we now explore what is known about feedback in the domain of diagnosing writing.
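
Where the framework is used as a coding frame, it can also be represented directly in analysis software. The sketch below is one possible encoding under our own assumptions about labels and data structures; it allows multiple codes per feedback comment, in line with the integrative view described above, and the example comment and its codes are invented for illustration.

    # One possible encoding of Hattie and Timperley's levels and questions as
    # a coding frame for feedback comments.
    from enum import Enum

    class Level(Enum):
        TASK = "task"
        PROCESS = "process"
        SELF_REGULATION = "self-regulation"
        SELF = "self"

    class Question(Enum):
        FEED_UP = "Where am I going?"
        FEED_BACK = "How am I doing?"
        FEED_FORWARD = "Where to next?"

    # Multiple coding: one comment may address several levels and questions.
    coded_comment = {
        "comment": ("Your second argument lacks evidence; try outlining your "
                    "sources before revising, as we practised last week."),
        "levels": {Level.TASK, Level.PROCESS},
        "questions": {Question.FEED_BACK, Question.FEED_FORWARD},
    }

    def counts_by_level(coded_comments):
        """Tally how often each level is addressed across a set of comments."""
        tally = {level: 0 for level in Level}
        for comment in coded_comments:
            for level in comment["levels"]:
                tally[level] += 1
        return tally

    print(counts_by_level([coded_comment]))

Such a tally makes it easy to see, for instance, whether the feedback in a data set clusters at the task level only.
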
What is known about diagnostic feedback on writing

Diagnostic feedback on writing usually covers three main aspects: (a) communicating strengths and weaknesses of the product of writing; (b) communicating the effectiveness of the writing strategies employed during the writing processes; (c) improving writers’ general writing abilities by showing “the cognitive gap between the current performance and a goal” (Alderson, Haapakangas et al., 2015, p. 168), by explaining the origins of the diagnosed weaknesses, and by suggesting ways to overcome them. With reference to the diagnostic cycle, the feedback stage directs the ensuing action to be taken and ideally links it to previous teaching and learning goals (Kunnan & Jang, 2011; Harding et al., 2015). Learning goals are essential to consider in diagnostic feedback, as Alderson, Haapakangas, et al. (2015, p. 171) noted, since it is only when learners have a clear understanding of the goals that diagnostic feedback has the potential to promote learner development. Therefore, diagnostic feedback should show the gap between learners’ current performance and where

they are supposed to be heading, and define new learning goals, preferably broken down into sub-goals (e.g., Kunnan & Jang, 2011). This should facilitate feedback intake and uptake, viz. the ensuing actions taken by the learners. If we think of diagnosis as a repeated iteration of this cycle, it makes sense to also take previous feedback and actions into consideration. This allows tracing how learners took feedback on board and how they acted upon it, thus adding a longitudinal diagnostic perspective. We will return to this point in Section 9.5 when we discuss the sociocultural angle to diagnostic feedback. We now summarize the literature on (diagnostic) feedback on writing for the following key aspects: agency, delivery, focus, timing, and requested responses. This summary is based on a number of reviews of feedback research (e.g., Kulhavy & Wager, 1993; Révész et al., 2022) as well as on our engagement with the feedback research on SFL writing, which we elaborate on in the following. We take into consideration learners' cognitive maturity, their beliefs about learning, their language proficiency, their writing proficiency, the language learning context (L1 or SFL), and different institutional settings.

Agents

The most obvious agents who provide feedback in the classroom are the teachers. Their role is not only to set goals (ideally together with students), explain expectations and requirements, plan, and organize teaching activities and writing tasks that lead to exposure and experience for the learners. Teachers also provide feedback on task performance and learner development, and they monitor uptake of feedback, revision, and improvement. For teacher feedback to be effective, the context of writing instruction and diagnosis needs to be set in a transparent and motivating way and feedback aligned to instruction so that students better understand what is required from them (Lee & Coniam, 2013). This agrees with our diagnostic cycle, where goals, performance analysis, and feedback are aligned. Regarding the question whether teacher feedback is always the most effective, Biber et al. (2011) found in a synthesis of 306 studies on the effectiveness of feedback for writing development that teacher feedback was more effective for L1 writers than for L2 writers, for whom peer feedback (and computer feedback, as these were studied together as “other”-​feedback) turned out to be more effective. For the L1 learners, the mean effect size for the teacher feedback was large whereas the mean effect size of other-​feedback was quite small. For L2 learners, the picture was somewhat reversed: the mean other-​feedback effect was large while the mean teacher-​feedback effect was still moderate in size. Peer feedback is perhaps the second most used approach. The most obvious benefit of peer feedback is learner autonomy and engagement in both providing

feedback on each other’s texts and in reacting to the feedback from other learners (e.g., Hyland & Hyland, 2006). For peer feedback to be effective, training is needed, as Hyland & Hyland (2006) suggested. It seems that over time, the quality of peer feedback can improve, as K.-​H. Cheng et al. (2015), who compared peer and teacher feedback, suggested. While studies on the effectiveness of peer feedback yield mixed results, an early meta-​analysis by Hillocks (1986) showed slightly more positive effects for combined teacher and peer feedback than for teacher feedback alone. It appears that learners of all SFL proficiency levels benefit from giving and receiving peer feedback. Yang (2018) found that proficient EFL learners were more capable of providing and utilizing indirect feedback, whereas less proficient learners generally produced direct feedback, which their peers simply accepted. However, once engaged with more proficient learners, less proficient learners substantially improved their writing with indirect feedback. Yang attributes this effect to the negotiation of meaning with peers. Interestingly, development can occur regardless of how appropriate this feedback is perceived by the peers (Huisman et al., 2018). We can assume that a certain level of maturity is needed to provide such explanatory feedback and to negotiate the meaning of it with one’s peers. Yet even younger learners should benefit from peer feedback, which may perhaps focus on more concrete aspects. The pivotal characteristic here seems to be learner engagement in the evaluation and revision process, so that agency and autonomy is shifted towards them (Hyland & Hyland, 2006). This leads us to the learners as writers themselves and their role in the feedback process. Hattie and Timperley (2007) stress the importance of learners being able to generate feedback for themselves, for example, through developing their error detection skills, through monitoring their strategy use as well as monitoring their use of rhetorical, cohesion, and topical knowledge. Lam (2016) reports a synthesis of literature on portfolio-​based self-​a ssessment in university settings. It appears from this synthesis that the use of self-​feedback is productive only if it is complemented by teacher or peer feedback. For diagnostic feedback to be effective, this may mean that self-​a ssessment can be one component that needs to be complemented by feedback from other agents to be effective (e.g., Poehner, 2012). In addition to human agents, we should note that feedback is nowadays increasingly provided by automated systems that analyze learners’ writing and provide feedback to the learners themselves and in many cases to their teachers as well. It is also possible that some of the automatic feedback is mediated and interpreted for the learners by the teacher (see Chapter 8 and the “Evaluating feedback in diagnostic instruments” section in this chapter). What is important in writer agency, however, is the pivotal effect that learners’ agency and engagement have on the uptake of feedback, regardless of

who has provided it, as Zhang and Hyland (2018) pointed out. They found in a case study with two learners that learner engagement had a direct effect on the effectiveness of feedback across multiple drafting sessions. If learners do not take up the feedback, for example, because they simply delete problematic text passages, or if they do not react appropriately to the feedback, perhaps because they do not understand it or lack the knowledge to remedy the problem, the feedback may lead to an improved text draft but not to an improvement in writing ability. This finding is supported by Yang (2018), who found that less proficient L2 writers were able to act upon direct error-correction feedback, but were not able to provide or act upon indirect feedback (see also the section on feedback focus below). They benefited, however, from negotiating feedback with their peers. With regard to feedback uptake and learner agency, Liao (2016), in a study with 63 EFL learners at a college in China, differentiated four learner types, namely goal setters, accuracy pursuers, reluctant learners, and late bloomers. While there is no tool available to diagnose such uptake types, it may be useful to bear in mind that different learners react differently to feedback; teachers who know their learners well may be in a good position to tailor the way feedback is delivered to learners' uptake styles. Feedback uptake is also affected by learners' conceptions of the role and aims of diagnostic feedback and their beliefs about the value of teacher and peer feedback (Hyland & Hyland, 2006; Kunnan, 1995). For instance, Doe (2015) found with 47 EFL learners in a Canadian university setting that their understanding of feedback as judgement hindered them from acting upon the feedback. Depending on learners' socialization and prior experience, it may be necessary to explain, and let learners experience, the formative role of feedback as learning support, so that they can see its usefulness and learn ways to act upon it. Learner uptake and learners' reactions to feedback are also intricately linked to the way feedback is delivered, as we will outline in the next section.

Delivery

We now discuss the affective effect of feedback wording and presentation, as well as the effects of oral and written delivery of feedback. Drawing on Ellis’ (2008) typological framework, we then differentiate between direct and indirect feedback, and metalinguistic comments. While Ellis’ framework focused on corrective feedback (see the section below on focus), we widen our angle to include all aspects of diagnostic feedback. Computer-​based delivery of feedback is discussed in a separate section. Delivery and wording have an effect on students’ motivation and emotional reactions, which in turn influence uptake and actions (Hyland & Hyland, 2006;

Mahfoodh, 2017). Mahfoodh, for example, found with EFL university students that harsh criticism, negative evaluations, or too many comments on the first draft led to negative emotional reactions. Kluger and DeNisi (1996) found that feedback that presented little threat to learners’ self-​esteem appeared to be moderately effective whereas feedback that challenged it did not have any effect. In light of Hattie and Timperley’s (2007) findings that feedback which targets the self-​level is less effective, it seems that feedback that is perceived as threatening may be counterproductive. Hence, it is necessary to consider carefully the affective perception when wording feedback so that beneficial effects can be achieved. Negative feedback comments tend to be related with negative effects, while positive comments are associated with positive effects (e.g., Hillocks’ [1986] meta-​analysis). Feedback in the SFL writing classroom is provided in written or oral form; yet oral feedback (referred to as conferencing), seems to dominate in the L1 domain (e.g., Hyland & Hyland, 2006). This is supported by Biber et al. (2011), who found in their synthesis that oral feedback was more effective for L1 writing, while written feedback was more effective for L2 writing. When it comes to effective feedback delivery, one needs to take into account whether writing takes place in one’s L1 or L2. Furthermore, oral feedback delivery lends itself quite easily to the sociocultural concept of negotiating difficulties and underlying concepts with learners. We will take this up in more depth in the “Sociocultural angle to feedback” section in this chapter. When it comes to direct and indirect delivery of feedback, which some researchers classify as explicit and implicit (or overt and covert), research suggests that different delivery forms are suitable for different proficiency levels and age groups. Yang (2018) and Huisman et al. (2018), for example, found for university students that writers at lower L2 proficiency levels benefited more from direct corrective feedback, while more proficient L2 writers benefited more from indirect, evaluative, explanatory feedback on structure and style. Less proficient learners, nevertheless, benefited from negotiating the meaning of feedback with their peers, indicating that they are also able to react to feedback beyond the direct corrective type if it is discussed with them in an accessible way. This again chimes with sociocultural theory’s notion of mediation. Moreover, there are indications that indirect feedback can be associated with improvements in writing quality across different texts (Hyland & Hyland, 2006). Ferris (2006) recommended using primarily indirect feedback, at least for the focus on error correction, due to its longer-​lasting effects because, Ferris argued, it draws students’ attention to the sources of the errors more than direct correction does. Not all studies, however, point towards such differential effects of indirect and direct feedback on proficiency levels. A meta-​analysis of 21 studies on written corrective feedback to improve grammatical accuracy by Kang and

Han (2015) revealed no significant differences between direct and indirect, or between focused and unfocused feedback. Rather, they found that the more proficient in L2 the writers were, the more they benefited from corrective feedback. It may be that the narrow focus of the feedback on the correction of grammatical errors led to these outcomes. With regard to different writing genres and writer development in different genres, the meta-analysis found such heterogeneous results that these could not be generalized. A different angle for a similarly narrow focus on error-corrective feedback is reported by Mawlawi Diab (2015). When comparing two groups of about 20 EFL learners who received different types of feedback, the author found significant improvement on errors with the group that had received both direct correction and metalinguistic explanations (thus combining task and self-regulation/metacognitive levels in Hattie and Timperley's framework), but no effects for the group that had received only metalinguistic explanations. Interestingly, the 'direct and metalinguistic' group preferred the metalinguistic explanations, because they made them think about the errors, supporting the above point about engaging learners. This finding is also supported by a research synthesis by Biber et al. (2011), who found commentaries more effective than error location. These research findings illustrate the complex constellations of a multitude of intricately interwoven aspects that contribute towards effective feedback. Depending on the research design, some aspects are considered while others are ignored. Hence, different studies may come to seemingly opposing results because not all of the highly complex array of factors could be taken into account. This heterogeneous picture of research findings, which is also described by Hyland & Hyland (2006), is further exacerbated by the fact that what may be effective for one setting and one focus may not yield effective results with a different learner group or a different focus. Moreover, the focus on, for example, improving grammatical accuracy may require a way of delivery that differs from a focus on improving an argument or developing writing ability overall, as we will outline in the following section.

Focus

As indicated at the onset of this chapter, the two main goals of feedback on writing are either on improving a text or on developing writing ability (see Chapter 8 for a related discussion on the differences between text quality and writing ability). Depending on these goals, the focus of feedback will differ: Often, text improvement is addressed by a rather narrow focus on linguistic correctness via corrective feedback –​task level feedback in Hattie and Timperley’s terminology. This kind of feedback alone is often too limited to be classified as diagnostic. Truly diagnostic feedback aims at developing

writing proficiency, by addressing a writer’s struggles, along with underlying reasons and ways forward. While it may include corrective feedback, diagnostic feedback will focus on all four levels of Hattie and Timperley’s framework, and it is most effective if it focuses on all three questions of feeding up, back and forward, in line with the concept of diagnostic cycle or process (see Chapter 1 and Harding et al., 2015). There are, of course, goals that lie between these two extremes. Think, for example, of text improvement across several drafts, where task-​level feedback is aimed primarily at improving concrete linguistic or stylistic aspects of the text; nevertheless, feedback can and often will also target the writer’s cognitive and metacognitive processes and strategies, which, in turn, have the potential to develop writing ability. Moreover, as Hyland (2011) found, some learners engaged with teacher form-​focused feedback not primarily to correct their texts, but rather to test their hypotheses about language, as opportunities for further practise, and for developing strategies for improving their writing abilities. Hence, there are no clear-​cut lines of what exactly counts as diagnostic feedback, and we acknowledge that corrective feedback can have a developmental effect on writing proficiency. Ultimately, the diagnoser does not have control over how feedback is taken up, which is why we argue that monitoring the uptake is part of the diagnostic cycle (see section on requested responses below). Different foci of diagnosis require feedback to be delivered in different ways, as discussed above. The delivery can range from explicit error correction to implicit hints, to providing metalinguistic explanations and metacognitive comments on processes, or self-​ regulatory comments on strategy usage. Feedback targeting different cognitive and metacognitive processes and strategies seems to be essential in such pedagogical practices as process writing, as well as for improving the writing process (planning, reviewing, revising) of each individual draft in general. Moreover, feedback can include advice on mediation and instruction of underlying concepts that were diagnosed as reasons for problems. We have also pointed out above that the focus of diagnosis and feedback may vary with age, beliefs, maturity as a writer, and language proficiency of the learner. While young or novice writers may benefit more from explicit error correction, more mature writers may find metacognitive comments targeted at self-​regulation more beneficial, as for example K.-​H. Cheng et al. (2015) reported. Given this complex constellation of interrelated aspects, we now try to disentangle the two main foci of feedback, namely on the product to improve a text, and on the process to develop writing ability. Providing diagnostic feedback on the product refers to what Hattie and Timperley called task level. The focus lies mainly on the linguistic aspects, organization, genre, and style of the text when it comes to diagnosing writing skills, but there may also arise the need to provide feedback on content, for

example, the extent to which topical (mis)conceptions may have led to misunderstandings or misrepresented facts and arguments in a writer's text. At the task level, corrective feedback is provided most often, which may explain the large body of studies on error correction. As with other aforementioned aspects of feedback, these studies report mixed results. Hyland and Hyland (2006) argued that the effectiveness of corrective feedback differed with regard to what they called treatable (aspects that follow the rules) and untreatable errors (those that do not, e.g., word choice). Ferris (2006) recommended varying feedback for these two types of errors. Hyland and Hyland also argued that the form-meaning dichotomy was a false one, as these two aspects are intertwined and cannot be considered in isolation. It seems that feedback focusing on form alone is less effective than simultaneous attention to both form and content, as Biber et al. (2011) reported. While there may be contexts where explicit correction is beneficial, Ferris (2006) recommends the use of indirect feedback, locating errors rather than labelling or correcting them. This chimes with the maxim, discussed above, of learner engagement and activation in order to stimulate learner development, even if the focus lies on the writing product. When the focus lies on the writing process and the development of writing ability, we are dealing with Hattie and Timperley's process and self-regulation levels, as well as the self-level. Needless to say, there are overlaps between a focus on the product and a focus on the process, particularly when it comes to writing across several sessions with the aim of improving the drafts and learners' writing abilities. When comparing the effectiveness of self-level vs other levels of feedback, K.-H. Cheng et al. (2015) found with a small sample of undergraduate EFL writers in Taiwan that addressing the process of drafting and revising the text, without explicit guidance and references to specific issues, was significantly more useful for the improvement of the overall quality of writing across different sessions than affective self-level feedback. This agrees with Kluger and DeNisi (1996), who found in a meta-analysis of 131 studies a decrease in feedback effectiveness when the focus was moved away from task and processes towards the self-level. Tang and Y.-T. Liu (2018), however, reported that teachers' affective comments motivated low-proficiency 7th-grade (young adolescent) EFL writers with L1 Chinese, but they did not find any improvement in these students' writing. Hence, there may be a role for such an affective focus with young novices to keep them motivated.

Timing

Immediate, or minimally delayed, feedback has been proposed as one of the key characteristics of diagnostic assessment (e.g., Alderson & Huhta, 2011), since it enables learners to see how the feedback relates to their performance,

which helps them benefit from the feedback. Indeed, the literature suggests that immediate feedback is more effective than delayed feedback (e.g., Hattie & Timperley, 2007; Liao, 2016; Zhang & Hyland, 2018). Immediate feedback can mean more than one thing. For example, in the computerized DIALANG system, most of the feedback is provided right after completing one full language test. However, the system includes a type of feedback which is even more instantaneous, the so-called 'immediate item feedback', which the users of the system can activate at any point and which allows them to immediately see if the response was acceptable or not and what the correct/acceptable answers were. This feedback is optional, and the effect on learning of either type of DIALANG item feedback has not been studied. However, Huhta (2010) investigated over 500 learners' preferences and found them to be rather divided. While some learners wanted to know the outcome of every response immediately, others considered such information distracting and preferred to study such feedback only after first completing the test without interruptions. This suggests that the timing of some types of feedback could be a factor to be considered when attempts are made to individualize feedback.
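
As an illustration of how such a timing preference might be handled in a computer-based system, the sketch below lets the user switch item-level feedback on or off; it is a generic example under our own assumptions, not a description of how DIALANG itself is implemented.

    # A generic sketch of optional immediate item feedback; the scoring and
    # item format are invented and do not reflect DIALANG's implementation.
    def run_test(items, immediate_item_feedback=False):
        """Score each item; show feedback at once or defer it to the end."""
        deferred = []
        for prompt, expected, response in items:
            correct = response.strip().lower() == expected.lower()
            verdict = "acceptable" if correct else "not acceptable"
            message = f"{prompt}: your answer '{response}' is {verdict}; expected '{expected}'."
            if immediate_item_feedback:
                print(message)            # learner sees the outcome right away
            else:
                deferred.append(message)  # shown only after the whole test
        for message in deferred:
            print(message)

    # One item: (prompt, expected answer, learner response).
    run_test([("plural of 'child'", "children", "childs")],
             immediate_item_feedback=True)

Making the flag a per-learner setting could be one way to accommodate the divided preferences reported above.
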
Requested responses

With regard to requested responses from learners, Ellis (2008) differentiates whether a revision of the text is requested or not, and if not, whether students are to study the corrections or are just handed back the corrected text. This rather narrow focus on correction can be widened to include responses that range from studying the feedback, studying certain linguistic aspects, revising specific linguistic aspects, and revising different text sections, to redrafting a text, if the feedback focuses on text improvement. If the goal is to develop writing proficiency, requested responses may include researching and studying theoretical knowledge aspects or genre conventions (see the "Evaluating feedback in diagnostic instruments" section in this chapter). The responses may also require reflecting on or practising strategy use, as well as reflecting on different stages and processes during the writing activity. The requested learner responses need to be worded clearly and understandably for learners, so that they can react to the feedback and take the necessary actions – and, thus, complete the diagnostic cycle. It seems helpful to negotiate the meaning of the feedback, and what steps can be taken, with the learners, as is practised in sociocultural approaches. Lee (2014), for example, develops a convincing argument for including the monitoring of how learners react to the feedback and how they take it up and forward as part of diagnosis. The diagnoser does not have control over how feedback is taken up and carried forward but can monitor it. This step represents the end of one cycle of diagnosis and the beginning of the following cycle.

Summing up: Effective diagnostic feedback on writing

The conditions for effective diagnostic feedback that aims at improving writing ability rather than text quality alone can be summarized as follows (based on, e.g., Bitchener, 2017; Gass, 1997; Schmidt, 2001; Swain, 1985; Tomlin & Villa, 1994). Such feedback should help learners understand the nature and likely origin of their problems, suggest how these could be overcome, and encourage learners to develop their writing skills, as also MacArthur (2016) argues. It should be delivered in a way that motivates learners to improve and should be phrased in such a way that they can notice the problems or gaps that the feedback addresses. This can be achieved by raising learners' awareness, that is, by pointing out that new information has been provided; by facilitating learners' understanding, for example by explaining the reasons for the gap or problem and by suggesting how to address these; and by focusing on learners' intake, in order to enable them to compare new information to their existing knowledge. Finally, the uptake of feedback and students' actions need to be monitored, in order to ensure that learners have understood the feedback and taken it on board, so that development can take place. Informed by Hattie and Timperley's model, effective diagnostic feedback on writing should focus first on the task level by drawing learners' attention to the task demands and to any gaps in fulfilling these demands. Along with this, feedback should contain comments on how to address the task demands, including corrective feedback. This feedback mainly aims at improving the product, the text draft. The next two levels – process and self-regulation levels – aim at improving learners' writing skills rather than at improving a particular draft by concentrating on cognitive and metacognitive processes and strategies. These range from underlying concepts of writing, such as genre conventions, and learners' strategies when planning, drafting and reviewing/revising their texts to understanding how to integrate the task, their knowledge and their goals in writing, how to assess whether they have reached the goals, how to define new goals, and how to manage themselves to reach these new goals (see also Chapter 6). For all four feedback levels, the focus should lie on (a) the strengths, gaps, and weaknesses (feed back); (b) reasons for the weaknesses, ways forward and actions to be taken (feed forward); and (c) on the learning goals (feed up).

Automated writing evaluation and technologically supported delivery

Based on Chapter 8 and the discussion of automated approaches to diagnosing writing, we take up aspects of technology-supported or technology-enhanced feedback here. While automated writing evaluation (AWE) refers to feedback that is

generated via algorithms and delivered online, technology-​supported feedback is provided by human agents but delivered via an online or technology-​ supported environment. With regard to technology-​supported delivery of feedback, T. Chen (2016) reports a synthesis of 20 studies on peer-feedback delivered on computer, with heterogeneous results. Learners generally found the technology-​ supported feedback more flexible, with less control by the teacher, allowing learners to work in their own time. On the downside, some studies found the technology-​ supported feedback confusing, particularly if there was nobody present to explain or negotiate the feedback. This, however, we assume to be true for any kind of written feedback. Some studies in the synthesis reported a feeling of heavy workload. In environments where learners collaborated online, synchronous phases were reported as beneficial for brainstorming and interaction in groups, whereas asynchronous phases stimulated reflection and revision. A growing number of studies is concerned with automatically generated feedback, the aforementioned AWE, also reporting mixed results. Resonating T. Chen’s (2016) results, Liao (2016) and Zhang and Hyland (2018), for example, reported that learners felt a greater sense of distance, more autonomy, and more flexibility in how to respond to the automated feedback. Furthermore, Liao (2016) found that AWE allowed the effective implementation of process writing due to its timely feedback. Li et al. (2015) found for the Criterion tool (see Chapter 8) that learners were satisfied with feedback on errors, but less so with feedback on organization. They also reported that Criterion reduced teachers’ workload and that learners submitted more drafts. Chapelle et al. (2015) found with the IADE tool (see Chapter 8) that its colour-​coding of different sections of academic texts stimulated students’ reflection of text meaning. There are, on the other hand, a number of drawbacks reported in the literature. First, not all algorithm-​created diagnosis or feedback is automatically accurate (Chapelle et al., 2015; Hoang & Kunnan, 2016; Li et al., 2015; S. Liu & Kunnan, 2016; Ranalli et al., 2017). There are indications that experienced teachers are more accurate and provide more helpful feedback than, for example, the Criterion (Dikli & Bleyle, 2014). Furthermore, learners do not automatically take the automated feedback up (Chapelle et al., 2015), nor know how to act on it without the teacher’s help (Huhta, 2010). Learners can also get confused if the feedback from the automated system differs from that they receive from their teacher (see Chapter 8 for details). Moreover, automated tools may reduce linguistic errors, but that may be at the cost of text complexity, as Li et al. (2015) reported. In conclusion, the synthesis by Stevenson and Phakiti (2014) revealed that automated feedback had an effect on score increase and error decrease, but they found no indication that it would improve overall writing proficiency. Given the fact that the current AWE solutions focus on evaluating text quality rather

than writing skill, as discussed in Chapter 8, this is not surprising. In the realm of diagnostic feedback, AWE may be one step with a focus on, for example, specific linguistic aspects, which themselves may be indicators of development, but AWE needs to be integrated into the overall diagnostic design and cycle. One of the major advantages of AWE is its immediacy, and the reduction of workload for aspects that can be reliably assessed by algorithms. Hence, while AWE may have its place in supporting and complementing feedback and negotiations by and with other human agents, when it comes to explaining reasons for weaknesses, underlying concepts, and individual next steps, teachers and peers cannot be entirely replaced as mediators.

Sociocultural angle to feedback

In the previous sections, we underscored the importance of learner agency and learner engagement with diagnostic feedback. This discussion brings forth the importance of negotiating feedback with learners (see Nassaji, 2017), which leads us to the sociocultural perspective on development, as discussed in Chapter 2. In the present section, we will focus on how viewing assistance to learners through the lens of sociocultural theory can further inform our understanding of effective diagnostic feedback. We will first outline how the sociocultural concept of mediation can expand our understanding of feedback. We will then elaborate how feedback as mediation can be organized in practice. Finally, we will discuss the diagnostic value of the sociocultural angle to understanding and giving feedback.

Mediation and feedback

One of the central concepts in the sociocultural perspective on development is mediation. Therefore, in order to discuss how sociocultural theory can help define useful diagnostic feedback, we next briefly conceptualize mediation and feedback as mediation. Mediation is conceptualized as central to any human functioning and development, shaping them (e.g., Poehner & Leontjev, 2020; Vygotsky, 1987). Hence, feedback, being a kind of mediation, directs what happens after it is provided. Therefore, sensitivity to the ways that feedback mediates how learners act upon feedback should help us understand better how feedback promotes learners’ development and allows for gaining further insights into learners’ abilities (Bitchener & Storch, 2016; Ellis, 2010; Lee, 2014). As Lee (2017) noted, when feedback is sensitive to learner responsiveness and when it is provided with the intention to promote learner abilities, it corresponds to the three qualities of effective feedback as defined by Hattie

and Timperley (2007), answering the questions of "Where am I going?" (feed up), "How am I doing?" (feed back), and "Where to go next?" (feed forward) (p. 56). Essentially, then, effective diagnostic feedback as outlined earlier in this chapter coincides in some ways with the sociocultural concept of mediation. One difference is that feedback outside the sociocultural paradigm is often conceptualized as provided to learners, while mediation in the sociocultural paradigm focuses on the process undertaken with learners as a joint activity (see Poehner & Leontjev, 2020). Furthermore, while any feedback can be perceived as shaping the ensuing performance, not all feedback promotes learner development even when it feeds forward. As Lidz and Gindis (2003, p. 104) noted, for Vygotsky, not all interactions are equal in that only some of them promote development. This argument is best understood with reference to the Vygotskian notions of the Zone of Proximal Development (ZPD) and obuchenie (Chapter 2). To reiterate, the ZPD is the range of learners' abilities that emerge when external assistance is available. The ZPD is malleable and emerges in interaction and is promoted by this interaction. Obuchenie is the process in which teaching and learning are intertwined and which both promotes learner development and builds on it. The notions of intentionality/reciprocity and transcendence (e.g., Feuerstein et al., 2010) help to understand how mediation in the process of obuchenie can be organized. Intentionality is the mediator's aim to promote learners' performances beyond their current capabilities. Mediation, hence, is a joint effort, where, by employing a range of different strategies, the mediator creates opportunities for the learner to act. Reciprocity emerges from the mediator's intentionality, which creates many ways for the learner to react to mediation but which also limits and directs these reactions. Finally, transcendence concerns internalizing new knowledge, that is, how the learner applies the new knowledge to novel contexts by reconstructing it.

Feedback as mediation

The notions of the ZPD and obuchenie suggest that for feedback to promote learners' development, it should emerge in cooperation with the learners, and that the responsibility for performance should be shifted onto the learners as much as possible. This involves providing learners with feedback that is as implicit as possible and, depending on how they respond to this feedback, gradually increasing its explicitness (see, e.g., Nassaji & Swain, 2000). Sociocultural research on feedback and mediation is oftentimes informed by the seminal study by Aljaafreh and Lantolf (1994). Based on the results of their study, the authors developed a Regulatory Scale, consisting of 13 mediational moves of gradually increasing explicitness and detail (see Figure 9.2).
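
To illustrate how such graduated assistance could be operationalized, for instance when scripting prompts for a tutoring dialogue, the sketch below steps through increasingly explicit prompts until the learner resolves the problem. The prompts are invented examples, and far fewer than the 13 moves of the Regulatory Scale.

    # A sketch of implicit-to-explicit mediation for one writing problem; the
    # prompts are invented, not items from Aljaafreh and Lantolf's scale.
    PROMPTS = [
        "Read this sentence again. Is anything not quite right?",         # most implicit
        "Look at the verb. Does the tense match when the action happens?",
        "The action started in the past and is still going on.",
        "Use the present perfect here: 'has lived', not 'lived'.",        # most explicit
    ]

    def mediate(learner_resolves_problem):
        """Offer prompts one at a time and stop as soon as the learner self-corrects.

        The number of moves needed is itself diagnostic: the fewer the moves,
        the closer the learner is to independent performance.
        """
        for moves_used, prompt in enumerate(PROMPTS, start=1):
            print(prompt)
            if learner_resolves_problem(prompt):
                return moves_used
        return len(PROMPTS) + 1  # unresolved even with the most explicit help

    # Toy example: this learner needs the third, more explicit prompt.
    moves = mediate(lambda prompt: "still going on" in prompt)
    print(f"Mediational moves needed: {moves}")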

FIGURE 9.2  Regulatory Scale (Source: Aljaafreh & Lantolf, 1994, p. 471).

As seen in Figure 9.2, the Regulatory Scale does include feedback, but it is also much more than that. The authors highlight that the mere presence of another person changes the context, as the learner sees the other person as a resource. This enables reciprocity, for example, through the act of asking for help. The function of the scale is not to correct the learners, though correction may well be the outcome of such mediation, but to provide learners with enough understanding to internalize the knowledge and mediational means needed in similar contexts – on similar tasks. This implicit-to-explicit approach to guiding learners' performance then allows for shifting the responsibility for performance as much as possible onto the learners. Hattie and Timperley's (2007) framework adds to the understanding of the scale and how learners should be helped to extend their knowledge to new, different contexts. Namely, feedback should not function only on the task level, but should stretch beyond the task at hand and function also at the cognitive and metacognitive levels. Indeed, feedback such as "it is not really past but something that is still going on" is about the learner's performance on a particular task. However, it also promotes the learner's understanding of the use of tenses. Depending on the learner's ZPD, the learner can either realize which tense is required or request more assistance. We note that even

when the learner understands which tense is needed in the specific case that the learner and the mediator jointly work on, it does not necessarily mean that the learner is able to apply this knowledge automatically in a different context independently, or even with less assistance from the mediator (see Lantolf et al., 2016). Next time, the mediator can build their assistance to the learner on that previous feedback and on the learner’s responsiveness to it, asking the learner “what happens when some action started in the past and is still going on?”. How exactly feedback can be negotiated with learners along the implicit-​ explicit continuum varies across studies (see Chapters 2 and 4 for some examples). However, it always comprises a variety of moves involving hints, guiding questions, and models, to name a few (Poehner & Leontjev, 2020) and is informed by intentionality/​reciprocity. Another point to be made about mediation is that the intention is not to find out one most effective mediational/​feedback strategy for a particular learner. Rather the whole interaction aims to promote the learner’s development (Poehner & Leontjev, 2020; cf. Ellis, 2017). We acknowledge that negotiating feedback with learners on their writing is not always possible in the same way as with oral performance, especially if feedback is given on writing products (though see Shrestha & Coffin, 2012). However, we argue that studying changes in learners’ performance in response to feedback and when possible, other reactions to the feedback, such as asking for additional assistance, together with this feedback, is important, as it yields insights into how feedback shapes learners’ performance and directs learners’ development. These insights can serve as the basis for modifying the subsequent feedback and instruction, be it on the next draft or a different text written by the learner (see Shrestha, 2020 for possible typologies of mediation and reciprocity in writing). The following Figure 9.3, which is a modification of Figure 6.2 in Chapter 6, is a graphical representation of this view. Figure 9.3 illustrates that the evaluation of the second draft does not simply entail the evaluation of the learner’s text as such, but should take into account the previous feedback given to the learners and, when possible, other forms of learners’ responsiveness to the feedback than just a product of their writing. As teachers do not always do that, however, the corresponding arrows in Figure 9.3 are of lighter colour (see Sadeghi & Rahmati, 2017, for an example of such systematic approach to guiding learners’ writing across texts). In a way, this process of giving feedback to learners across several drafts resembles dynamic assessment, but at a slower pace and with a smaller number of mediational moves (see Chapter 2), and, as we elaborate in the following section, it aligns well with the notion of diagnostic cycle. To summarize this section, feedback promoting learners’ development should give as much responsibility for performance to learners as possible by

FIGURE 9.3  Feedback as Mediation Across Drafts.

building on the previous assistance that learners received together with their reactions to that support. In the following, we will discuss the implications of this for diagnosis with reference to notions of intentionality/reciprocity and transcendence.
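
Before turning to that discussion, the sketch below shows one way such a record of feedback and learner responsiveness could be kept across drafts, whether in a short script like this or simply in a spreadsheet; the field names and example entries are our own and purely illustrative.

    # A sketch of a per-learner record carrying feedback and responsiveness
    # from one draft to the next; field names and entries are illustrative only.
    draft_records = [
        {"draft": 1, "focus": "tense in narrative paragraphs",
         "feedback_given": "indirect hint at the verb tense",
         "moves_needed": 3,  # how much assistance was required
         "learner_reaction": "asked what 'still going on' implies"},
        {"draft": 2, "focus": "tense in narrative paragraphs",
         "feedback_given": "reminder question only",
         "moves_needed": 1,
         "learner_reaction": "self-corrected immediately"},
    ]

    def assistance_trend(records):
        """Report whether the learner needed less help on the same focus over drafts."""
        moves = [record["moves_needed"] for record in records]
        return "decreasing" if moves == sorted(moves, reverse=True) else "not decreasing"

    print(assistance_trend(draft_records))  # "decreasing": mediated performance improved
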
Feedback as mediation and diagnosis

The diagnostic value of the socioculturally informed approach to feedback emerges from the understanding of learner development as happening in interaction with others. Diagnostic insights emerging from this process involve identifying how close the learner is to independent functioning (their ZPD) and the reasons for the learner’s struggles that remain hidden when only their independent performance is considered. The diagnosis of learners’ ZPD is informed by the understanding that any external assistance both creates several opportunities for the learner to respond to and directs this responsiveness. The learner’s reaction to the mediator’s intentionality then feeds back to the mediator; this itself contains diagnostic information, particularly on how close the learner is to independent performance. It should be remembered that a major goal of feedback as mediation, or any diagnostic feedback, is not to help learners self-​correct, but to reveal learners’ knowledge and help them apply this knowledge in other contexts. Thus, further diagnostic information that can be obtained from learners’ reactions to mediation concerns their areas of struggle. Rahimi et al. (2015) (Chapter 4), give a clear illustration of these two foci ‒ diagnosing and promoting learner development. After the initial inquiry of why the learner writes hesitantly, the teacher continued to probe, for example, “if you had enough time and


no space limitation, you would include all the ideas?", basing their questions to the learner on the learner's reactions until the learner's difficulty with differentiating between brainstorming and outlining was revealed. The teacher then switched the focus to guiding the learner's understanding of the difference between them. The teacher continued mediating until the learner was able to verbalize the difference between brainstorming and outlining. By first focusing on diagnosis, the teacher was able to organize the following guidance of the learner appropriately. The concepts of transcendence and internalization help us to understand what the action following the feedback or mediation can look like. Transcendence, sometimes referred to as transfer (see, however, Poehner, 2007, for a discussion of differences between the two concepts), refers to the extent to which learners can effectively apply knowledge and abilities that emerged in previous interaction to a new context. A common technique to elicit transcendence involves sequencing the tasks in ascending difficulty or complexity (see Chapter 5 on task characteristics and difficulty). From a diagnostic perspective, the concept of transcendence applies to the teacher in a different way, as the teacher should build their assistance to the learner also on the insights gained about the learner in previous mediation or interaction, as well as on their unassisted performance. For example, a learner's unassisted performance on an essay similar to one this learner wrote previously can tell the teacher whether the learner has fully internalized the new knowledge. However, whether and how the learner's mediated performance has changed is as important to assess as the change in their unassisted performance. This can only be done if the teacher, too, actively connects previous insights obtained about the learner with new insights to adjust the feedback and the following instruction. In practice, this means that the teacher should consider both how much help the learner needs with specific struggles and the possible reasons for those struggles and ways to address them. These decisions should be based on previous interaction(s) with the learner, but it should also be borne in mind that the learner's ZPD could have developed as a result of the previous interactions. To give an example, in a later interaction in Rahimi et al. (2015, p. 202) with the same learner as we discussed above, the learner needed only minimal intervention from the teacher to remember that they should write according to the outline, not the brainstormed list of ideas. This and the preceding episode illustrate, therefore, how learners' problems can become visible in interaction and how learners internalize feedback, thus developing their conceptual understanding (and, consequently, their SFL writing). Transcendence, thus, aligns with the notion of the diagnostic cycle in that the teacher (mediator) modifies the instruction based on the diagnostic insights that emerged in the previous assessment. However, the sociocultural perspective on diagnosis expands this notion to include adjustment of the teacher's instruction


as a co-constructed activity, in which teaching, learning, and assessment are intertwined.

Evaluating feedback in diagnostic instruments

In this section, we will discuss three diagnostic instruments, focusing on how the feedback they provide is diagnostic: VERA8, DIALANG, and Roxify Online. The analyses are based on the main categories that we summarized earlier in this chapter: agency, delivery, focus, timing, and requested responses, shedding light on how the feedback in future diagnostic L2 assessment systems could be developed.

VERA8

VERA8 is a diagnostic test used in the German school system that is aligned to the CEFR and makes use of the CEFR scales in its reporting approach (Chapter 4). Its aim is to inform teachers, parents, and students about the extent to which learners are reaching the learning outcomes that are aligned with the CEFR, spanning levels A1 to B1 across three school tracks. VERA8 takes place one or two years before the end of lower secondary schooling and thus gives an interim diagnostic insight into the areas where students are well on their way and those where they fall behind the expected progress (see CTE, 2017, for a somewhat similar national diagnostic examination in the Netherlands). With regard to feedback, VERA8 functions on different levels, with the main focus lying on educational monitoring and on giving teachers feedback at the class level, also in comparison to other classes and schools, as well as in comparison to regional districts and to the federal state. With regard to the writing section of the test and the feedback gained from it, the learner texts are assessed with an analytic rating scale (see Chapter 7). The detailed descriptors of this scale target task fulfilment, grammar, vocabulary, and organizational features. The rating scale provides teachers with enough detail to diagnose in which areas learners are struggling and what goals need to be set for the following instruction. Focusing on the grammar descriptors for level B1, we illustrate how such a diagnosis can happen (see Table 7.3 in Chapter 7). By comparing learner texts against these detailed descriptors, teachers can diagnose for an individual learner, but more reliably for their whole class, which features are missing or underdeveloped. These aspects serve as an indication of where the learners within one class should go next. Such a profile can by itself serve as diagnostic feedback: it does not just inform teachers how learners perform on a specific task and how well they can execute certain linguistic features, but it also shows in detail what the learners cannot yet do well enough.
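To illustrate how such a class-level profile might be put together in practice, the sketch below aggregates per-learner analytic ratings into category means and flags categories that lag behind. It is our own minimal illustration, not part of VERA8 or its scoring procedures; the category names, score range, and threshold are hypothetical.

from collections import defaultdict

def class_profile(ratings):
    """Aggregate per-learner analytic ratings (here assumed to be 0-4 per
    category) into class-level means."""
    by_category = defaultdict(list)
    for learner_scores in ratings:
        for category, score in learner_scores.items():
            by_category[category].append(score)
    return {c: sum(v) / len(v) for c, v in by_category.items()}

# Two invented learners rated on four invented analytic categories.
ratings = [
    {"task fulfilment": 3, "grammar": 2, "vocabulary": 3, "organization": 1},
    {"task fulfilment": 4, "grammar": 2, "vocabulary": 2, "organization": 2},
]

profile = class_profile(ratings)
# Flag categories whose class mean falls below an arbitrary threshold.
needs_attention = [c for c, mean in profile.items() if mean < 2.5]
print(profile)
print("Possible class-level goals:", needs_attention)

A teacher could read the flagged categories as candidate goals for the next teaching unit, while still checking individual learners' ratings before drawing conclusions about any single writer.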


The focus of the writing profile lies on promoting learners' writing skills as defined in the rating scale rather than on improving a text. As stated above, the main purpose of VERA8 is diagnostic educational monitoring. Hence, the feedback is mainly geared to the teachers as agents, who get valuable information on the writing skills within their classes; this can be used to plan teaching in relation to the goals set by the educational standards. It is the responsibility of the teachers to break down the standards into meaningful and teachable steps, and the feedback from VERA8 helps teachers to align their teaching with where their class stands, as diagnosed in the assessment, and where it needs to go next, as outlined in the educational standards. VERA8 is usually accompanied by a practical guide for teachers suggesting follow-up activities and tasks. Nevertheless, the assessment does allow certain insights into individual learners' writing skills, with the limitations that learners work on tasks set by an external exam body and that the test only allows a snapshot view of one point in time. Here, it is again the teachers' responsibility to deliver individualized feedback to the learners. Considerations regarding the timing, mode, wording, and activities that require learners to take up the feedback are usually up to the teachers, but supported by the aforementioned practical guide. In order to provide meaningful individualized feedback, it is important that teachers and learners have a shared understanding of the constructs in the rating scale descriptors, for example, what they understand as frequent structures and how these can be acquired by the learners. It is up to the teachers and learners to agree on suitable ways of monitoring uptake and further actions. We now turn to an instrument that attempts to combine feedback based on the CEFR with more focused diagnostic feedback: the DIALANG system.

DIALANG

DIALANG was the first large-scale assessment system based on the CEFR, and that Framework informed both the content of its language tests and much of its feedback (Alderson, 2005; Huhta & Figueras, 2004; Huhta et al., 2002). The writing tests in DIALANG are indirect, that is, they comprise multiple-choice and gap-fill tasks targeting specific aspects of writing: accuracy, appropriacy, or textual organization. The main reason why writing is not assessed directly is that this would have required either human rating of learners' performances or some automated analysis of the texts, neither of which was feasible. However, such indirect tasks are compatible with our understanding of diagnosis as aiming at a detailed investigation of specific features of language and component skills with the use of different, often discrete-point item types (see, e.g., Alderson & Huhta, 2011, and Chapter 5). Agency in DIALANG is a complex question, as it is in many other computerized assessment systems. The main provider of feedback is the


automated system functioning over the Internet: it delivers the test results and all other information to learners. Users have to print or take screenshots of the information in order to save it; otherwise, they can only read the feedback. The same applies to teachers when DIALANG is used in an educational institution: the results of individuals or groups of students are not automatically available to the teacher but have to be collected from each individual learner by the methods described above. Thus, making a permanent record of DIALANG feedback requires some effort, which is obviously an obstacle to its application in learning and teaching. Although the main provider of DIALANG feedback is the computerized system, learners are expected to have a certain agency in the process, and so are teachers, if present. DIALANG was designed for learners studying languages both independently and in institutional contexts. It was hoped that learners would find ways to use the feedback in their further language learning and, thus, have more agency in the learning process. For example, the fairly detailed descriptions of the CEFR levels aim at increasing learners' understanding of their present proficiency and of their next most immediate goal. To support this, the system presents concrete advice on how the learner might make progress towards that next CEFR level. Thus, in line with the three key feedback questions posed by Hattie and Timperley, DIALANG provides its users with information about potential goals, their standing with respect to those goals, and ways to achieve them. DIALANG feedback is voluminous, and some of it may be challenging for inexperienced learners. Therefore, the feedback comes in 18 different languages so that many learners can study it in their first language. Unavoidably, some learners find it difficult to process the feedback alone (see Huhta, 2010) and would benefit from the teacher's interpretation of the feedback and from activities the teacher could give them based on it. DIALANG delivers both direct and indirect feedback. The clearest example of the former is information about the correctness of responses to individual test items, which the learner can obtain immediately after answering an item or after taking a whole test. This is the only type of feedback in DIALANG that arguably focuses on developing the text rather than the writer's abilities, even if the 'text' related to such an item may be just one sentence. DIALANG also delivers feedback that can be considered indirect. Examples of this include advice on how to make progress towards the next CEFR level and information regarding a match vs mismatch between self-assessment and test result, together with the accompanying discussion of possible reasons why they might not match. That last type of feedback clearly involves metalinguistic explanations which focus on the nature and development of language proficiency. The indirect feedback encourages the use of different kinds of cognitive and metacognitive strategies and can, thus, be seen as aiming at developing writers' skills rather than their texts.
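As a rough illustration of the contrast between these two feedback types, the sketch below pairs direct item-level feedback with an indirect message about a mismatch between self-assessment and test result. This is a simplified mock-up under our own assumptions; it does not reproduce DIALANG's actual code, wording, or reporting rules.

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def item_feedback(response, accepted_answers):
    """Direct feedback on one indirect writing item: whether the response is
    correct, plus the accepted answer(s)."""
    correct = response.strip().lower() in {a.lower() for a in accepted_answers}
    return {"correct": correct, "accepted_answers": accepted_answers}

def mismatch_feedback(self_assessed, test_result):
    """Indirect feedback comparing a self-assessed CEFR level with the test
    result, intended as a prompt for reflection rather than a verdict."""
    diff = CEFR_LEVELS.index(test_result) - CEFR_LEVELS.index(self_assessed)
    if diff == 0:
        return "Your self-assessment matches the test result."
    if diff > 0:
        return ("The test result is higher than your self-assessment; "
                "you may be underestimating your writing.")
    return ("The test result is lower than your self-assessment; consider "
            "possible reasons, such as unfamiliar task types or topics.")

print(item_feedback("an", ["a", "an"]))   # direct, text-focused feedback
print(mismatch_feedback("B2", "B1"))      # indirect, learner-focused feedback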


The requested response from learners is quite limited in DIALANG, but some response may take place (see Ellis, 2008), even if it is likely to vary from one individual to the next. Since learners do not write complete texts in the writing test, a revision of the text is not applicable in this case. However, learners can study the results item by item, see what they responded and what the accepted answer(s) were, and, possibly, think about why they replied the way they did and why their answer might not be acceptable. In summary, DIALANG provides learners with a range of feedback immediately after each test. The feedback addresses all three key questions in Hattie and Timperley's model (where now, where to, how) and all the feedback levels except the self-level, which it addresses at most indirectly, as we discussed in the "Conceptualizations of feedback" section of this chapter. However, the system is limited in that it is somewhat complicated to make permanent records of the feedback and, as with other automated feedback systems, interpreting the feedback often requires help from a teacher.

Roxify Online

Like DIALANG, Roxify (see Chapter 4) is designed to give its users computerized diagnostic feedback. Differently from DIALANG, however, Roxify assesses learners' writing directly through the use of algorithms. Before we discuss the feedback in Roxify with reference to the characteristics of diagnostic feedback that we summarized earlier in this chapter, we will briefly discuss the construct that Roxify assesses and how it provides the feedback. The diagnosis in Roxify focuses on L2 English vocabulary in academic essays. The system gives feedback on the following aspects:

• taboo words, such as clichés, idioms, and slang words;
• warnings, such as misspellings, inclusive pronouns, and value-laden words;
• general and academic vocabulary wordlists;
• vocabulary, such as duplicates, hedging, and statistical metalanguage;
• citations; and
• such figures as the number of words and sentences and a readability score.

Once a text is uploaded into the system, feedback in English is immediately displayed to learners. Teachers, too, can access their students' essays and the corresponding feedback in the system. Roxify feedback consists of several parts. The system colour-codes the problematic vocabulary in the essays. Explanations are then given, informing the learner and the teacher of what is found problematic and what could be done to remedy these problems. For most categories, links and/or practice exercises are supplied that explain further what


a particular category is or give examples. Finally, strategies are suggested as to how learners could make sure to address their problems in academic writing. For example, when too many discourse markers are used, the system tells the learner to "check that all of discourse markers are needed, check that all of them are used correctly, and check that the reader is not confused by the number of discourse markers". The system also informs the learner/teacher when no problems are detected for a particular category, but urges learners to check nevertheless. The feedback is, therefore, rather detailed and addresses the task, process, self-regulation, and self levels in Hattie and Timperley's (2007) framework. The feedback in Roxify also informs learners where they are in relation to the set goals, where they are to head next, and, for most categories, how they can get there. The main provider of feedback, as in DIALANG, is the system. One major difference, however, is that in Roxify the feedback is based on an automatic analysis of the text. This means that learners are given more agency than in DIALANG: they are reminded by the system that the algorithm can both mark false positives and skip problematic words and are, therefore, encouraged to check their texts themselves. Learners are also free to click or not to click links with further information on the parts of the assessed constructs. Teachers, too, are given agency in that they can further annotate texts using the built-in annotation function and specify vocabulary that learners are expected to use in their academic writing. In terms of delivery, Roxify feedback is rather direct, though it does not correct learners' mistakes but marks them. Even though the feedback clearly goes beyond the text at hand, informing learners about what quality academic writing should be like in terms of vocabulary, there is a somewhat greater focus than in DIALANG on improving the text. For example, learners are given an opportunity to upload several drafts of the same text and check whether they were able to successfully address the problems marked by the system. This facilitates monitoring learners' uptake and adds a process-oriented focus. This does not mean, however, that there is no focus on developing learners' writing ability, which is strongly embedded in the feedback, for example, "You should remember to vary your vocabulary in your essay… Check the thesaurus and make sure you are using synonyms". Roxify requests responses of a specific kind, namely removing or replacing certain words. It elicits further responsiveness from learners by encouraging them to follow the links to exercises covering specific parts of the construct. That said, Roxify does not elaborate on the sources of learners' weaknesses. Nor does it target a particular proficiency level, though judging by the complexity of the Roxify feedback messages and the fact that they are delivered in the target language, a rather high level of proficiency is expected from the learners who use it.
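To make the general mechanism concrete, the following sketch shows how rule-based vocabulary feedback of this broad kind could be assembled: tokens are matched against category word lists, and each category is paired with a strategy-style message and simple text statistics. The categories, word lists, and messages are invented for illustration and do not reproduce Roxify's actual algorithms or wording.

import re

# Invented example lists; a real system would use much larger, curated resources.
CATEGORY_WORDS = {
    "taboo": {"basically", "stuff", "thing"},
    "warning": {"obviously", "clearly", "we"},
    "hedging": {"may", "might", "perhaps", "appears"},
}

ADVICE = {
    "taboo": "Avoid informal or clichéd wording in academic essays.",
    "warning": "Check whether this wording is too sweeping or value-laden.",
    "hedging": "Hedging detected: check that your claims are appropriately cautious.",
}

def annotate(text):
    """Flag words by category and attach a strategy-style explanation,
    leaving the final judgement to the learner and the teacher."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    report = {}
    for category, words in CATEGORY_WORDS.items():
        hits = sorted({t for t in tokens if t in words})
        report[category] = {"hits": hits, "advice": ADVICE[category]}
    report["figures"] = {
        "words": len(tokens),
        "sentences": max(1, text.count(".") + text.count("?") + text.count("!")),
    }
    return report

print(annotate("Basically, we might argue that this stuff clearly improved."))

In a real system, such matching would of course be combined with wordlist lookups, a readability measure, and the colour-coding, links, and draft-by-draft comparison described above.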


Main implications for diagnosing SFL writing

In the present chapter, we have discussed what we consider to be the key characteristics of diagnostic feedback and suggested ways to ensure that such feedback is useful, which means that it promotes learner development. We have also shown how the key characteristics can be applied in analyzing the feedback provided by certain existing diagnostic instruments, some of which provide their users with automated feedback. We now summarize our view of the desirable characteristics of diagnostic feedback on writing, based on the literature reviewed in this chapter but also on our analysis of the three instruments in the "Evaluating feedback in diagnostic instruments" section of this chapter. In short, diagnostic feedback:

• should help learners understand where they are (what they can and cannot do), where they are heading next (their learning goals), and how they get there (action to take);
• should relate the identified learner problems to the task at hand (task level in Hattie and Timperley's framework) and propose strategies as to how learners can address the identified problems in following tasks (process and self-regulation levels);
• should not demotivate learners, so that they can better self-regulate their performance (self-level);
• should be based on a detailed and comprehensive definition of the writing construct;
• should take the immediate recipient into account; for example, teachers should be given hints as to how to deliver feedback to learners;
• should be detailed and qualitatively explanatory;
• should be provided immediately or as soon as possible;
• should take prior teaching and learning goals into account and link the explanations and suggested actions to these;
• should raise awareness of strengths and weaknesses both in the writing products and processes;
• should focus on improving writers' abilities, not just the quality of their texts;
• is enhanced if it can be individualized, viz. tailored to the particular learners, such as their proficiency level, abilities, beliefs, and development (e.g., their ZPDs);
• should apply direct and/or indirect feedback flexibly, as well as different forms of delivery, in ways that suit the particular context;
• can focus on strengths and/or weaknesses; when it centres on weaknesses, it should avoid being negative and threatening to learners' self-esteem, aiming instead at ensuring a positive affective reception;
• should aim to build a shared understanding of what underlies these weaknesses, thus explaining the origins of the problems;


• should suggest how the identified problems can be remedied, adjusting prior goals;
• should build on prior diagnosis, also taking into consideration how learners reacted to prior feedback;
• should link with and lead to concrete action by the teacher and/or the learner to address the diagnosed weaknesses.

The list should not be understood in such a way that the diagnostic feedback in a particular instrument or procedure should have all of the qualities listed above. After all, the feedback in none of the instruments that we discussed in this chapter has all of these qualities. Rather, we propose that the list can be used for three major purposes:

• validation of feedback in the process of creating novel diagnostic instruments;
• selection of existing diagnostic procedures for a particular context;
• guidance of teachers who are developing their feedback practices.

An example of the first purpose is the ongoing revision of the DIALANG system. Harding et al. (2018) provide a tentative outline of how DIALANG 2.0 might function. Based on Harding et al.'s (2015) conceptual work on defining the stages of an 'ideal diagnostic process', the revised diagnostic procedure is envisaged to start with the selection of the skill area (e.g., writing), followed by a two-stage self-assessment (or assessment by the teacher) that aims to identify one to three more specific areas of concern (e.g., textual organization or pronoun references). After this initial assessment, the learner works on a series of tasks targeting those areas, which leads to a diagnostic decision and feedback. It is in the design of the feedback that lists like the one above could be useful. With regard to the second purpose, such a list could be helpful for selecting particular procedures/instruments for the SFL writing classroom. Feedback in these instruments should not, of course, be the only criterion for selection; however, it is an essential part of the diagnostic cycle. Hence, evaluating how well the feedback in specific instruments fits and directs one's teaching and learning goals is an important part of the selection process. The third purpose of our list is to guide teachers who are developing their own feedback practices. The bullet points can serve as reflection and stimulation. Diagnostic feedback, playing a seminal role in the diagnostic cycle proposed in this book, also agrees with other cyclical models, such as the teacher-based assessment cycle proposed by Davison and Leung (2009), which we will discuss in more detail in Chapter 10.

10 CONCLUSIONS AND WAYS FORWARD

Introduction

The idea for this book was born as we recognized the need to expand the argument in The Diagnosis of Reading in a Second or Foreign Language (Alderson, Haapakangas et al., 2015) to SFL writing. Having that volume as the starting point for our thinking, we constructed the present book as a set of interrelated and intertwined chapters. Even though some specific approaches to diagnosing learners' SFL writing are discussed in the book, we do not aim at providing a detailed 'recipe' for operationalizing diagnostic assessment by proposing ready-made tasks or activities. Instead, we encourage the reader to think about these based on the themes covered here. Rather than focusing on specific contexts, such as the primary, secondary, or tertiary level of education, we build our discussion on all educational contexts to make it more generalizable. This final chapter briefly summarizes the preceding chapters and their main takeaways for diagnosing SFL writing. We then discuss three important themes that are referred to in numerous places across the chapters: (1) the role of learners' L1, SFL proficiency, and L1 writing skill; (2) the individualization of diagnosis; and (3) diagnosis in the classroom. These are followed by a consideration of the need to advance our understanding of how SFL writing ability develops and how best to diagnose that development. Finally, we describe some promising ways to improve diagnosis and areas where further research is needed. In Chapter 2, we discussed various theories of SFL writing development. The major takeaway for diagnosis from that predominantly conceptual review



was that SFL writing ability is individual, contextual, and multidimensional, and that development follows a unique trajectory for different learners, which makes diagnosis a challenging activity. Therefore, combining several theoretically informed approaches can bring better results for diagnosis. In Chapter 3, we discussed the cognitive basis of SFL writing ability and its development. A synthesis of various cognitive models of writing led us to conclude that SFL writing develops through the development of both SFL proficiency and writing ability in general. We also outlined the complexity of diagnosing weaknesses emerging from this understanding, as diagnosing SFL writing should also consider possible learning problems/disorders. Chapter 4 investigated more closely the contexts, agents, and constructs, as well as the intra- and interpersonal factors, involved in diagnosis. The chapter also illustrated and analyzed several concrete instruments and approaches to diagnosing SFL writing. An important takeaway from the chapter was that there is a need for teacher training that aims to improve teachers' diagnostic competence. In Chapter 5, we discussed the characteristics and design principles of tasks used to diagnose SFL writing. We concluded that the decision about the most appropriate task format depends on the purpose and context of the diagnosis. Both direct and indirect writing tasks can serve useful diagnostic purposes. Direct tasks can be useful for diagnosing learners' abilities to perform real-world writing tasks since they probably elicit all relevant interacting components of the complex composition process. However, indirect tasks may be appropriate for diagnosing underlying sub-components of writing, or learners' knowledge and awareness of relevant sub-components, thus measuring the potential for writing. Chapter 6 outlined the distinction between two types of writing processes: writing during one writing session and writing across several drafts of a text. We synthesized the discussion by proposing methods for diagnosing the writing processes and strategies of writers at different stages of the writing process. Takeaways from the chapter include the importance of learning to self-diagnose one's writing process and the value of focusing more on the planning and revision stages, as writers tend to pay less attention to them compared to text generation. Chapter 7 discussed the assessment and analysis of writing products, focusing on raters and rating scales. We acknowledged the rater effect, emphasizing the importance of rater training for SFL diagnosis and synthesizing what this training can include. Of particular importance is that raters interpret the smallest units of the scales (i.e., individual descriptors) in the same way, since they can often agree on the overall score but disagree on the diagnostically crucial details. We also elaborated on the differences between classroom-based and large-scale diagnosis, suggesting how rating instruments can be co-created with learners in the classroom.


Chapter 8 discussed the automated analysis of writing. Automated tools are particularly useful when teacher support is available and when revision is part of the writing process, so that learners' progress across different drafts and texts can be traced. Most such tools focus on the grammatical, lexical, and mechanical features of texts at the phrase, clause, and sentence levels, which is an obvious limitation in their use. Another major issue is the inaccurate identification of errors, which creates challenges with automated feedback. Finally, Chapter 9 focused on feedback in diagnosis. Based on the synthesized research, we proposed a list of characteristics for useful diagnostic feedback and three purposes the list can be used for, viz. validation of feedback in diagnostic instruments; selection of existing diagnostic procedures for a particular context; and teachers' guidance.

Themes bridging the chapters

While the different chapters of the book focused on specific aspects of diagnosing SFL writing, some of the themes spanned several chapters. Here, we focus on several such themes. We first discuss the effect that writers' linguistic background, language skills, and general experience in writing might have on diagnosing their SFL writing ability. We then analyze how more individualized diagnosis might be achieved, and finally, summarize the characteristics of diagnosis in the classroom.

Implications of learners' L1, SFL proficiency and prior writing experience for diagnosing SFL writing

Learner's and diagnoser's first language. The relationship between the SFL learner's first language and the second or foreign language may need to be considered when diagnosing the learner's SFL skills. This appears not to have been addressed often in the diagnostic assessment literature, but other fields of applied linguistics can provide us with useful insights. From second language acquisition research, we know that the typological distance between the learner's L1 – or other languages that the learner already knows well – and the language to be learned is an important factor that affects the pace of learning as well as what learners find easy vs difficult to learn (e.g., Ringbom, 2006). Some recent research on the linguistic basis of the Common European Framework of Reference has demonstrated that, for example, the syntactic patterns in learners' written texts in English as a foreign language can differ considerably as a function of their L1 (e.g., Banerjee et al., 2007; Khushik & Huhta, 2020). Contrastive linguistics and contrastive analysis, which started already in the 1960s, are other fields that might provide useful diagnostic insights into the relationship between learners' L1 and SFL (e.g., James, 1980).


The degree of (dis)similarity between the learner's L1 and the SFL affects the type of problems learners are likely to encounter in their SFL writing. In some cases, the writing systems in the L1 and the SFL differ, which means that even the basic mechanics of producing the graphical characters can be a challenge. In the case of relatively closely related languages such as German, Swedish, and English, learners face different problems, for example, the somewhat different meanings of words that look alike in the cognate languages. Knowing the typical difficulties that particular L1 speakers have when they write in a particular SFL would, thus, be useful for the diagnoser. This also means that the teacher-as-diagnoser's task is easier when the learners come from only one L1 background – or a very limited number of different L1 backgrounds – and when the teacher and the learners share the same L1. Learner's proficiency in the second or foreign language. Similar to taking writers' L1 into account, we also need to consider their SFL proficiency level when diagnosing SFL writing. With increasing proficiency, learners' skills and knowledge as well as their foci of attention change. This has to do with the fact that as processes become increasingly automated, cognitive capacity is freed for new areas of attention. For example, in the beginning, a learner's attention is focused on processing words, phrases, and simple sentences, so that there is no free capacity to focus on aspects such as text structure or audience. Later in a learner's SFL development, when linguistic processes have become more automated, attention can be paid to choosing the appropriate register or effectively structuring one's line of argument. This has implications both for the writing processes and for the written products, with distinct differences between writers at different levels of SFL proficiency. These differences need to be accounted for in writing instruction (with teachers focusing on different aspects that suit the learner's level), in the design of writing tasks (tasks reflecting suitable cognitive and linguistic demands), as well as in assessment scales (reflecting expectations geared towards the targeted SFL proficiency level) and in diagnosing SFL writing abilities. When using the written product as a basis for diagnosis, it is important to consider the characteristics of the tasks that elicited the product, as well as the setting in which the text was produced, for example, whether it was written as a test in one session or drafted and revised over time. The SFL proficiency together with the task characteristics and setting will then determine the focus of the diagnosis. If beginners are asked to fill in a form, the diagnosis will perhaps focus on appropriate lexical items and spelling. With advanced learners asked to write a comment for a specific audience and purpose, communicative effectiveness, text structure, and rhetorical means can be the focus of diagnosis. Thus, a clear conceptualization of task characteristics and of the expected text features aligned to different proficiency levels will facilitate a meaningful diagnosis of written products. Here, existing proficiency scales such as the CEFR can be of help,


since they describe learners' proficiency at ascending levels with reference to the kinds of texts they can write, the different audiences they can address, or the genres they can master. In addition, the CEFR contains scales for the linguistic competences that can be expected at different SFL levels. Similarly, when it comes to diagnosing writing processes, differences in SFL proficiency along with task characteristics will, again, determine the focus. For example, beginning SFL writers, whose attention is most likely focused on processing the vocabulary needed to fulfil a given task, might not have much capacity available for extensive planning. A schema-based planning strategy – perhaps based on prior instruction or derived from their L1 writing experiences – is probably the first strategy available to learners at the earliest stages of SFL writing. Hence, when diagnosing planning strategies, the diagnoser will most likely not be able to suggest other, for example, knowledge-based, planning strategies, as these require a meaningful vocabulary of a certain size, which learners will only have acquired at a later point in their SFL development. Likewise, more cognitively demanding strategies, such as constructive planning, can only be expected at higher SFL proficiency levels. Successful constructive planning, for instance, requires that the writer is both aware of the needs and characteristics of the target audience and the rhetorical conventions of the genre, and able to construct a text that meets those requirements. This presupposes that learners have the necessary linguistic means at their disposal, which cannot be expected from beginning SFL learners. Hence, the diagnoser should have a clear understanding of the processes that can be expected at different levels of SFL proficiency. So far, no comprehensive taxonomy of writing processes at different proficiency levels has been developed, and therefore, the above discussion is rather speculative. What is known, however (see Chapter 6), is that less proficient writers spend more time on generating text and solving linguistic problems, while more proficient writers spend more time on global planning and higher-level textual revision. Yet, until more research aligning processes to proficiency levels is conducted, well-informed diagnosis would benefit from a careful analysis of the processes that a given writing task was intended to elicit, and of how well these processes were carried out during the writing assignment. For appropriate diagnosis, it is also important to consider the setting in which the writing task was completed, as was pointed out earlier. The broad diagnostic implications of learners' SFL proficiency discussed above are particularly relevant for writing done under time pressure, which is the case with many classroom texts. The diagnosis of writing without time pressure and with the opportunity to revise one's text is likely to be more complex, because in such settings even beginners can attend to the planning of such global aspects of writing as overall text structure. This is more likely to happen when the SFL learners are experienced writers in their L1. Thus, diagnostic assessment needs


to take both the learner's SFL proficiency and the setting of the writing task into account when providing individualized instruction and diagnosis. Finally, the SFL proficiency level may also affect how feedback can be taken up (see Chapter 9). It appears that learners at all levels can benefit from direct corrective feedback, but only the more proficient learners may be able to utilize more indirect and explanatory feedback independently. However, less proficient learners can also benefit from such feedback if it is discussed with them in an accessible way. Given the somewhat inconclusive research landscape, diagnostic feedback may be most effective if it is tailored for and negotiated with the targeted learner group in the specific setting. Furthermore, the feedback should be discussed with the learners to ensure they can understand it, take it up, and act on it accordingly. Prior writing experience. Many SFL learners have considerable experience in writing in their first and possibly other languages. This is often the case with adult SFL writers; experience or expertise in writing is usually a matter of age and education. Learners' previous level of writing expertise may be transferable to their SFL writing, whereby their SFL proficiency level might determine how far such a transfer is possible. Such transfer, however, may not happen without explicitly drawing writers' attention to their existing expertise. With regard to writing processes, research on the effect of general writing expertise has discovered differences in novice vs expert writers' planning and revision. Planning and revision can take place at two levels: the plan itself or the text. Flower et al. (1989, p. 39) found that less experienced writers focused on evaluating their texts to see if the current text conflicted with their plans, and tried to solve the problems simply by rewriting and by producing new text. Although expert writers did this, too, they often tried to address problems at the abstract plan level, that is, before generating or revising their texts. The latter strategy is potentially more effective, as it saves the time that would go into producing text without first having an idea of whether it would work. Regarding revision, research shows that writers focus on modifying their texts at the surface level particularly if they are younger and/or inexperienced writers or have low SFL proficiency (Fitzgerald, 1987). It should be noted, however, that personal preferences and styles also play a role in decisions to either plan or just write more: some writers simply prefer to plan while they are generating text (Deane et al., 2008). Therefore, the diagnostic implications as far as planning and revision are concerned are not straightforward. As a first step, it might be useful to diagnose writers' preferred planning and revision styles in the SFL with the aim of raising writers' awareness of their preferences. Next, writers could be asked to compare their preferences in the SFL with those in their L1. As a follow-up, writers could develop alternatives to those SFL styles that are not effective, transferring


successful strategies from previous writing experiences to their SFL writing. Process writing (see Chapter 6) could be useful for this activity. Studies comparing how much time novice and expert writers spend on planning, formulating, and revising their texts are also useful for the diagnosis of SFL writing. Experts spend considerable amounts of time planning at all stages of the writing process (see Chapter 6). The amount of time spent on various activities is not directly observable in a finished text; therefore, information about it has to be gathered through self-assessment, observation, or portfolios, for instance. Diagnosis can focus on the amount of planning and revision and on the timing of planning – before writing, during writing, or during revision. Feedback should encourage writers to reconsider how much time they spend on planning and evaluating their text, and at what points during the writing process. There is also some research on how writing experience affects the utilization of feedback. As we reported in Chapter 9, it appears that more mature writers may find metacognitive comments targeted at self-regulation more beneficial, whereas novice writers benefit more from explicit direct feedback.

Individualization of the diagnosis of SFL writing

Studies on diagnostic assessment and feedback suggest that much of the feedback from existing diagnostic assessments is too general and that personal, individualized feedback would be more useful. The individualization of the diagnosis of SFL writing was a recurrent theme across several chapters of this book, as well as in the preceding section: considering the writer's language background, level of SFL proficiency, and previous writing experience or style are important steps toward tailoring diagnosis, feedback, and recommended actions to the needs of the individual learner. However, certain other personal characteristics could take the individualization of diagnosis even further. At a general level, the diagnosis of SFL writing needs to consider the existence of writer types, who differ in which strategies they apply during the writing process (see, e.g., Arnold et al., 2012; Knorr, 2016). One factor along which it is possible to distinguish writer types is the amount and timing of the planning, generating, and revising of texts. Some writers prefer a top-down approach to their writing and want to plan in advance, while others want to start writing right away and, thus, opt for a bottom-up approach to producing texts (see Cumming, 1989; Deane et al., 2008). Other writers plan while they generate text (Manchón & Roca de Larios, 2007). Diagnosis could aim to increase writers' awareness of their preferred styles, and tailor feedback accordingly.


Individuals also differ in how they process feedback. Liao (2016; see Chapter 9) identified four learner types among Chinese college-level students in their uptake of feedback on writing: goal setters, accuracy pursuers, reluctant learners, and late bloomers. The preferred timing and amount of feedback can also vary between learners. Huhta (2010) found that some DIALANG users wanted to know the correctness of their responses immediately, while others found that disturbing and preferred to review their responses only after completing the entire test. While DIALANG uses indirect writing tasks based on multiple-choice and short-answer formats, the finding nevertheless suggests that writers differ in how immediate they want their feedback to be. Diagnosers need to become aware of their learners' preferences, so that the delivery and format of the feedback can be tailored accordingly. This is particularly relevant for classroom diagnosis (see the "Diagnosis of SFL writing in the classroom" section in this chapter). Furthermore, individuals' beliefs about assessment, feedback, and the role of the teacher, peers, and themselves in the learning process vary, which can even prevent them from utilizing feedback effectively (Doe, 2015; Hyland & Hyland, 2006; Kunnan, 1995; see Chapter 9). Hence, it may be necessary for the teacher to explain how diagnostic feedback could be used to support learning. In sum, many factors need consideration when individualizing the diagnosis of SFL writing. One approach to personalization is offered by the automated analysis of written texts. One future development of automated writing evaluation (AWE) systems concerns tailoring the analysis and feedback to the needs of the learner/writer and the teacher (see Chapter 8). More flexible and configurable AWE systems would allow feedback to be rationed into manageable portions, so that learners can better digest and learn from it. Another aspect of such individualization would be to give the learner an option to turn specific types of analyses and feedback on or off for a given writing task, as sketched below. Such flexibility would allow learners to choose the focus and level of granularity of the feedback. It would also allow the teacher to align the constructs to be assessed by the AWE tool more closely with the curriculum. Furthermore, AWE tools could monitor how learners respond to feedback and whether they attend to it, which in turn can be used as part of the next stage of diagnosis and feedback for particular learners.
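A configurable AWE set-up of the kind envisaged here might look something like the following sketch, in which the learner or teacher switches analyses on or off for a given task and caps how many feedback items are shown at once. All names and settings are hypothetical and are not drawn from any existing AWE tool.

from dataclasses import dataclass, field

@dataclass
class FeedbackConfig:
    """Hypothetical per-task feedback settings: which analyses are switched on
    and how much feedback is shown in one portion."""
    enabled: set = field(default_factory=lambda: {"spelling", "vocabulary"})
    max_items: int = 5  # ration feedback into a manageable portion

def filter_feedback(all_feedback, config):
    """Keep only the feedback types that are switched on, truncated to the
    configured portion size."""
    kept = [item for item in all_feedback if item["type"] in config.enabled]
    return kept[:config.max_items]

# Invented output of an automated analysis of one draft.
all_feedback = [
    {"type": "spelling", "msg": "Possible misspelling: 'recieve'."},
    {"type": "cohesion", "msg": "Consider linking these two paragraphs."},
    {"type": "vocabulary", "msg": "'get' is repeated; try a more precise verb."},
]

config = FeedbackConfig(enabled={"vocabulary"}, max_items=3)
print(filter_feedback(all_feedback, config))

Logging which of the displayed items a learner subsequently acts on could then feed into the next round of diagnosis and feedback, as suggested above.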

Diagnosis of SFL writing in the classroom

In what follows, we revisit the diagnostic cycle introduced in Chapter 1 (Figure 1.1) and outline how this cycle can inform classroom-based diagnostic assessment. Figure 10.1 depicts the diagnostic cycle with a view to informing classroom diagnosis.


FIGURE 10.1 Diagnostic Cycle in the SFL Classroom.

The classroom context plays a prominent role in the book, for example, in Chapter 4 (key characteristics of diagnosis), Chapter 5 (diagnostic tasks in classroom contexts), Chapter 7 (rating in classroom contexts), and Chapter 9 (feedback). Not wishing to repeat what we have discussed elsewhere in the book, we bring together relevant aspects here, outlining typical questions, practicalities, and constraints that have to be considered in classroom-based diagnosis.

Stage 1: Planning

First, teachers will perceive a need for diagnosis, be it because they are taking over a new class, or be it at the beginning of a new year or course (see also Harding, Alderson, & Brunfaut, 2015). Here, the aim would be to find out where the new learner group is, what challenges they face, and how teaching can be adjusted accordingly. Another typical situation could be that teachers observe some problems and want to know the underlying reasons. It could also be that learners express challenges and issues and want to know where exactly the root of the problem lies. Based on the perceived need, teachers (where feasible in cooperation with their learners) can determine the aims of the diagnosis and address basic questions such as:

• What results and insights do we expect?
• What aspects do we want to focus on (e.g., processes or products, development, discrete elements, one text or text drafts over time)?
• Who will be the diagnoser?


The answers to these questions will depend on the learner group and the setting. Hence, teachers need to consider the learner variables we discussed previously. Learners' background will influence decisions on what can be expected and how it can be diagnosed (see Chapter 4). Equally important are the teaching context and institutional constraints, such as available time, resources, staff, or access to existing instruments. These considerations will influence the answers to the following questions that a teacher might have:

• What aspects of the writing construct are most relevant for my learners' level of L2 development and prior writing experience?
• Which approaches are practicable, given the constraints I face (see Chapter 4)? For example: Will it have to be a one-off snapshot or can I afford longitudinal assessment? Will I have to focus on one product or do I have the means to analyze several drafts? Do I have resources for procedural diagnosis, such as conducting think-aloud or eye-tracking approaches?
• Do I have access to existing instruments, or do I have the resources to develop my own?
• Do I have co-operators for the diagnosis; for example, are my learners prepared for self- or peer-diagnosis? Do I have colleagues with whom I could cooperate?

Next, it is advisable to determine the exact nature of the construct(s) to be diagnosed, namely the aspects of writing that should be the focus, and to cross-check them with curricular goals, teaching aims, and learning outcomes, so that the diagnosis will be aligned with the contextual goals. This is one opportunity to include learners in the planning phase by asking them about the goals that they want to achieve. These considerations will feed into answering the following questions:

• What are the most relevant aspects/constructs that need to be included (e.g., vocabulary, structuring, genres, revision skills, etc.)?
• What approaches are most feasible (e.g., indirect items, direct tasks, process writing approaches)?

The teacher can plan on their own, but it may make more sense to seek co-operators by planning the diagnosis together with the learners, by cooperating with other teachers or other stakeholders at the school/institution, or by cooperating with researchers (see the end of this section). In some contexts, it may even make sense to plan the diagnosis with parents. The outcomes of this initial stage should inform decisions about:

• aims, outcomes, and the nature of feedback that one hopes to gain;
• agents and diagnosers – teachers, learners, external agents;


• focus – constructs, products, processes, development;
• nature of diagnosis – one-off, longitudinal, several tasks, several drafts, etc.;
• practicalities, available resources, and constraints.

Stage 2: Operationalization: select or develop instruments

Based on the considerations of Stage 1, teachers need to justify their diagnostic approaches and instruments. In Chapter 4, we illustrated some existing approaches and instruments, so that teachers get an initial idea of what is already available and may be adapted for their contexts and purposes. If there is a specific need, and resources allow, teachers may decide to develop their own diagnostic items, tasks, or rating scales; for this, Chapter 5 and Chapter 7 can be helpful. Chapter 6 addressed approaches to diagnosing processes, should that be the focus of diagnosis. In any case, teachers need to be able to produce a coherent rationale either for using existing instruments or for developing their own approaches, thus answering the following questions:

• Do I need discrete indirect items and/or direct writing tasks (see Chapter 5)?
• Do I need one or several diagnostic tasks (e.g., ascending in complexity and demands)?
• Do I need to collect data on relevant processes? What kind of data are these and how can I collect them?
• Is there a suitable approach/instrument that I can use or adapt for my purposes (considering the costs, accessibility, and scoring or rating)? Is the feedback that comes with these approaches feasible for my context?
• If no approach/instrument exists, do I have the resources and skills to develop one? Where could I get support? How can I handle scoring and feedback generation?
• In which ways can I include my learners in the actual assessment?

At the end of this stage, teachers should have the necessary materials prepared and the practicalities arranged to carry out the assessment and to do the marking, including rater training, or self-training, where necessary (see Chapter 7). It is of utmost importance to consider the information gained from the assessment and how it can be fed back to the learners in meaningful ways.


Stage 3: Assessing and analyzing

In this stage, the actual diagnostic assessment(s) take place. This diagnosis could be led by the teacher, the students, or external diagnosers, including the possible use of automated diagnostic systems. Once the data have been collected, they have to be systematized and prepared for marking, rating, or analysis. Once the data have been processed, the results need to be prepared in such a way that meaningful feedback can be presented to both the teacher and the learners. In this phase, the following questions need to be addressed:

• How do I best collect the data and enter them for analysis (e.g., papers that need scanning, raters' judgments, automatically scored digital texts, answer sheets that can be scanned or hand-marked, video recordings)?
• How will I process the data (e.g., enter answers/ratings in spreadsheets, transcribe and qualitatively analyze video data)?
• Will I seek collaboration with other teachers, stakeholders, or researchers?
• In which form do I prepare the feedback (e.g., score results from automatic analyses, ratings accompanied by qualitative descriptors that give students insights into their strengths and weaknesses, evaluation of the effectiveness of processes)?
• What additional information do I need to consider and provide so that the feedback will target the intended purposes and be meaningful for all agents (e.g., feed forward, follow-up tasks)?

Stage 4: Feedback

Chapter 9 provided a detailed discussion of feedback, so here we will remind the reader of its key features only briefly. To summarize, diagnostic feedback aims to stimulate learning, insights, and uptake by the learners, as well as modifications of teaching. Hence, teachers should address the following questions when preparing and delivering feedback:

• Is the feedback accessible for the learners (e.g., considering their cognitive and SFL development)?
• Are the main strengths and weaknesses clearly presented?
• Who will be the agents to deliver and explain the feedback (e.g., teacher, peers, or self)?
• How will the feedback be delivered (e.g., automated, direct, indirect, written, mediated)?
• What aspects should be foregrounded and what focus should the feedback take?
• Do the delivery and focus meet the intended aims of the diagnosis?


• Is the setting prepared (e.g., timing, space) so that feedback can be given and received in a learning-conducive atmosphere?
• Does feedback lead to transparent and clear requested responses and actions?
• Does feedback convey a clear message regarding what learners should work on?

Stage 5: Actions

Diagnostic feedback is only effective if it leads to meaningful actions. Teachers will want to modify their teaching, input, focus, and delivery of those aspects that turned out to be challenging. They may want to follow up by mediating relevant aspects and by using remedial exercises, tasks, and projects. These remedial actions should, again, be cross-checked with (external) goals and curricula (see Stage 1). Learners should be directed towards incorporating the feedback into self-assessment, self-directed learning, and the planning of their next steps. This can take place in close cooperation with peers and the teacher. Developing an action plan based on the feedback can be useful, so that the uptake of the diagnosis can be monitored by the teacher and/or by the learners themselves. The following questions can guide this phase:

• Do learners have the opportunity to negotiate the meaning of the feedback with their peers and/or teacher?
• Are the remedial action points upon which learners and the teacher can act clear?
• Are learners enabled to draw up a clear action plan? How is the action plan monitored?
• Is there a clear action plan on how to address the weaknesses in teaching and/or mediation? How is the teacher's action plan monitored?

Stage 6: Evaluating achievements of goals

In this stage, teachers, ideally together with learners, may want to monitor how the action plans are put into practice. Monitoring could take place by, for example, revisiting the goals in the plans, or by checking the accomplished remedial tasks. At the end of the diagnostic cycle, teachers want to find out whether the remedial teaching modifications and the learners' action plans addressed the diagnosed issues. This stage can also focus on evaluating whether the diagnosis achieved its overall aims (see Stage 1 above). The following questions can guide this stage:

• Did the students follow up on the actions set out in their plans? Did these actions lead to the expected insights and learning gains?

Conclusions and ways forward  277

• Did the teaching address the diagnosed issues? Did the modifications lead to the expected insights and learning gains?
• Did the diagnosis meet the aims set out in Stage 1? If not, why not? What needs modifying to achieve the aims?

To find out whether students can now successfully handle the diagnosed challenges, teachers and students may have to enter the cycle anew at Stage 1 and diagnose whether the actions actually worked and brought the learners closer to the curricular goals, teaching aims, or learning objectives. This is where one cycle "ends" and a new diagnostic cycle could start.

Our discussion above of how the diagnostic cycle can inform classroom-based diagnosis aligns well with recent work on formative assessment and assessment for learning (e.g., Black & Wiliam, 2018; Wiliam & Leahy, 2015), with frameworks for classroom-based assessment that help teachers reflect on and plan their classroom-based assessment practices (Hill, 2017; Hill & McNamara, 2012), as well as with models for classroom-based assessment (e.g., Davison & Leung, 2009). Davison and Leung's (2009) model, for example, bears many similarities to the diagnostic cycle model discussed in this book. To start with, it conceptualizes classroom-based assessment as a cycle of planning assessment, conducting it, interpreting results, and giving feedback to learners. Furthermore, it perceives teaching, learning, and assessment in the classroom as strongly interrelated. However, the diagnostic cycle model we discuss in the present book relates more specifically to classroom diagnosis.

Collaboration between teachers and researchers

Diagnosis requires adequate diagnostic competences from teachers; this can be regarded as one part of language assessment literacy (LAL). LAL encompasses knowledge, skills, principles and practices in language assessment, including the use and interpretation of assessment results (e.g., Baker & Riches, 2018; Davies, 2008; Fulcher, 2012a; Levi & Inbar-Lourie, 2020; Taylor, 2013). The systematic development of assessment competences is unfortunately often neglected in teacher education (e.g., Hasselgreen et al., 2004; Vogt & Tsagari, 2014; Xu & Brown, 2016). While there is not yet enough insight into exactly how assessment literacy, including diagnostic competences, can best be developed (Harding & Kremmel, 2016), there are indications that collaborative professional development (CPD) between teachers and trainers, and between teachers and researchers, can be effective in teacher education (e.g., Cordingley et al., 2005; Johnston, 2009; Kennedy, 2011; Westheimer, 2008). There are a number of promising reports on collaborative projects from LAL development contexts, for example, by
Baker and Riches (2018), Brunfaut and Harding (2018), Harding and Brunfaut (2020), Kremmel et al. (2018, along with other studies in Xerri & Vella, 2018), and Poehner and Inbar-Lourie (2020).

Conducting systematic diagnosis involves collecting, analyzing, and interpreting data, steps that contribute to teachers' LAL and to their research skills. Such skills require consideration in teacher education and ongoing professional development. Developing research skills for diagnosis could and should also become part of teacher education and CPD, just like the development of LAL. Once such research skills have been developed, they will empower teachers as researchers exploring their practices, as Lantolf and Poehner (2014, p. 7) describe, linking their understanding of practice back to Vygotsky: "For Vygotsky, educational practice is a form of scientific research. It is the laboratory where the principles of the theory are to be tested." This intertwining of theory and practice and the centrality of teachers doing research are also emphasized by Lantolf and Poehner (2011, p. 16): "Praxis does not position teachers merely as consumers of research – and here we might add consumers of test scores and other outcomes from formal assessment procedures – but recognizes their expertise as central to the iterative development of theory and practice."

Often, however, teachers' workload is such that there is not enough time to do research. Here, collaborative research projects between teachers and researchers can be conducted, either between interested individuals or as part of wider school–university partnerships. Interesting reports include those by Brunfaut and Harding (2018) on a collaborative project between a UK university and teachers in Luxembourg; by Chan and Davison (2020) on a collaboration conducting action-based research in Hong Kong; and by Hill and Ducasse (2020) on a collaboration in Australia between one teacher and one researcher on written feedback practice. While there may be challenges to such partnerships, such as (implicit) hierarchies, these may be found in any collaborative teaching or research project. Taking a respectful, equality-based approach and considering the other partners as experts in their fields may be all that is needed to overcome such challenges. The reports on collaborative partnerships between teachers and researchers (see above) promise rewarding outcomes not only for teachers but also for researchers who may want to put certain diagnostic approaches to the test.

Advancing our understanding of SFL writing development and how it can be diagnosed

Alderson (2005) argued that for developing diagnostic assessment much more needs to be learned about how second or foreign language development happens.

Fifteen years later, this argument still holds. In Chapter 2, we discussed how the two broad views of language development, cognitive and socially-oriented, inform our understanding of development in SFL writing. However, we feel that SFL writing development still needs further problematizing. We start by discussing two related issues: the tension between depicting individual development with tools such as proficiency or assessment scales, and the inherent difficulty of describing development with the help of a momentary snapshot within a developmental trajectory.

Granularity of diagnosing

Most scales describe typical and generalized features based on groups; yet, diagnosis is individual in nature. Scales such as the CEFR are based on making generalizations at the group level by describing commonalities and typicalities for certain proficiency levels in an abstract way. Even though an individual learner can be placed on a specific level of a scale (e.g., on CEFR level B1) due to a combination of features in that learner's performance, the description of writing ability for that level may only partially match the particular individual's ability. As Popham (2009) notes, "making such vertical scales work properly means abandoning any notions of accurate per student diagnosis" (p. 91; emphasis in the original).

A possible solution allowing for a more fine-grained diagnosis would be to use more specific diagnostic scales or checklists with a number of specific categories that depict those dimensions of writing that are relevant for particular learners at a particular time in their development of SFL writing ability. Automated tools that can analyze even minute details of the text and the writing process may also be helpful for focusing on learner-relevant aspects; a minimal illustration of such a checklist-style analysis follows below. To develop appropriate diagnostic scales and checklists that support individualized diagnosis, however, more research is needed on how different aspects of SFL writing develop, for example, which features typically develop together, or whether there are common strengths and weaknesses in certain areas for specific learner groups.
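By way of illustration, the sketch below shows what a very simple automated, checklist-style analysis could look like: it merely checks a learner text for the presence or absence of a handful of surface features chosen as diagnostic categories. The categories, patterns, and example text are hypothetical, and an operational tool would rely on far more sophisticated linguistic analysis (see Chapter 8).

```python
import re

# A hypothetical diagnostic checklist: each category is paired with a crude
# surface pattern that signals its presence in a learner's text.
CHECKLIST = {
    "uses linking words": r"\b(however|therefore|in addition|moreover)\b",
    "uses subordination": r"\b(because|although|when|while|which)\b",
    "uses past tense -ed forms": r"\b\w+ed\b",
    "uses paragraphing": r"\n\s*\n",
}

def apply_checklist(text: str) -> dict:
    """Return, for each checklist category, whether the text shows evidence of it."""
    return {category: bool(re.search(pattern, text, flags=re.IGNORECASE))
            for category, pattern in CHECKLIST.items()}

learner_text = (
    "Last summer I visited my grandmother because she lived alone.\n\n"
    "However, this year we stayed home. We walked a lot and talked."
)

for category, present in apply_checklist(learner_text).items():
    print(f"{category}: {'present' if present else 'not found'}")
```

Reported alongside a rating, such presence/absence information can direct the learner's and the teacher's attention to specific dimensions of writing, without claiming the precision of a full linguistic analysis.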

A moment in time vs longitudinal diagnosis

This brings us to the second challenge in diagnosis: how to interpret information about a learner's writing ability at a particular moment, obtained via ratings, checklists, or other means, with regard to the learner's development over time. The challenge of studying development through a cross-sectional lens has been raised more generally in SLA research (e.g., Ortega & Byrnes, 2009; Ortega & Iberri-Shea, 2005; Ployhart & Vandenberg, 2010). Since development is a longitudinal phenomenon, diagnosis based on assessing learners cross-sectionally faces similar challenges.
To trace development, we need longitudinal data. Proficiency scales such as the CEFR may have potential for tracking development if learners are assessed repeatedly over a sufficiently long period of time. Yet, they have their limitations as diagnostic tools. This also holds true for tracing attrition of abilities (see Figure 2.3 in Chapter 2).

The study by Kektsidou and Tsagari (2019) illustrates the limitations of general (CEFR) scales in monitoring progress in foreign language development. The authors investigated university students who took DIALANG tests of English three times over three semesters. Kektsidou and Tsagari found that the average progress of the 90 students who participated in all three test administrations was hardly noticeable: the mean increase was only about one-fifth of a CEFR level. More than half of the students remained on the same CEFR level from the first to the third administration, one-third improved by one level, and about 15% went down by one level. The implications of the study are ambiguous. On the one hand, it shows that when the period is long enough (about 20 months in this case; Kektsidou, personal communication), even indirect writing tests that report on the broad CEFR scale, such as those in DIALANG, may detect learner progress. On the other hand, the particular diagnostic approach may not be sensitive enough to capture many learners' progress. The authors attributed this latter finding to the discrepancy between what DIALANG measures (general English) and what the students studied (academic English for economics). However, the CEFR general writing scale used and the indirect approach to assessing writing in DIALANG (see Chapter 4) probably contributed to the difficulty of capturing whatever progress the learners had made.

Broad, general writing scales are, thus, limited in their diagnostic value for tracking SFL development over time. More detailed scales focusing on specific dimensions of writing and reporting on bands that are sufficiently narrow are likely to be more sensitive to changes in learners' abilities (see Chapter 7). Interestingly, however, longitudinal studies of SFL writing using (rating) scales appear to be rare. Therefore, the value of analytic rating scales for diagnosing SFL development still needs to be investigated.

Longitudinal use of appropriately specific, rater-oriented scales, checklists, and other tools to track development can, thus, contribute to diagnosing SFL writing. Still, these are but a series of snapshots of learners' performance. Such a series might suggest that development happens in a linear fashion, level by level (see also Chapter 2). However, two individual developmental trajectories between two snapshots of learner performance can be vastly different even if both individuals moved, say, from level A2 to B1 on the CEFR. To illustrate, in a study of Chinese learners of English, Larsen-Freeman (2006) found that the disaggregated data on learners' development produced a picture vastly different from stage-like development. Individual learners moved along different paths in fluency, accuracy, vocabulary, and grammatical complexity across
four measurement points over six months, exhibiting both progressions and regressions in all of the studied dimensions. This non-linearity and individuality of development is something that the diagnosis of SFL abilities should take into consideration.

The findings about variation in individual development and the limitations of general descriptive systems such as the CEFR scales imply that tracking progress needs to be frequent enough and that it should be carried out with approaches that are sensitive to the abilities of interest. Frequency and sensitivity are related: less sensitive instruments such as the more general CEFR scales can be used meaningfully only at quite long intervals, whereas more sensitive approaches, for example, a rating scale or a detailed observation scheme, may be administered more frequently.

Finally, it is worth pointing out that most teachers are in an advantageous position as regards opportunities to monitor learners' progress longitudinally, collecting varied information about learner progress by virtue of engaging with learners more often. Teachers can also gather more, and more varied, information about learners than most (external) researchers can. Some such contextual information may help teachers to form plausible hypotheses about the reasons for particular learners' struggles and about ways to address them, which is difficult to do when analyzing only the information that scales and checklists can provide.

Nature of diagnostic measures: Direct vs indirect

Another point to consider in diagnosing SFL development longitudinally is the nature of the diagnostic measures, including their sensitivity and directness. We have already discussed different scales for monitoring learners' progress in writing, and hypothesized that detailed analytic rating scales are more sensitive and can detect changes in SFL writing ability more easily than the broader general proficiency scales. The same argument can be made for checklists or automated analyses of the presence or absence of specific features in learners' writing, since they are similar to analytic scales in that they focus on specific aspects of learners' texts. A common feature of scales, checklists, and automated tools is that they involve direct assessment of learners' writing: they are typically applied to real texts written by learners. Therefore, such features of diagnostic writing tasks as their genre, complexity, and communicative function need to be considered when inferences are drawn from learners' performances.

Indirect writing tasks, too, can have a role in diagnosing SFL writing ability, as we argued in Chapter 5. However, there are at least two issues in using indirect measures to track changes in SFL writing ability, besides the obvious gap that might exist between learners' performance on such tasks and their ability to actually write. The challenges are to ensure (1) that the indirect measure is
sensitive to learning and (2) that any improvement in test scores is caused by learning rather than by repeated exposure to the items. Research on tests of science (Naumann et al., 2019) and of mathematics and English language arts (Polikoff, 2016) demonstrates that items in typical achievement tests vary considerably in their sensitivity to instruction and, hence, to learning. Multiple-choice and other item types vary in the level of detail they focus on and in the complexity of the phenomena they target, depending on how they are designed. This is one of the likely reasons why the effects of instruction, and the resulting learning, vary from study to study, and why some changes in learning are detectable quite rapidly and others only after a considerable period of time.

Ways forward in diagnosing SFL writing

Before we conclude the present book, we would like to suggest some potentially useful directions for advancing SFL diagnosis. We base these both on themes that emerged prominently in this book and on themes that were discussed less prominently but could be useful to explore in future research and practice in SFL diagnosis.

More longitudinal research. As we argued earlier in this chapter, there are very few longitudinal studies of SFL writing development (see Schoonen et al., 2011, for a rare example). There are, however, proposals for how SFL development can be studied in a more comprehensive, exploratory manner. Barkaoui (2014) suggested that researchers could make use of more complex statistical data analyses, such as multilevel modelling and especially latent growth curve modelling, which allow for exploring the interrelatedness of changes in several variables; a minimal sketch of such a growth model follows after the next paragraph. For a more in-depth discussion of the challenges in such longitudinal diagnosis, see, for example, Barkaoui (2017) or Naumann et al. (2019). Such approaches, but also more qualitatively oriented longitudinal inquiries, can help to model and establish individual and group-level developmental trajectories.

More research on the writing process. The vast majority of SFL writing research, as well as practical assessment, has focused on writing products, but, as Chapter 6 illustrates, studies exist that shed light on what happens in the different stages of planning, text generation, and revision, and on how learners might differ in this. However, more research on the relationship between writing processes and products would be useful for SFL writing diagnosis. There have been some interesting studies, for example, Deane (2014), who, using keystroke-logging data and e-rater essay scores of 3,592 students, found that the product and process factors correlated weakly to moderately with each other, and Barkaoui (2019), who explored the pausing behaviour of learners of different proficiency levels. However, studies looking deeper into the relationship between writing processes and products are needed.
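To make the modelling suggestion above more concrete, here is a minimal, hypothetical sketch in Python (using the statsmodels library) of a random-intercept, random-slope growth model, a simple relative of the latent growth curve models Barkaoui (2014) refers to. The data are simulated, and all variable names and values are invented purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical longitudinal data: analytic writing scores for a group of
# learners measured at four time points (all values are simulated).
rng = np.random.default_rng(0)
learners = np.repeat(np.arange(30), 4)          # 30 learners, 4 waves each
time = np.tile(np.arange(4), 30)                # measurement occasions 0..3
# Each simulated learner gets their own starting level and growth rate.
intercepts = rng.normal(3.0, 0.5, 30)[learners]
slopes = rng.normal(0.2, 0.1, 30)[learners]
score = intercepts + slopes * time + rng.normal(0, 0.2, len(time))

data = pd.DataFrame({"learner": learners, "time": time, "score": score})

# A linear growth model: a fixed average trajectory plus learner-specific
# random intercepts and slopes (a basic multilevel growth curve model).
model = smf.mixedlm("score ~ time", data, groups=data["learner"],
                    re_formula="~time")
result = model.fit()
print(result.summary())
```

In a model of this kind, the fixed effect for time estimates the average developmental trajectory, while the learner-specific random slopes capture the kind of individual variation in development discussed earlier in this chapter.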

Combining automated text analysis and diagnosis. In Chapter 8, we outlined directions for future developments in the automatic diagnosis of SFL writing. One further step in this direction can be intelligent computer-aided language learning (ICALL), which combines artificial intelligence, computational linguistics, and natural language processing. Methods developed in natural language processing (see Chapter 8) could compare certain aspects of learners' performances (for example, coherence, lexical variety, and syntactic subordination) with the occurrence and quality of these aspects as expected according to existing theories of SFL proficiency and writing development. Once specific aspects are identified, diagnostic feedback on how to improve them can be given to learners and their teachers. Indeed, as Meurers (2019) noted, one benefit of ICALL systems is that they allow for immediate formative feedback and for the support of individual developmental trajectories. A promising project in this realm builds upon the ICALL system Revita (https://revita.cs.helsinki.fi/), which exists for a range of languages, including German, Russian, and Finnish. Revita automatically analyses written texts that learners upload into the system. It then generates exercises for grammatical and syntactic categories that pose difficulties for learners, such as verb forms and concord in sentences. In its current state, Revita provides limited feedback to learners and does some tracking of their progress, taking into account text difficulty (based on lexical frequency, mean token length, and mean sentence length), learners' ability (based on their previous performance on the assessed aspects), and the relative complexity of specific aspects of language (Hou et al., 2019); a minimal illustration of such surface indices of text difficulty is sketched below. A new project aims to merge the existing functionality of Revita with the principles of dynamic assessment (see https://r.jyu.fi/DDLANG_en). The goal is to gain deeper insights into the development of SFL writing in response to mediation.

Combining the strengths of diagnostic and dynamic assessment. A direction for future research related more closely to diagnostic assessment per se is exploring the potential of merging the two assessment frameworks prominent in the present book: diagnostic and dynamic assessment. While they each rest on different theoretical bases, merging these frameworks has the potential to benefit both lines of research and the related assessment practices. Diagnostic assessment could benefit from the dialectical links between assessment and teaching that are so central to dynamic assessment, as well as from the reconceptualization of assessment constructs as dynamic. Dynamic assessment, in turn, could be enriched by the construct specification and the detailed analysis of learner strengths and weaknesses that diagnostic assessment allows for. The interest in working towards merging diagnostic and dynamic assessment into one coherent framework seems to be great in the research community, as evidenced in the symposium on this topic at the AILA 2021 World Congress.
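As an illustration of the surface indices mentioned in connection with Revita above, the sketch below computes mean sentence length, mean token length, and the share of words from a high-frequency list for a short text. This is not Revita's implementation; the word list and example sentence are hypothetical, and a real system would draw on large, corpus-based frequency data.

```python
import re
from statistics import mean

def surface_indices(text: str, frequent_words: set) -> dict:
    """Compute simple surface indices of text difficulty: mean sentence length
    (in tokens), mean token length (in characters), the share of tokens found
    on a high-frequency word list, and the type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "mean_sentence_length": mean(len(re.findall(r"[a-z']+", s.lower())) for s in sentences),
        "mean_token_length": mean(len(t) for t in tokens),
        "high_frequency_share": sum(t in frequent_words for t in tokens) / len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }

# Hypothetical mini frequency list; a real system would use a large corpus-based list.
FREQUENT = {"the", "a", "is", "was", "in", "and", "to", "of", "it", "were"}

sample = "The essay was short. It was written quickly, and the sentences were simple."
print(surface_indices(sample, FREQUENT))
```

Revita combines indices of this kind with information about a learner's previous performance and the relative complexity of the language features concerned (Hou et al., 2019); even so, these simple measures show how aspects of a text can be quantified for diagnostic tracking.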

Diagnosis of academic writing in SFL. One theme not prominently featured in the present book, but nevertheless important to address, is the diagnosis of academic language. Research on diagnosing academic language can move in two partly overlapping directions. One of these can be rooted in our discussion of the development of expertise in Chapter 2. The other direction can be informed by research on Content and Language Integrated Learning (CLIL), on instruction in an additional language such as English-Medium Instruction, and on other types of bilingual education. There have been some recent influential publications on CLIL (e.g., Nikula et al., 2016) as well as publications specifically discussing assessment in CLIL (e.g., deBoer & Leontjev, 2020; Quartapelle, 2012), but there have been, to the best of our knowledge, only a few publications with a specific focus on diagnostic assessment (e.g., Knoch, 2009b, 2011b; Xie & Lei, 2022).

Diagnosis in integrated writing tasks. As integrated writing tasks are becoming more common in language testing, particularly in the academic field (e.g., Cumming, 2013a), they have diagnostic potential at the crossroads of reading and writing. While current approaches tend to report students' abilities on integrated tasks under the notion of writing, it would be necessary to develop appropriately sensitive tools to capture both reading and writing abilities, as both contribute to the construct (e.g., Gebril & Plakans, 2013; Knoch & Sitajalabhorn, 2013). Although some recent studies contribute towards clarifying valid assessment criteria (Gebril & Plakans, 2013; Plakans & Gebril, 2017; Plakans et al., 2019), no diagnostic rating scale for integrated tasks has been published yet. There are some promising projects under way, for example, a project on modelling integrated academic-linguistic skills by means of innovative rating scales in combination with automated analyses of the text products, which are intended to help determine their semantic and linguistic closeness to the source texts (Harsch & Hartig, n.d.); a simplified illustration of measuring such closeness is sketched below. In this way, it may be possible to link the diagnosis of reading with that of writing (see also recent work on multimodal writing; Yi et al., 2020, and other articles in the special issue of the Journal of Second Language Writing, volume 47, 2020).
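To give a concrete, if deliberately simplified, sense of what determining "semantic and linguistic closeness to the source texts" could involve, the sketch below computes two crude lexical closeness indices between a learner's integrated-task response and its source text. The example texts are invented, this is not the approach of the project cited above, and real analyses would rely on much more sophisticated semantic measures (see Chapter 8).

```python
import math
import re
from collections import Counter

def closeness_to_source(response: str, source: str) -> dict:
    """Two crude indices of how close a learner's integrated-task response is
    to its source text: vocabulary overlap and cosine similarity of word-count
    vectors. Purely illustrative, not a published diagnostic measure."""
    resp = Counter(re.findall(r"[a-z']+", response.lower()))
    src = Counter(re.findall(r"[a-z']+", source.lower()))
    shared = set(resp) & set(src)
    cosine = (sum(resp[w] * src[w] for w in shared)
              / (math.sqrt(sum(c * c for c in resp.values()))
                 * math.sqrt(sum(c * c for c in src.values()))))
    return {
        "vocabulary_overlap": len(shared) / len(set(resp)),
        "cosine_similarity": cosine,
    }

source_text = "Urban gardens improve air quality and bring neighbours together."
learner_text = "The text says that gardens in cities make the air better and help neighbours meet."
print(closeness_to_source(learner_text, source_text))
```

Indices of this kind would, of course, need to be combined with evidence about reading comprehension and overall text quality before any diagnostic conclusions about the integrated construct could be drawn.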

Conclusions

In the introductory chapter, we noted that while diagnostic SFL assessment is an increasingly active area of research and development, more comprehensive accounts of diagnosing SFL writing have been lacking. The present book aims to fill this gap and, thus, to provide a companion volume to the book on diagnosing SFL reading (Alderson, Haapakangas et al., 2015). We conclude this book with certain general points concerning the diagnosis of SFL skills, some of which were also presented by Alderson, Haapakangas et al. (2015), as their conclusions are equally valid for writing.

First, it should be remembered that weaknesses in writing may be due to a number of different factors and their interaction.
Different causes of weaknesses, such as a disability, the lack of a particular skill or of motivation, or unsuitable learning, teaching or assessment practices, are likely to be based on different factors and might benefit from different activities and interventions. Identifying the reasons for weaknesses can be very difficult, and some causes can be beyond what a teacher can influence, for example, problems related to the learner's background. However, teachers need to be aware of the bigger picture even if they can address only some of the weaknesses in learners' abilities through teaching. Furthermore, a teacher may not need to tackle the issues alone, as in many contexts it is possible to consult other subject matter teachers, special education teachers, and school psychologists to figure out the causes of particular problems.

Second, it is important to ensure that diagnostic inferences are based on sufficient evidence obtained from a sufficient number of assessment opportunities (see Hill & McNamara, 2012; Hill, 2017), be they classroom observations, pieces of homework, or test items. Diagnostic assessments often focus on narrowly defined aspects of proficiency and, therefore, tests that cover a wider range can easily become long and impractical (e.g., Hughes, 1989). The same challenge of obtaining a reasonably reliable picture of a learner's performance across the different aspects of writing applies to diagnosis by the teacher in the classroom. Therefore, as discussed earlier, a longitudinal approach to diagnosis is useful as it allows for splitting assessment into manageable parts and administering them over time to better cover the knowledge and skills of interest. This also allows for covering the domain of writing more comprehensively in terms of genres and text types (see Bouwer et al., 2015; In'nami & Koizumi, 2016; Schoonen, 2012). Furthermore, it allows a wider range of diagnostic tools and approaches to be applied, which is likely to strengthen the validity of the diagnostic evidence.

Finally, useful diagnosis is cyclical in nature rather than a one-off event, as is depicted in Figure 10.1 above. A well-grounded understanding of the nature and development of SFL writing ability, as well as of the needs of the particular learners, is a necessary cornerstone of useful diagnosis. This understanding then needs to be operationalized as appropriate, practical diagnostic assessment instruments or approaches that are used to collect enough information of sufficient quality from the learners. The diagnoser needs to interpret the collected information, turn it into intelligible feedback and advice to the learners, and propose appropriate action to address the identified weaknesses. An analysis of the effectiveness of that action then feeds into the next cycle, which builds on the experiences of the first, complements it, and takes the next steps in improving learners' writing ability.

Alderson, Haapakangas et al. (2015), at the end of their book on SFL reading, expressed the hope that their book would contribute to the diagnosis of other language skills. The motivation to write this book was in part to fulfil their hope. What helped our task was the fact that SFL writing has been researched more widely than reading has been. This also meant that we had to be selective in our coverage, which is bound to affect both the breadth and the depth of the issues covered. Nevertheless, we hope that the current book contributes to a deeper understanding of what the diagnosis of SFL writing entails and of what kinds of tools and approaches have already been developed for this purpose. Yet more research is needed, and we have suggested some potentially useful lines of study. Such research is likely to benefit not only diagnostic and other language assessment but also applied linguistics more generally.

REFERENCES

Abbott, R., & Berninger, V. (1993). Structural equation modeling of relationships among developmental skills and writing skills in primary-​and intermediate-​g rade writers. Journal of Educational Psychology, 85(3), 478‒508. https://​doi.org/​10.1037/​ 0022-​0663.85.3.478 Abdel Latif, M. (2013). What do you mean by writing fluency and how can it be validly measured? Applied Linguistics, 34(1), 99–​105. https://​doi.org/​10.1093/​app​l in/​a ms​073 ACTFL American Council on the Teaching of Foreign Languages. (1982). ACTFL Language proficiency projects. The Modern Language Journal, 66(2), 179. https://​doi. org/​10.1111/​j.1540-​4781.1982.tb06​978.x ACTFL American Council on the Teaching of Foreign Languages. (2012). ACTFL proficiency guidelines. American Council on the Teaching of Foreign Languages. www.actfl.org/​sites/​defa​u lt/​fi les/​g ui​deli​nes/​ACT​F LPr​ofic​ienc ​yGui​deli​nes2​012.pdf Ai, H., & Lu. X. (2013). A corpus-​ based comparison of syntactic complexity in NNS and NS university students’ writing. In N. Díaz-​Negrillo, A. Ballier, & P. Thompson (Eds.), Automatic treatment and analysis of learner corpus data (pp. 249–​264). John Benjamins. https://​doi.org/​10.1075/​scl.59.15ai Alamargot, D., & Chanquoy, L. (2001). Through the models of writing (Vol. 9). Kluwer Academic Publishers. Alamargot, D., Dansac, C., Chesnet, D., & Fayol, M. (2007). Parallel processing before and after pauses: A combined analysis of graphomotor and eye movements during procedural text production. In M. Torrance, L. van Waes & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 11–​29). Brill. Alanen, R., & Kalaja, P. (2010, March 6–​9). The emergence of L2 English questions across CEFR proficiency levels [Paper presentation]. AAAL, Atlanta, USA. Albrechtsen, D., Haastrup, K., & Henriksen, B. (2008). Vocabulary and writing in a first and second language. Processes and development. Palgrave Macmillan.

Alderson, J. C. (1991). Bands and scores. In J. C. Alderson & B. North (Eds.). Language testing in the 1990s: The communicative legacy (pp. 71–​86). Modern English Publications, in association with Macmillan. Alderson, J.C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. Continuum. Alderson, J.C. (2007a). The CEFR and the need for more research. The Modern Language Journal, 91(4), 659–​663. https://​doi.org/​10.1111/​j.1540-​4781.2007.0062​7_​4.x Alderson, J. C. (2007b). The challenge of (diagnostic) testing: Do we know what we are measuring? In J. Fox, M. Wesche, D. Bayliss, L. Cheng, C. Turner, & C. Doe (Eds.), Language testing reconsidered (pp. 21–​39). University of Ottawa Press. Alderson, J. C., Brunfaut, T., & Harding, L. (2015). Towards a theory of diagnosis in second and foreign language assessment: Insights from professional practice across diverse fields. Applied Linguistics, 36(2), 236–​260. https://​doi.org/​10.1093/​app​l in/​ amt​046 Alderson, J. C., Haapakangas, E.-​L ., Huhta, A., Nieminen, L., & Ullakonoja, R. (2015). The diagnosis of reading in a second foreign language. Routledge. Alderson, J. C., & Huhta, A. (2005). The development of a suite of computer-​based diagnostic tests based on the Common European Framework. Language Testing, 22(3), 301–​320. https://​doi.org/​10.1191/​026553​2205​lt31​0oa Alderson, J. C., & Huhta, A. (2011). Can research into the diagnostic testing of reading in a second or foreign language contribute to SLA research? EUROSLA Yearbook, 11, 30–​52. https://​doi.org/​10.1075/​euro​sla.11.04ald Alderson, J. C., Huhta, A., & Nieminen, L. (2016). Characteristics of weak and strong readers in a foreign language. The Modern Language Journal, 100(4), 853–​879. https://​ doi.org/​10.1111/​modl.12367 Alderson, J. C., Figueras, N., Kuijper, H., Nold, G., Takala, S., & Tardieu, C. (2009). Analysing tests of reading and listening in relation to the Common European Framework of Reference: The experience of The Dutch CEFR Construct Project. Language Assessment Quarterly, 3(1), 3–​30. https://​doi.org/​10.1207/​s15​4343​11la​ q030​1_​2 Alexander, P. (2003). The development of expertise: The journey from acclimation to proficiency. Educational Researcher, 32(8), 10–​14. www.jstor.org/​sta​ble/​3700 ​080 Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D. (2017). Task effects on linguistic complexity and accuracy: A large-scale learner corpus analysis employing natural language processing techniques. Language Learning, 67(S1), 180–​208. https://​ doi.org/​10.1111/​lang.12232 Aljaafreh, A., & Lantolf, J. (1994). Negative feedback as regulation and second language learning in the Zone of Proximal Development. The Modern Language Journal, 78(4), 465–​483. www.jstor.org/​sta​ble/​328​585 Allaw, E. (2019). A learner corpus analysis: Effects of task complexity, task type, and L1 & L2 similarity on propositional and linguistic complexity. International Review of Applied Linguistics in Language Teaching, 59(4), 569–​604. https://​doi.org/​10.1515/​ iral-​2018-​0294 Alp, P, Jürgens, V., Kanne, P., Kasuri M., Liiv, S., Lind, A, Mere, K., Türk, Ü., Sõstar, K., & Tender, T. (2007). European Language Portfolio: Accredited model No. 93.2007. Model for lower-​secondary learners aged 12 to 16. www.coe.int/​en/​web/​portfo​l io/​acc​redi​ ted-​a nd-​reg ​iste​red-​mod​els-​by-​cou​ntry​mode​les-​acc​redi​tes-​ou-​enre​g ist​res-​par-​pays

ALTE Association of Language Testers in Europe. (1998). Multilingual glossary of language testing terms (Vol. 6). Cambridge University Press. Alves, R., Castro, S. L., Sousa, L., & Strömqvist, S. (2007). Influence of typing skill on pause-​execution cycles in written composition. In M. Torrance, L. van Waes & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 55–​65). Elsevier. Antonsson, M., Johansson, C., Hartelius, L., Henriksson, I., Longoni, F., & Wengelin, Å. (2018). Writing fluency in patients with low-​grade glioma before and after surgery. International Journal of Language and Communication Disorders, 53(3), 592–​604. doi: https://​doi.org/​10.1111/​1460-​6984.12373 Applebee, A. (2000). Alternative models of writing development. In R. Indrisano & J. Squire (Eds.), Perspectives on writing: Research, theory, and practice (pp. 90–111). Taylor & Francis. Aram, D. (2010). Writing with young children: A comparison of paternal and maternal guidance. Journal of Research in Reading, 33(1), 4–​19. https://​doi.org/​10.1111/​ j.1467-​9817.2009.01429.x Arnold, S., Chirico, R., & Liebscher, D. (2012). Goldgräber oder Eichhörnchen: Welcher Schreibertyp sind Sie? JoSch –​Journal der Schreibberatung, 4, 82–​97. Aryadoust, V., & Liu, S. (2015). Predicting EFL writing ability from levels of mental representation measured by Coh-​Metrix: A structural equation modelling study. Assessing Writing, 24, 35–​58. https://​doi.org/​10.1016/​j.asw.2015.03.001 Attali, Y. (2004). Exploring the feedback and revision features of Criterion. National Council on Measurement in Education (NCME) & Educational Testing Service, Princeton, NJ. Attali, Y., Powers, D., Freedman, M., Harrison, M., & Obetz, S. (2008). Automated scoring of short-​answer open-​ended GRE subject test items. ETS GRE Board Research Report No. 04-​02, ETS RR-​08-​20. Educational Testing Service. https://​ online​l ibr​a ry.wiley.com/​doi/​pdfdir​ect/​10.1002/​j.2333- ​8504.2008.tb02​106.x Baba, K. (2009). Aspects of lexical proficiency in writing summaries in a foreign language. Journal of Second Language Writing, 18(3), 191–​208. https://​doi.org/​10.1016/​ j.jslw.2009.05.003 Bachman, L. & Cohen, A. (1998). Language testing-​SLA interfaces: An update. In L. Bachman & A. Cohen (Eds.) Interfaces between second language acquisition and language testing research (pp. 1–​31). Cambridge University Press. Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford University Press. Baddeley, A. (1986). Working memory. Oxford University Press. Baker, B., & Riches, C. (2018). The development of EFL examinations in Haiti: Collaboration and language assessment literacy development. Language Testing 35(4), 557–​581. https://​doi.org/​10.1177/​02655​3221​7716​732 Ballantine, J., McCourt Larres, P., & Whittington, M. (2003). Evaluating the validity of self-​a ssessment: Measuring computer literacy among entry-​level undergraduates within accounting degree programmes at two UK universities. Accounting Education, 12(2), 97–​112. https://​doi.org/​10.1080/​0963​9280​3200 ​0 091​729 Banerjee, J., Franceschina, F., & Smith, A. (2007). Documenting features of written language production typical of different IELTS band score levels. IELTS Research Reports, Vol. 7. IELTS Australia and British Council. Available at www.ielts.org/​-​/​ media/​resea​rch-​repo​r ts/​ielts_ ​r r_​v​olum​e 07_ ​repo​r t5.ashx

Banerjee, J., Yan, X., Chapman, M., & Elliot, H. (2015). Keeping up with the times: Revising and refreshing a rating scale. Assessing Writing, 26, 5–​19. https://​doi. org/​10.1016/​j.asw.2015.07.001 Barkaoui, K. (2010a). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7, 54–​74. https://​doi.org/​ 10.1080/​154343​0 090​3464​418 Barkaoui, K. (2010b). Do ESL essay raters’ evaluation criteria change with experience? A mixed-​methods, cross-​sectional study. TESOL Quarterly, 44, 31–​57. Barkaoui, K. (2010c) Think-​a loud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing 28(1), 51–​75. https://​doi. org/​10.1177/​02655​3221​0376​379 Barkaoui, K. (2011). Effects of marking method and rater experience on ESL essay scores and rater performance. Assessment in Education: Principles, Policy & Practice, 18(3), 279–​293. https://​doi.org/​10.1080/​09695​94x.2010.526​585 Barkaoui, K. (2013). Examining the impact of L2 proficiency and keyboarding skills on scores on TOEFL-​i BT writing tasks. Language Testing 31(2), 241–​259. https://​doi. org/​10.1177/​02655​3221​3509​810 Barkaoui, K. (2014). Quantitative approaches to analyzing longitudinal data in second-​ language research. Annual Review of Applied Linguistics, 34, 65–​101. https://​doi.org/​ 10.1017/​s02671​9051​4000​105 Barkaoui, K. (2017). Examining repeaters’ performance on L2 proficiency tests: A review and a call for research. Language Assessment Quarterly 14(4), 420–​431. https://​ doi.org/​10.1080/​15434​303.2017.1347​790 Barkaoui, K. (2019). Examining sources of variability in repeaters’ L2 writing scores: The case of the PTE Academic writing section. Language Testing, 36(1), 3–​25. https://​doi.org/​10.1177/​02655​3221​7750​692 Barkaoui, K., & Knouzi, I. (2018). The effects of writing mode and computer ability on L2 test-​takers’ essay characteristics and scores. Assessing Writing, 36, 19–​31. https://​ doi.org/​10.1016/​j.asw.2018.02.005 Bartning, I., Martin, M., & Vedder, I. (2010). Communicative proficiency and linguistic development: Intersections between SLA and language testing research. Eurosla Monograph Series 1. European Second Language Association. http://​euro​sla.org/​mon​ogra​phs/​ EM01/​EM01h​ome.html Beare, S., & Bourdages, J. (2010). Skilled writers’ generating strategies in L1 and L2: An exploratory study. In M. Torrance, L. Van Waes & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 151–​161). Elsevier. Beaufort, A. (1999). Writing in the real world. Teachers College Press. Beaufort, A. (2000). Learning the trade: A social apprenticeship model for gaining writing expertise. Written Communication, 17(2), 185–​223. https://​doi.org/​ 10.1177%2F0741​0883​0 001​7002​0 02 Beaufort, A. (2007). College writing and beyond: A new framework for university writing instruction. Utah State University Press. https://​doi.org/​10.2307/​j.ctt4cg ​n k0 Beaufort, A. (2012). College writing and beyond: Five years later. Composition Forum, 26, 1–​13. Bennett, R. (2003). Online assessment and the comparability of score meaning. Research Memorandum RM-​ 03-​ 05. Educational Testing Service. www.ets.org/​Media/​ Resea​rch/​pdf/​R M-​03-​05-​Benn​ett.pdf

Bennett, R. (2011) Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–​25. https://​doi.org/10.1080/​09695​ 94X.2010.513​678​ Bennett, R., & Gitomer, D. (2009). Transforming K-​ 12 assessment: Integrating accountability testing, formative assessment and professional support. In J. Cumming & C. Wyatt-​Smith (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 43–​61). Springer. https://​doi.org/​10.1007/​978-​1- ​4020-​9964-​9_ ​3 Bereiter, C. (1980): Development in writing. In L. Gregg & E. Steinberg (Eds.), Cognitive processes in writing (pp. 73–​93). Lawrence Erlbaum Associates. Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Lawrence Erlbaum Associates. Berninger, V., Abbott, R., Abbott, S., Graham, S., & Richards, T. (2002). Writing and reading: Connections between language by hand and language by eye. Journal of Learning Disabilities, 35(1), 39‒56. https://​doi.org/​10.1177/​0 02​2219​4020​3500​104 Berninger, V., Nielsen, K., Abbott R., Wijsman, E., & Raskind, W. (2008). Writing problems in developmental dyslexia: Under-​recognized and under-​t reated. Journal of School Psychology, 46(1), 1‒21. https://​doi.org/​10.1016/​j.jsp.2006.11.008 Biber, D., Nekrasova, T., & Horn, B. (2011). The effectiveness of feedback for L1-​ English and L2-​w riting development: A meta-​a nalysis. ETS Research Report Series, 2011, i-​99. https://​doi.org/​10.1002/​j.2333-​8504.2011.tb02​241.x Bitchener, J. (2017). Why some L2 learners fail to benefit from written corrective feedback. In H. Nassaji & E. Kartchava (Eds.), Corrective feedback in second language teaching and learning: Research, theory, applications, implications (pp. 129–​140). Routledge. Bitchener, J., & Storch, N. (2016). Written corrective feedback for L2 development. Multilingual Matters. Blackhurst, A. (2005). Perspectives on applications of technology in the field of learning disabilities. Learning Disability Quarterly, 28(2), 175–​178. https://​doi.org/​10.2307/​ 1593​622 Black, P., & Wiliam, D. (2018). Classroom assessment and pedagogy. Assessment in Education: Principles, Policy & Practice, 25(6), 551–​575. https://​doi.org/​10.1080/​09695​ 94X.2018.1441​807 Börner, W. (1989): Didaktik schriftlicher Textproduktion in der Fremdsprache. In G. Antons & H. Krings (Eds.), Textproduktion. Ein interdisziplinärer Forschungsüberblick (pp. 348–​376). Niemeyer. Bourdin, B., & Fayol, M. (1994). Is written language production more difficult than oral language production? A working memory approach. International Journal of Psychology, 29(5), 591‒620. https://​doi.org/​10.1080/​0 02075​9940​8248​175 Bourke, L., & Adams, A.-​M. (2010). Cognitive constraints and the early learning goals of writing. Journal of Research of Reading, 33(1), 94‒110. https://​doi.org/​10.1111/​ j.1467-​9817.2009.01434.x Bouwer, R., Béguin, A., Sanders, T., & van den Bergh, H. (2015). Effect of genre on the generalizability of writing scores. Language Testing, 32(1), 83–​100. https://​doi. org/​10.1177/​02655​3221​4542​994 Breetvelt, I., van den Bergh, H., & Rijlaarsdam, G. (1994). Relations between writing processes and text quality: When and how? Cognition and Instruction, 12, 103–​123. https://​doi.org/​10.1207/​s15326​90xc​i120​2 _ ​2

Breland, H., Lee, Y.-​W., & Muraki, E. (2004). Comparability of TOEFL CBT writing prompts: Response mode analysis. ETS Research Report Series, 2004, i-​39. https://​ doi.org/​10.1002/​j.2333-​8504.2004.tb01​950.x Brindley, G. (1998). Describing language development? Rating scales and SLA. In L. Bachman & A. Cohen (Eds.), Interfaces between second language acquisition and language testing research (pp. 112–​140). Cambridge University Press. Brindley, G. (Ed.) (2000). Studies in immigrant English language assessment (Vol 1). National Centre for English Language Teaching and Research, Macquarie University. Brindley, G. (2001). Investigating rater consistency in competency-​based language assessment. In G. Burrows (Ed.), Studies in immigrant English language assessment, Vol. 2 (pp. 59–​80). National Centre for English Language Teaching and Research, Macquarie University. Bronfenbrenner, U. (1979). The ecology of human development. Harvard University Press. Bronfenbrenner, U., & Morris, P. (2006). The bioecological model of human development. In R. Lerner & W. Damon (Eds.), Theoretical models of human development (5th ed., pp. 793–​828). Wiley. Brown, G., Anderson, A., Shillcock, R., & Yule, G. (1984). Teaching talk: Strategies for production and assessment. Cambridge University Press. Browne, C., Culligan, B., & Phillips, J. (2013). The New General Service List. www. newgen​eral​serv ​icel ​ist.org/​ Brunfaut, T., & Harding, L. (2018). Teachers setting the assessment (literacy) agenda: A case study of a teacher-​led national test development project in Luxembourg. In D. Xerri & P. Vella Briffa (Eds.), Teacher involvement in high stakes language testing (pp. 155–​172). Springer. Burke, J., & Cizek, G. (2006). Effects of composition mode and self-​perceived computer skills on essay scores of sixth graders. Assessing Writing, 11, 148–​166. https://​doi.org/​ 10.1016/​j.asw.2006.11.003 Burrows, G. (Ed.) (2001). Studies in immigrant English language assessment, Vol. 2. National Centre for English Language Teaching and Research, Macquarie University. Burstein, J. (2013). Automated essay evaluation and scoring. In C. Chapelle (Ed.), The encyclopedia of applied linguistics. https://​doi.org/​10.1002/​978140​5198​431.wbeal1​046 Burstein, J., Braden-​Harder, L., Chodorow, M., Hua, S., Kaplan, B., Kukich, K., Lu, C., Nolan, J., Rock, D., & Wolff, S. (1998). Computer analysis of essay content for automated score prediction: A prototype automated scoring system for GMAT analytical writing assessment essays. ETS Research Report Series, 1998(1), i–​67. https://​doi.org/​10.1002/​j.2333-​8504.1998.tb01​764.x Burstein, J., Chodorow, M., & Leacock, C. (2004). Automated essay evaluation: The Criterion online writing service. AI Magazine, 25(3), 27. https://​ojs.aaai.org/​a im​ agaz​i ne/​i ndex.php/​a im​agaz​i ne/​a rti​cle/​v iew/​1774 Burstein, J., Riordan, B., & McCaffey, D. (2020). Expanding automated writing evaluation. In D. Yan, A. Rupp & P. Foltz (Eds.), Handbook of automated scoring. Theory into practice, (pp. 329–​346). CRC Press. Butler, D., & Winne, P. (1995). Feedback and self-​regulated learning: A theoretical synthesis. Review of Educational Research, 65(3), 245–​281. https://​doi.org/​10.2307/​ 1170​684 Cahill, A., & Evanini, K. (2020). Natural language processing for writing and speaking. In D. Yan, A. Rupp & P. Foltz (Eds.), Handbook of automated scoring. Theory into practice, (pp. 69–​92). CRC Press.

Camp, H. (2012). The psychology of writing development ‒ And its implications for assessment. Assessing Writing, 17(2), 92–​105. https://​doi.org/​10.1016/​ j.asw.2012.01.002 Caravolas, M. (2004). Spelling development in alphabetic writing systems: A cross-​ linguistic perspective. European Psychologist, 9(1), 3‒14. https://​doi.org/​10.1027/​ 1016-​9040.9.1.3 Caravolas, M., Hulme, C., & Snowling, M. J. (2001). The foundations of spelling ability: Evidence from a 3-​year longitudinal study. Journal of Memory and Language, 45(4), 751–​774. https://​doi.org/​10.1006/​jmla.2000.2785 Carlsen, C. (2010). Discourse connectives across CEFR-​levels: A corpus based study. In I. Bartning, M. Martin & I. Vedder (Eds.), Communicative proficiency and linguistic development: Intersection between SLA and language testing research. Eurosla Monograph Series 1, (pp. 191–​210). European Second Language Association. http://​euro​sla.org/​ mon​ogra​phs/​EM01/​EM01​tot.pdf Carter, M. (1990). The idea of expertise: An exploration of cognitive and social dimensions of writing. College Composition and Communication, 41(3), 265–​86. https://​doi.org/​10.2307/​357​655 Chambers, L. (2008). Computer-​ based and paper-​ based writing assessment: A comparative text analysis. Research Notes, 34, 9–​15. Chan, C., & Davison, C. (2020). Learning from each other: School-​ u niversity collaborative action research as praxis. In M. Poehner & O. Inbar-​L ourie (Eds.). Towards a reconceptualization of second language classroom assessment. Praxis and researcher-​ teacher partnership (pp. 129–​149). Springer. Chan, S., Bax, S., & Weir, C. (2018). Researching the comparability of paper-​based and computer-​based delivery in a high-​stakes writing test. Assessing Writing, 36, 32–​48. https://​doi.org/​10.1016/​j.asw.2018.03.008 Chang, C. (2015) Teacher modeling on EFL reviewers’ audience-​aware feedback and affectivity in L2 peer review. Assessing Writing, 25, 2–​21.https://​doi.org/​10.1016/​ j.asw.2015.04.001 Chapelle, C., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–​405. https://​doi. org/​10.1177/​02655​3221​4565​386 Chapelle, C., & Douglas, D. (2006). Assessing language through computer technology. Cambridge University Press. Chen, T. (2016). Technology-​supported peer feedback in ESL/​EFL writing classes: A research synthesis. Computer Assisted Language Learning, 29(2), 365–​397. https://​doi. org/​10.1080/​09588​221.2014.960​942 Cheng, A. (2007). Transferring generic features and recontextualizing genre awareness: Understanding writing performance in the ESP genre-​based literacy framework. English for Specific Purposes, 26, 287–​307. https://​doi.org/​10.1016/​ j.esp.2006.12.002 Cheng, K.-​H., Liang, J.-​C., & Tsai, C.-​C. (2015). Examining the role of feedback messages in undergraduate students’ writing performance during an online peer assessment activity. The Internet and Higher Education, 25, 78–​84. https://​doi.org/​ 10.1016/​J.IHE​DUC.2015.02.001 Chenoweth, N., & Hayes, J. (2001). Fluency in writing. Generating text in L1 and L2. Written Communication, 18(1), 80–​98. https://​doi.org/​10.1177/​0741​0883​0101​8001​0 04

Christie, F. (2012). Language education through the school years: A functional perspective. Wiley Blackwell. Cohen, A. (1994). Assessing language abilities in the classroom. Heinle & Heinle. Connelly, V., Gee, D., & Walsh, E. (2007). A comparison of keyboarded and written compositions and the relationship with transcription speed. The British Journal of Educational Psychology, 77, 479–​492. https://​doi.org/​10.1348/​0 0070​9906​X116​768 Cotos, E. (2014). Genre-​based automated writing evaluation for L2 research writing: From design to evaluation and enhancement. Springer. Corder H. P. (1967/​1981). The significance of learners’ errors. Reprinted in: Corder, S.P., Error analysis and interlanguage (pp. 5–​13). Oxford University Press. Cordingley, P., Bell, M., Thomason, S., & Firth, A. (2005).The impact of collaborative CPD on classroom teaching and learning–​Review: What do teacher impact data tell us about collaborative CPD? Social Science Research Unit, Institute of Education, University of London. Council of Europe. (2001). The Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press. Council of Europe. (2009). Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR). A manual. Council of Europe Language Policy Division. www.coe.int/​en/​web/​com​ mon-​europ​ean-​f ramew​ork-​refere​nce-​langua​ges/​relat​i ng-​exami​nati​ons-​to-​the-​cefr Council of Europe. (2011). Manual for language test development and examining. For use with the CEFR. www.coe.int/​en/​web/​com ​mon-​europ​ean-​f ramew​ork-​refere​ nce-​langua​ges/​dev​elop​i ng-​tests-​examin​i ng Council of Europe. (2020). Common European Framework of Reference for Languages: Learning, teaching, assessment–​ Companion volume. Council of Europe Publishing. www.coe.int/​lang-​cefr Council of Writing Program Administrators, National Council of Teachers of English & National Writing Project. (2011). Framework for success in postsecondary writing. Urbana, IL. https://​lead.nwp.org/​w p-​cont​ent/​uplo​ads/​2017/​03/​Framework_ ​For_​ Succe​ss_​i​n _​Po​stse​cond​a ry_​Writ​i ng.pdf Coxhead, A. (2000). A New Academic Word List. TESOL Quarterly, 34(2), 213–​238. https://​doi.org/​10.2307/​3587​951 Crossley, S., Kyle, K., & Dascalu, M. (2019). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap. Behavior Research Methods, 51, 14–​27. https://​doi.org/​10.3758/​s13​428-​018-​1142-​4 Crossley, S., Kyle, K., & McNamara, D. (2016). The development and use of cohesive devices in L2 writing and their relations to judgments of essay quality. Journal of Second Language Writing, 32, 1–​16. https://​doi.org/​10.1016/​j.jslw.2016.01.003 Crossley, S., Kyle, K., & McNamara, D. (2017). Sentiment Analysis and Social Cognition Engine (SEANCE): An automatic tool for sentiment, social cognition, and social order analysis. Behavior Research Methods 49(3), 803–​821. https://​doi.org/​ 10.3758/​s13​428-​016-​0743-​z Crossley, S., & McNamara, D. (2009). Computational assessment of lexical differences in L1 and L2 writing. Journal of Second Language Writing, 18(2), 119–​135. https://​doi. org/​10.1016/​j.jslw.2009.02.002 Crossley, S., & McNamara, D. (2011). Understanding expert ratings of essay quality: Coh-​Metrix analyses of first and second language writing. International

Journal of Continuing Engineering Education and Lifelong Learning, 21, 170–​191. https://​ doi.org/​10.1504/​ijce​ell.2011.040​197 Crossley, S., & McNamara, D. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26, 66–​79. https://​doi.org/​10.1016/​j.jslw.2014.09.006 Crystal, D. (2006). Language and the internet. Cambridge University Press. CTE College voor Toetsen en Examens. (2017, February 8–​10). International symposium ‘Diagnostic testing in education’. Utrecht, The Netherlands. www.exa​menb​lad.nl/​ onderw​erp/​i ntern​atio​nal-​sympos​ium-​d ia​g nos​t ic/​2017 Cumming, A. (1989). Writing expertise and second-language proficiency. Language Learning, 39, 81–​135. https://​doi.org/​10.1111/​j.1467-​1770.1989.tb00​592.x Cumming, A. (1990). Expertise in evaluating second language compositions. Language Testing, 7, 31–​51. https://​doi.org/​10.1177/​026​5532​2900 ​0700​104 Cumming, A. (Ed.). (2006). Goals for academic writing: ESL students and their instructors. John Benjamins. Cumming, A. (2009). Language assessment in education: Tests, curricula and teaching. Annual Review of Applied Linguistics, 29, 90–​100. https://​doi.org/​10.1017/​s02671​9050​ 9090​084 Cumming, A. (2013a). Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly, 10(1), 1–​8. https://​doi.org/​10.1080/​15434​ 303.2011.622​016 Cumming, A. (2013b). Multiple dimensions of academic language and literacy development. Language Learning, 63(Suppl. 1), 130–​152. https://​doi.org/​10.1111/​ j.1467-​9922.2012.00741.x Cumming, A., Kantor, R., & Powers, D.E. 2002. Decision making while rating ESL/​ EFL writing tasks: A descriptive framework. The Modern Language Journal 86(1), 67–​96. https://​doi.org/​10.1111/​1540-​4781.00137 Cumming, A., & So, S. (1996). Tutoring second language text revision. Does approach to instruction or the language of communication make a difference? Journal of Second Language Writing, 5, 197–​2 28. https://​doi.org/​10.1016/​ s1060-​3743(96)90002- ​8 Dai, J., Raine, R., Roscoe, R., Cai, Z., & McNamara, D. (2011). The Writing-​Pal tutoring system: Development and design. Journal of Engineering and Computer Innovations, 2(1), 1–​11. https://​doi.org/​10.1016/​j.comp​com.2014.09.002 Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3), 327–​347. https://​doi.org/​10.1177/​02655​3220​8090​156 Davies, A., & Elder, C. (2005). Validity and validation in language testing. In E Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 795–​ 814). Lawrence Erlbaum. Davison, C., & Leung, C. (2009). Current issues in English language teacher-​based assessment. TESOL Quarterly, 43(3), 393–​415. https://​doi.org/​10.1002/​j.1545-​ 7249.2009.tb00​242.x De Angelis, G., & Jessner, U. (2012). Writing across languages in a bilingual context: A dynamic systems theory approach. In R. Machón (Ed.), L2 writing development: Multiple perspectives (pp. 47–​68). De Gruyter. Deane, P. (2013a). On the relation between automated essay scoring and modern views of the writing construct. Assessing Writing, 18, 7–​24. https://​doi.org/​10.1016/​ j.asw.2012.10.002

Deane, P. (2013b). Covering the construct: An approach to automated essay scoring motivated by a socio-​cognitive framework for defining literacy skills. In M. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 298–​312). Routledge. Deane, P. (2014). Using writing process and product features to assess writing quality and explore how those features relate to other literacy tasks. ETS Research Report Series, 2014(1), 1–​23. https://​doi.org/​10.1002/​ets2.12002 Deane, P., Odendahl, N., Quinlan, T., Fowles, M., Welsh, C., & Bivens-​Tatum, J. (2008). Cognitive models of writing: Writing proficiency as a complex integrated skill. ETS Research Report Series, i–​368. https://​doi.org/​10.1002/​j.2333-​8504.2008. tb02​141.x Deane, P., & Zhang, M. (2015), Exploring the feasibility of using writing process features to assess text production skills. ETS Research Report Series, 1–​16. https://​doi. org/​doi:10.1002/​ets2.12071 deBoer, M., & Leontjev, D. (Eds.) (2020). Assessment and learning in Content and Language Integrated Learning (CLIL) classrooms. Springer. De Bot, K., Lowie, W., & Verspoor, M. (2007). A Dynamic Systems Theory approach to second language acquisition. Bilingualism: Language and Cognition, 10(1), 7–​21. http://​doi.org/​10.1017/​S13667​2890​6002​732 de Jong, J. (2004, May 14). What is the role of the Common European Framework of Reference for Languages: Learning, teaching, assessment? [Paper presentation] EALTA, Kranjska Gora, Slovenia. www.ealta.eu.org Dencla, M., & Rudel, R. (1974). Rapid ‘automatized’ naming of picture objects, colors, letters and numbers by normal children. Cortex, 10(2), 186–​202. https://​doi.org/​ 10.1016/​s0010-​9452(74)80009-​2 Depalma, M.-​J., & Ringer, J. (2011). Toward a theory of adaptive transfer: Expanding disciplinary discussions of “transfer” in second-​language writing and composition studies. Journal of Second Language Writing, 20, 134–​147. http://​doi.org/​10.1016/​ j.jslw.2011.02.003 Depalma, M.-​ J., & Ringer, J. (2013). Adaptive transfer, genre knowledge, and implications for research and pedagogy: A response. Journal of Second Language Writing, 22, 465–​470. http://​doi.org/​10.1016/​j.jslw.2013.09.002 Dikli, S. (2006). An overview of automated scoring of essays. Journal of Technology, Learning, and Assessment, 5(1). https://​ejourn​a ls.bc.edu/​i ndex.php/​jtla/​a rti​cle/​ view/​1640 Dikli, S., & Bleyle, S. (2014). Automated essay scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing, 22, 1–​17. https://​doi.org/​10.1016/​J.ASW.2014.03.006 Doe, C. (2014). Diagnostic English Language Needs Assessment (DELNA). Test review. Language Testing, 1(4), 537–​543. https://​doi.org/​10.1177/​02655​3221​4538​225 Doe, C. (2015). Student interpretations of diagnostic feedback. Language Assessment Quarterly, 12(1), 110–​135. https://​doi.org/​10.1080/​15434​303.2014.1002​925 Donovan, C. (2001). Children’s development and control of written story and informational genres: Insights from one elementary school. Research in the Teaching of English, 35, 395–​4 47. Douglas, D. 1994. Quantity and quality in speaking test performance. Language Testing, 11(2), 125–​144. https://​doi.org/​10.1177/​026​5532​2940​1100​203

References  297

Douglas, D., & Hegelheimer, V. (2007). Assessing language using computer technology. Annual Review of Applied Linguistics, 27, 115–​32. https://​doi.org/​10.1017/​S02671​9050​ 8070​062 Douglas, D., & Selinker, L. (1992). Analysing oral proficiency test performance in general and specific purpose contexts. System, 20, 317–​328. https://​doi.org/​10.1016/​ 0346-​251x(92)90043-​3 Douglas, D., & Selinker, L. (1993). Performance on a general versus a field-​specific test of speaking proficiency by international teaching assistants. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research (pp. 235–​256). TESOL Publications. Downing, S., & Haladyna, T. (2006). Handbook of test development. Routledge. Dunlop, M. (2017). Maximizing feedback for language learning: English language learners’ attention, affect, cognition and usage of computer-​delivered feedback from an English language reading proficiency assessment [Doctoral dissertation, University of Toronto]. https://​ tsp​a ce.libr​a ry.utoro​nto.ca/​bitstr​e am/​1807/​78966/​3/​D unl​op_ ​M​a ggi​e _ ​2 0​1706 ​_​ PhD​_​the​sis.pdf East, M. (2009). Evaluating the reliability of a detailed analytic scoring rubric for foreign language writing. Assessing Writing, 14(2), 88–​115. https://​doi.org/​10.1016/​ j.asw.2009.04.001 Ecke, P. (2004). Language attrition and theories of forgetting: A cross-​d isciplinary review. International Journal of Bilingualism, 8(3), 321–​354. https://​doi.org/​10.1177/​ 13670​0690​4008​0 030​901 Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–​85. https://​doi.org/​ 10.1177/​02655​3220​7086​780 Edelenbos, P., & Kubanek-​German, A. (2004). Teacher assessment: The concept of ‘diagnostic competence.’ Language Testing, 21(3), 259–​283. https://​doi.org/​10.1191/​ 026553​2204​lt28​4oa Elder, C. (1993). How do subject specialists construe classroom language prof iciency? Language Testing, 10(3), 235–​2 54. https://​d oi.org/​10.1177/​0 26​5532​ 2930​1000​3 03 Elder, C. (2003). The DELNA initiative at the University of Auckland. TESOLANZ Newsletter, 12(1), 15–​16. Elder, C., & Erlam, R. (2001). Development and validation of the Diagnostic English Language Needs Assessment (DELNA): Final report. Department of Applied Language Studies and Linguistics, University of Auckland. Elder, C., & Von Randow, J. (2008). Exploring the utility of a web-​based English language screening tool. Language Assessment Quarterly, 5(3), 173–​194. https://​doi. org/​10.1080/​154343​0 080​2229​334 Ellis, R. (2008). A typology of written corrective feedback types. ELT Journal, 63(2), 97–​107. https://​doi.org/​10.1093/​elt/​ccn​023 Ellis, R. (2010). EPILOGUE: A framework for investigating oral and written corrective feedback. Studies in Second Language Acquisition, 32(2), 335–​349. https://​doi.org/​ 10.1017/​S02722​6310​9990​544 Ellis R. (2017). Oral corrective feedback in L2 classrooms: What we know so far. In H. Nassaji & E. Kartchava (Eds.), Corrective feedback in second language teaching and learning: Research, theory, applications, implications (pp. 3–​18). Routledge. https://​doi. org/​10.4324/​978131​5621​432-​2

298 References

Endres, H. (2012). A comparability study of computer-​based and paper-​based writing tests. Research Notes, 49, 26–​32. Engeström, Y. (1987). Learning by expanding: An activity-​theoretical approach to developmental research. Orienta-​konsultit. Enright, M., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-​rater scoring. Language Testing, 27(3), 317–​334. https://​doi.org/​10.1177/​02655​3221​0363​144 Faez, F., Taylor, S. K., Majhanovich, S., Brown, P., & Smith, M. (2011). Teachers’ reactions to CEFR’s task-​based approach for FSL classrooms. Synergies Europe, 6, 109–​120. Fayol, M. (1999). From on-​ l ine management problems to strategies in written composition. In M. Torrance & G. Jeffery (Eds.), The cognitive demands of writing. Processing capacity and working memory in text production (pp. 13‒23). Amsterdam University Press. Ferenz, O. (2005). EFL writers’ social networks: Impact on advanced academic literacy development. Journal of English for Academic Purposes, 4(4), 339–​351. https://​doi.org/​ 10.1016/​j.jeap.2005.07.002 Ferris, D. (2002). Treatment of error in second language student writing. The University of Michigan Press. Ferris, D. (2006). Does error feedback help student writers? New evidence on the short–​a nd long-​term effects of written error correction. In K. Hyland & F. Hyland (Eds.), Feedback in second language writing (pp. 81–​104). Cambridge University Press. Feuerstein R., Feuerstein R. S., & Falik L. (2010). Beyond smarter: Mediated learning and the brain’s capacity for change. Teachers College, Columbia University. Fitzgerald, J. (1987). Research on revision in writing. Review of Educational Research, 57(4), 481–​506. https://​doi.org/​10.3102/​0 03465​4305​7004​481 Fitzgerald, J. (2006). Multilingual writing in preschool through 12th grade: The last 15 years. In C. MacArthur, S. Graham & J. Fitzgerald (Eds.), Handbook of writing research (pp. 337‒354). Guilford Press. Fitzgerald, J., & Shanahan, T. (2000). Reading and writing relations and their development. Educational Psychologist, 35(1), 39‒50. https://​doi.org/​10.1207/​S15326​ 985E​P350​1_​5 Floropoulou, C. (2002). Foreign language learners’ attitudes to self-​ a ssessment and DIALANG: A comparison between Greek and Chinese learners of English [Unpublished master’s thesis] Lancaster University, UK. Flower, L., & Hayes, J. (1980). The dynamics of composing: making plans and juggling constraints. In L. Gregg & E. Steinberg (Eds.), Cognitive processes of writing (pp. 31–​ 50). Lawrence Erlbaum Associates. Flower, L., Schriver, K., Carey, L., Haas, C., & Hayes, J. (1989). Planning in writing: The cognition of a constructive process. Carnegie-​Mellon University. Foltz, P., Streeter, L., Lochhaum, K., & Landauer, T. (2013). Implementation and applications of the Intelligent Essay Assessor. In M. Shermis & J. Burstein, (Eds.), Handbook of automated essay evaluation (pp. 68–​88). Routledge. Folz, P., Yan, D., & Rupp, A. (2020). The past, present, and future of automated scoring. In D. Yan, A. Rupp & P. Foltz (Eds.), Handbook of automated scoring. Theory into practice, (pp. 1–​9). CRC Press. Forsberg, F., & Bartning, I (2010). Can linguistic features discriminate between the communicative CERF-​ levels? A pilot study of written L2 French. In I.

References  299

Bartning, M. Martin & I. Vedder (Eds.), Communicative proficiency and linguistic development: Intersection between SLA and language testing research. Eurosla Monograph Series 1 (pp. 133–​158). European Second Language Association. http://​euro​sla.org/​ mon​ogra​phs/​EM01/​EM01h​ome.html Fox, J., Haggerty, J., & Artemeva, N. (2016). Mitigating risk in first-​ year engineering: Post-​admission diagnostic assessment in a Canadian university. In J. Read (Ed.), Post-​admission language assessment of university students. Springer. Freedman, A., & Pringle, I. (1980). Writing in the college years: Some indices of growth. College Composition and Communication, 31, 311–​324. https://​doi.org/​10.2307/​356​491 Fulcher, G. (2003). Testing second language speaking. Longman/​Pearson Education. Fulcher, G. (2010) Practical language testing. Hodder Education. Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–​132. https://​doi.org/​10.1080/​15434​303.2011.642​041 Fulcher, G., & Davidson, F. (2009). Test architecture, test retrofit. Language Testing, 26(1), 123–​144. https://​doi.org/​10.1177/​02655​3220​8097​339 Fulcher, G., Davidson, F., & Kemp, J. (2011). Effective rating scale development for speaking tests: Performance decision trees. Language Testing, 28(1), 5–​29. https://​doi. org/​10.1177/​02655​3220​9359​514 Fuller, D. (1995). Development of topic-​comment algorithms and test structures in written compositions of students in grades one through nine [Unpublished doctoral dissertation]. University of Washington. Galbraith, D., van Waes, L., & Torrance M. (2007). Introduction. In M. Torrance, L. van Waes & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 1‒10). Elsevier. Gass, S. (1997). Input, interaction, and the second language learner. Lawrence Erlbaum Associates. Gebril, A., & Plakans, L. (2013). Toward a transparent construct of reading-​to-​w rite tasks: The interface between discourse features and proficiency. Language Assessment Quarterly, 10(1), 9‒27. https://​doi.org/​10.1080/​15434​303.2011.642​040 Gibbons, P. (2002). Scaffolding language, scaffolding learning: Teaching second language learners in the mainstream classroom. Heinemann. Gick, M., & Holyoak, K. (1987). The cognitive basis of knowledge transfer. In S. Cormier & J. Hagman (Eds.), Transfer of learning: Contemporary research and applications (pp. 9–​47). Academic Press. Glaboniat, M., Müller, M., Rusch, P., Schmitz, H., & Wertenschlag, L. (2013). Profile Deutsch. Lernzielbestimmungen, Kannbeschreibungen und kommunikative Mittel für die Niveaustufen A1, A2, B1, B2, C1 und C2 des “Gemeinsamen europäischen Referenzrahmens für Sprachen”. Klett Sprachen. Glynn, S., Britton, B., Muth, K., & Dogan, N. (1982). Writing and revising persuasive documents: Cognitive demands. Journal of Educational Psychology, 74, 557‒567. https://​doi.org/​10.1037/​0 022-​0663.74.4.557 Goldberg, A., Russell, M., & Cook, A. (2003). The effect of computers on student writing: A meta-​analysis of studies from 1992 to 2002. The Journal of Technology, Learning and Assessment, 2(1). https://​ejourn​a ls.bc.edu/​i ndex.php/​jtla/​a rti​cle/​ view/​166 Gorman, T., Purves, A., & Degenhart, R. E. (1988).The IEA Study of Written Composition I. The international writing tasks and scoring scales. Pergamon Press.

300 References

Grabin, S., & Llosa, L. (2020). Toward an integrative framework for understanding multimodal writing in the content areas. Journal of Second Language Writing, 47. https://​doi.org/​10.1016/​j.jslw.2020.100​710 Grabowski, J. (2007). The writing superiority effect in the verbal recall of knowledge: Sources and determinants. In M. Torrance, L. van Waes & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 165‒179). Elsevier. Graesser, A., McNamara, D., Louwerse, M., & Cai, Z. (2004). Coh-​Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193–​202. https://​doi.org/​10.3758/​bf0​3195​564 Green, A. (2012). Language functions revisited: Theoretical and empirical bases for language construct definition across the ability range. English Profile Studies volume 2. Cambridge University Press. Green, A., & Maycock, L. (2004). Computer-​based IELTS and paper-​based versions of IELTS. Research Notes, 18, 3– ​6. Guo, L., Crossley, S. A., & McNamara, D. (2013). Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study. Assessing Writing, 18(3), 218–​238. https://​doi.org/​ 10.1016/​j.asw.2013.05.002 Guo, H., Deane, P., van Rijn, P., Zhang, M., & Bennett, R. (2018). Modeling basic writing processes from keystroke logs. Journal of Educational Measurement, 55(2), 194–​ 216. https://​doi.org/​10.1111/​jedm.12172 Gustilo, L., & Magno, C. (2015). Explaining L2 writing performance through a chain of predictors: A SEM approach. The Southeast Asian Journal of English Language Studies, 21(2), 115–​130. https://​doi.org/​10.17576/​3l-​2015-​2102- ​09 Gyllstad, H., Granfeldt, J., Bernardini, P., & Källkvist, M. (2014). Linguistic correlates to communicative proficiency levels of the CEFR: The case of syntactic complexity in written L2 English, L3 French and L4 Italian. In L. Roberts, I. Vedder & J. Hulstijn (Eds.), EUROSLA Yearbook 14 (pp.1–​30). John Benjamins. Hafner, C., & Ho, W. (2020). Assessing digital multimodal composing in second language writing: Towards a process-​based model. Journal of Second Language Writing, 47. https://​doi.org/​10.1016/​j.jslw.2020.100​710 Hamp-​Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-​Lyons (Ed.), Assessing second language writing in academic contexts (pp. 241–​276). Ablex. Hamp-​Lyons, L. (1995). Rating nonnative writing: The trouble with holistic scoring. TESOL Quarterly, 29, 759–​762. https://​doi.org/​10.2307/​3588​173 Hamp-​ Lyons, Liz. (2003). Writing teachers as assessors of writing. In B. Kroll (Ed.), Exploring the dynamics of second language writing (pp. 162–​189). Cambridge University Press. Harding, L., Alderson, J. C., & Brunfaut, T. (2015). Diagnostic assessment of reading and listening in a second or foreign language: Elaborating on diagnostic principles. Language Testing, 32(3), 317–​336. https://​doi.org/​10.1177/​02655​3221​4564​505 Harding, L., & Brunfaut. T. (2020). Trajectories of language assessment literacy in a teacher-​ researcher partnership: Locating elements of praxis through narrative inquiry. In M. Poehner & O. Inbar-​L ourie (Eds.), Towards a reconceptualization of second language classroom assessment. Praxis and researcher-​teacher partnership (pp.61–​81). Springer.

References  301

Harding, L., Brunfaut, T., Huhta, A., Alderson, J.C, Fish, A., 6 Kremmel, B. (2018, May 25–​27). DIALANG 2.0: Charting a course for revision and expansion of an online diagnostic testing system [Work-​ i n-​ progress presentation]. 15th Annual EALTA Conference, Bochum, Germany. Harding, L., & Kremmel, B. (2016). Teacher assessment literacy and professional development. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment (pp. 413–​427). De Gruyter. Harrington, S., Shermis, M., & Rollins, A. (2000). The influence of word processing on English placement test results. Computers and Composition, 17(2), 197–​210. https://​ doi.org/​10.1016/​s8755-​4615(00)00029-​3 Harrison, G. L., Goegan, L. D., Jalbert, R., McManus, K., Sinclair, K., & Spurling, J. (2016). Predictors of spelling and writing skills in first–​and second-​language learners. Reading and Writing. An Interdisciplinary Journal, 29(1), 69‒89. https://​doi. org/​10.1007/​s11​145-​015-​9580-​1 Harsch, C. (2007). Der gemeinsame europäische Referenzrahmen für Sprachen. Leistung und Grenzen. VDM Verlag Dr. Müller. Harsch, C. (2010). Schreibbewertung im Zuge der Normierung der KMK-​ Bildungsstandards: Der „niveauspezifische Ansatz“ und ausgewählte Schritte zu seiner Validierung. In K. Aguado, H. Vollmer & K. Schramm (Eds.), Fremdsprachliches Handeln beobachten, messen und evaluieren: Neue methodische Ansätze der Kompetenzforschung und Videographie. KFU Kolloquium Fremdsprachenunterricht (pp. 99–​117). Peter Lang. Harsch, C. (2014) General language proficiency revisited: Current and future issues, Language Assessment Quarterly, 11(2), 152–​169. https://​doi.org/​10.1080/​15434​ 303.2014.902​059 Harsch, C., & Hartig, J.(n.d.). The MASK Project-​Modeling of integrated academic-​language competences. www.lab.uni-​bre​men.de/​proj​ect- ​6 -​the-​m ask-​proj​ect-​model​i ng-​of-​i nt​ egra​ted-​acade​m ic-​langu​age-​compe​nten​ces/​ Harsch, C., & Hartig, J. (2015). What are we aligning tests to when we report test alignment to the CEFR? Language Assessment Quarterly, 12(4), 333–​362. https://​doi. org/​10.1080/​15434​303.2015.1092​545 Harsch, C., & Martin, G. (2012). Adapting CEF-​ descriptors for rating purposes: Validation by a combined rater training and scale revision approach. Assessing Writing, 17, 228–​250. https://​doi.org/​10.1016/​j.asw.2012.06.003 Harsch, C., & Martin, G. (2013). Comparing holistic and analytic scoring methods: Issues of validity and reliability. Assessment in Education, 20(3), 281–​307. https://​doi.org/​ 10.1080/​09695​94x.2012.742​422 Harsch, C., Pant, H. A., & Köller, O. (Eds.) (2010). Calibrating standards-​based assessment tasks for English as a first foreign language. Standard-​ setting procedures in Germany. Waxmann. Harsch, C., & Rupp, A. (2011). Designing and scaling level-​specific writing tasks in alignment with the CEFR: A test-​centered approach. Language Assessment Quarterly, 8(1), 1–​33. https://​doi.org/​10.1080/​15434​303.2010.535​575 Harsch, C., Neumann, A., Lehmann, R., & Schröder, K. (2007). Schreibfähigkeit. In B. Beck & E. Klieme (Eds.), Sprachliche Kompetenzen. Konzepte und Messung. DESI-​ Studie (pp. 42–​62). Beltz.

302 References

Harsch, C., Schröder, K., & Neumann, A. (2008). Schreiben Englisch. In DESI Konsortium (Eds.), Unterricht und Kompetenzerwerb in Deutsch und Englisch (pp. 139–​ 148). Beltz. Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and assessment needs. Part 1: General findings. European Association for Language Testing and Assessment (EALTA). www.ealta.eu.org/​docume​nts/​resour​ces/​sur ​vey-​rep​ort-​ pt1.pdf Haswell, R. (1991). Gaining ground in college writing: Tales of development and interpretation. Southern Methodist University Press. Haswell, R. (2000). Documenting improvement in college writing: A longitudinal approach. Written Communication, 11(3), 307–​352. https://​doi.org/​10.1177/​0741​0883​ 0001​7003​0 01 Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–​112. https://​doi.org/​10.3102/​0 034​6543​0298​487 Hawe, E., & Dixon, H. (2014) Building students’ evaluative and productive expertise in the writing classroom. Assessing Writing, 19, 66–​79. https://​doi.org/​10.1016/​ j.asw.2013.11.004 Hawkins, J., & Filipović, L. (2012). Criterial features in L2 English. Cambridge University Press. Hayes, J. (1996). A new framework for understanding cognition and affect in writing. In M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences, and applications (pp. 1–​27). Lawrence Erlbaum Associates. Hayes, J. (2011). Kinds of knowledge-​telling: Modeling early writing development. Journal of Writing Research, 3(2), 73–​92. https://​doi.org/​10.17239/​jowr-​2011.03.02.1 Hayes, J. (2012). Modeling and remodeling writing. Written Communication, 29(3), 369–​ 388. https://​doi.org/​10.1177/​07410​8831​2451​260 Hayes, J., & Flower, L. (1980). Identifying the organization of writing processes. In L. Gregg & E. Steinberg (Eds.), Cognitive processes of writing (pp. 3–​30). Lawrence Erlbaum Associates. Hayes, J., Flower, L., Schriver, K., Stratman, J., & Carey, L. (1987). Cognitive processes in revision. In S. Rosenberg (Ed.), Advances in applied psycholinguistics, Vol 2: Reading, writing and language processing (pp. 176–​240). Cambridge University Press. Heald-​Taylor, G. (1994). Whole language strategies for ESL students. Dominie Press. Heikkilä, R., Aro, M., Närhi, V., Westerholm, J., & Ahonen, T. (2013). Does training in syllable recognition improve reading speed? A computer-​based trial with poor readers from second and third grade. Scientific Studies of Reading, 17(6), 398–​414. https://​doi.org/​10.1080/​10888​438.2012.753​452 Heck, R., & Crislip, M. (2001). Direct and indirect writing assessments: Examining issues of equity and utility. Educational Evaluation and Policy Analysis, 23, 275–​292. https://​doi.org/​10.3102/​016237​3702​3003​275 Hempelmann, C., Rus, V., Graesser, A., & McNamara, D. (2006). Evaluating state-​ of-​the-​art treebank-​style parsers for Coh-​Metrix and other learning technology environments. Natural Language Engineering, 12, 131–​144. https://​doi.org/​10.1017/​ S13513​2490​6004​207 Henry, K. (1996). Early L2 writing development: A study of autobiographical essays by university-level students of Russian. The Modern Language Journal, 80(3), 309–​326. https://​doi.org/​10.1111/​j.1540-​4781.1996.tb01​613.x

References  303

Hermet, M., Szpakowicz, S., & Duquette, L. (2006). Automated analysis of students’ free-​text answers for computer-​a ssisted assessment. In D. Sharma, R. Sangal, & A. Singh (Eds.), Proceedings of the 13th Conference on Natural Language Processing (pp. 835–​ 845). www.acl​web.org/​a nthol​ogy/​volu​mes/​W16- ​63 Hill, K. (2017). Understanding classroom-​based assessment practices: A precondition for teacher assessment literacy. Papers in Language Testing and Assessment, 6(1), 1–​ 17. www.alta​a nz.org/​uplo​ads/​5/​9/​0/​8/​5908​292/​3.si1h​i ll_​fi na​l _​fo​r mat​ted_​proo​ fed.pdf Hill, K., & Ducasse, M. (2020). Advancing written feedback practice through a teacher-​researcher collaboration in a university Spanish program. In M. Poehner & O. Inbar-​L ourie (Eds.), Towards a reconceptualization of second language classroom assessment. Praxis and researcher-​teacher partnership (pp. 153–​172). Springer. https://​doi. org/​10.1007/​978-​3 -​030-​35081-​9_​8 Hill, K., & McNamara, T. (2012). Developing a comprehensive, empirically based research framework for classroom-​based assessment. Language Testing, 29(3), 395–​ 420. https://​doi.org/​10.1177/​02655​3221​1428​317 Hillocks, G. (1986). Research on written composition: New directions for teaching. National Council of Teachers of English. Hintikka, S., Aro, M., & Lyytinen, H. (2005). Computerized training of the correspondences between phonological and orthographic units. Written Language & Literacy, 8, 79–​102. https://​doi.org/​10.1075/​wll.8.2.07hin Hirose, K. (2006). Dialogue: Pursuing the complexity of the relationship between L1 and L2 writing. Journal of Second Language Writing, 15(2), 142–​146. https://​doi.org/​ 10.1016/​j.jslw.2006.04.002 Hirose, K., & Sasaki, M. (1994). Explanatory variables for Japanese students’ expository writing in English: An exploratory study. Journal of Second Language Writing, 3, 203–​ 229. https://​doi.org/​10.1016/​1060-​3743(94)90017-​5 Hoang, G., & Kunnan, A. (2016). Automated essay evaluation for English language learners: A case study of MY Access. Language Assessment Quarterly, 13(4), 359–​376. https://​doi.org/​10.1080/​15434​303.2016.1230​121 Holopainen, L., Ahonen, T., & Lyytinen, H. (2002). The role of reading by analogy in first grade Finnish readers. Scandinavian Journal of Educational Research, 46, 83–​98. https://​doi.org/​10.1080/​0 03138​3012​0115​624 Horkay, N., Bennett, R., Allen, N., Kaplan, B., & Yan, F. (2006). Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 5(2). https://​ejourn​a ls.bc.edu/​i ndex. php/​jtla/​a rti​cle/​v iew/​1641 Hoskyn, M., & Swanson, H. L. (2003). The relationship between working memory and writing in younger and older adults. Reading and Writing: An Interdisciplinary Journal, 16(8), 759–​784. https://​doi.org/​doi:10.1023/​A:102​732 Hou, J., Koppatz, M., María, J., Quecedo, H., Stoyanova, N., Kopotev, M., & Yangarber, R. (2019). Modeling language learning using specialized Elo ratings. In H. Yannakoudakis, E. Kochmar, C. Leacock, N. Madnani, I. Pilán & T. Zesch (Eds.), Innovative Use of NLP for Building Educational Applications: Proceedings of the 14th Workshop (pp. 494–​506). The Association for Computational Linguistics. https://​ doi.org/​10.18653/​v1/​w19-​4 451 Hughes, A. (1989). Testing for language teachers. Cambridge University Press.

304 References

Huhta, A. (2010). Innovations in diagnostic assessment and feedback: An analysis of the usefulness of the DIALANG language assessment system. [Unpublished doctoral dissertation]. University of Jyväskylä. www.jyu.fi/​hytk/​fi/​laitok ​set/​solki/​henki ​loku​nta/​henki​ loku​nta/​huhta-​a ri/​huhta-​2010-​phd-​d isse​r tat​ion-​on-​d ial​a ng.pdf Huhta, A. (forthcoming). Diagnostic assessment revisited. In F. Hult, Francis & B. Spolsky (Eds.), Handbook of Educational Linguistics (2nd edition). Wiley. Huhta, A. (in print) Improving the impact of technology on diagnostic language assessment. In K. Sadeghi & D. Douglas (Eds.), Fundamental considerations in technology mediated language assessment. Cambridge University Press. Huhta, A., Alanen, R., Tarnanen, M., Martin, M., & Hirvelä, T. (2014). Assessing learners’ writing skills in a SLA study: Validating the rating process across tasks, scales and languages. Language Testing, 31(3) 307–​328. https://​doi.org/​10.1177/​02655​ 3221​4526​176 Huhta, A., & Boivin, N. (2023). Changes in language assessment through the lens of New Materialism. In J. Ennser-​K ananen & T. Saarinen (Eds.), New Materialist explorations into language education. Springer. Huhta, A., & Figueras, N. (2004). Using the CEF to promote language learning through diagnostic testing. In K. Morrow (Ed.), Insights from the Common European Framework (pp. 65–​76). Oxford University Press. Huhta, A., Hirvelä, T., & Banerjee, J. (2005). European survey of language testing and assessment needs. Report: part two –​ regional findings. European Association for Language Testing and Assessment. www.ealta.eu.org/​resour​ces.htm Huhta, A., & Leontjev, D. (forthcoming). Predicting EFL skills in a 5-​year longitudinal setting: Contribution of L1 and L2 proficiency and cognitive abilities [Manuscript submitted for publication]. Centre for Applied Language Studies, University of Jyväskylä. Huhta, A., Luoma, S., Oscarson, M., Sajavaara, K., Takala, S., & Teasdale, A. (2002). DIALANG –​A diagnostic language assessment system for learners. In J. C. Alderson, (Ed.), Common European Framework of Reference for Languages: Learning, teaching, assessment. Case studies (pp. 130–​145). Council of Europe. https://​r m.coe. int/​168​069f​403 Huisman, B., Saab, N., van Driel, J., & van den Broek, P. (2018). Peer feedback on academic writing: Undergraduate students’ peer feedback role, peer feedback perceptions and essay performance. Assessment & Evaluation in Higher Education, 43(6), 955–​968. https://​doi.org/​10.1080/​02602​938.2018.1424​318 Hulstijn, J. (2011). Language proficiency in native and nonnative speakers: An agenda for research and suggestions for second-​language assessment. Language Assessment Quarterly, 8, 229–​249. https://​doi.org/​10.1080/​15434​303.2011.565​844 Hulstijn, J., Alderson, J.C., & Schoonen, R. (2010). Developmental stages in second-​ language acquisition and levels of second-​language proficiency: Are there links between them? In I. Bartning, M. Martin & I. Vedder (Eds.), Communicative proficiency and linguistic development: intersections between SLA and language testing research. Eurosla monographs series No. 1 (pp. 11–​20). European Second Language Association. http://​ euro​sla.org/​mon​ogra​phs/​EM01/​EM01h​ome.html Hunt, K. (1965). Grammatical structures written at three grade levels. NCTE Research Report No. 3. National Council of Teachers of English. Hyland, K. (2005). Metadiscourse: Exploring interaction in writing. Continuum.

References  305

Hyland, F. (2011). The language learning potential of form-​ focused feedback on writing. In R. Mancho ́n (Ed.), Learning-​to-​w rite and writing-​to-​learn in an additional language (pp. 159–​179). John Benjamins. Hyland, K. (2008). Academic discourse. Continuum. Hyland, K., & Hyland, F. (2006). Feedback on second language students’ writing. Language Teaching, 39(2), 83–​101. https://​doi.org/​doi:10.1017/​S02614​4 480​6003​399 In’nami, Y., & Koizumi, R. (2016). Task and rater effects in L2 speaking and writing: A synthesis of generalizability studies. Language Testing, 33(3), 341–​366. https://​doi. org/​10.1177/​02655​3221​5587​390 Indrisano, R., & Squire, J. R. (2000). Perspectives on writing: Research, theory, and practice. International Reading Association. Isaacs, T., Trofimovich, P., & Foote, A. (2018). Developing a user-​oriented second language comprehensibility scale for English-​medium universities. Language Testing, 35(2), 193–​216. https://​doi.org/​10.1177/​02655​3221​7703​433 Iwashita, N., McNamara, T., & Elder, C. (2001). Can we predict task difficulty in an oral proficiency test? Exploring the potential of an information-processing approach to task design. Language Learning, 51(3), 401–​436. https://​doi.org/​10.1111/​ 0023-​8333.00160 James, C. (1980). Contrastive analysis. Longman. James, M. (2008). The influence of perceptions of task similarity/​d ifference on learning transfer in second language writing. Written Communication, 25, 76–​103. https://​doi. org/​10.1177/​07410​8830​7309​547 James, M. (2014). Learning transfer in English-​ for-​ academic-​ purposes contexts: A systematic review of research. Journal of English for Academic Purposes, 14, 1–​13. https://​ doi.org/​10.1016/​j.jeap.2013.10.007 Jang, E., & Wagner, M. (2014). Diagnostic feedback in the classroom. In A. Kunnan (Ed.), Companion to language assessment. Wiley-​Blackwell. Jang, E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability: Validity arguments for Fusion Model application to LanguEdge assessment. Language Testing, 26(1), 31–​73. https://​doi.org/​10.1177/​02655​3220​8097​336 Jang, E., Dunlop, M., Park, G., & van der Boom, E. (2015). How do young students with different profiles of reading skill mastery, perceived ability, and goal orientation respond to holistic diagnostic feedback? Language Testing, 32(3), 359–​383. https://​ doi.org/​10.1177/​02655​3221​5570​924 Jarvis, D. (2002). The Process Writing Method. The Internet TESL Journal, 8(7). http://​ ite​slj.org/​Tec​h niq​ues/​Jar ​v is-​Writ ​i ng.html Jarvis, S. (2002). Short texts, best-​fitting curves and new measures of lexical diversity. Language Testing, 19(1), 57–​84. https://​doi.org/​10.1191/​026553​2202​lt22​0oa Johnston, B. (2009). Collaborative teacher development. In A. Burns & J. Richards (Eds.), Cambridge guide to second language teacher education (pp. 241–​249). Cambridge University Press. Jones, J. (2010). The role of assessment for learning in the management of primary to secondary transition: Implications for language teachers. The Language Learning Journal, 38(2), 175–​191. https://​doi.org/​10.1080/​095717​3090​2928​052 Jones, N. (1993). An item bank for testing English language proficiency: Using the Rasch model to construct an objective measure [Unpublished doctoral dissertation]. University of Edinburgh.

306 References

Jongejan, W., Verhoeven, L., & Siegel, L. (2007). Predictors of reading and spelling abilities in first–​a nd second-​language learners. Journal of Educational Psychology, 99(4), 835–​851. https://​doi.org/​10.1037/​0 022-​0663.99.4.835 Just, M., & Carpenter, P. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1), 122–​149. https://​doi. org/​10.1037/​0 033-​295x.99.1.122 Just M., Carpenter P., & Keller T. (1996). The capacity theory of comprehension: New frontiers of evidence and arguments. Psychological Review, 103(4), 773–​780. https://​ doi.org/​10.1037/​0 033-​295x.103.4.773 Kang, E., & Han, Z. (2015). The efficacy of written corrective feedback in improving L2 written accuracy: A meta-​analysis. The Modern Language Journal, 99(1), 1–​18. https://​doi.org/​10.1111/​modl.12189 Karimi-​A ghdam, S. (2016). Moving toward a supertheory for all seasons: Dialectical Dynamic Systems Theory and Sociocultural Theory –​A reply to McCafferty (2016). Language and Sociocultural Theory, 3(1), 89–​96. https://​doi.org/​10.1558/​l st. v3i1.30478 Keilty, M., & Harrison, G. (2015). Linguistic and literacy predictors of early spelling in first and second language learners. The Canadian Journal of Applied Linguistics, 18(1), 87–​106. https://​journ​a ls.lib.unb.ca/​i ndex.php/​CJAL/​a rti​cle/​v iew/​21278 Kektsidou, N., & Tsagari, D. (2019). Using DIALANG to track English language learners’ progress over time. Papers in Language Testing and Assessment, 8(1), 1–​30. https://​hdl.han​d le.net/​10642/​7216 Kellogg, R. (1996). A model of working memory in writing. In C. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences and applications (pp. 57–​71). Lawrence Erlbaum Associates. Kellogg, R. (1999). Components of working memory in text production. In M. Torrance & G. Jeffery (Eds.), The cognitive demands of writing. Processing capacity and working memory in text production (pp. 43‒61). Amsterdam University Press. Kellogg, R., Olive, T., & Piolat, A. (2007). Verbal, visual, and spatial working memory in written language production. Acta Psychologica, 124(3), 382–​397. https://​doi.org/​ 10.1016/​j.act​psy.2006.02.005 Kellogg, R., Turner, C., Whiteford, A., & Mertens, A. (2016). The role of working memory in planning and generating written sentences. Journal of Writing Research, 7(3), 397–​416. https://​doi.org/​10.17239/​jowr-​2016.07.03.04 Kennedy, A. (2011). Collaborative continuing professional development (CPD) for teachers in Scotland: Aspirations, opportunities and barriers. European Journal of Teacher Education, 34(1), 25–​41. https://​doi.org/​10.1080/​02619​768.2010.534​980 Khushik G., & Huhta, A. (2020). Investigating syntactic complexity in EFL learners’ writing across Common European Framework of Reference levels A1, A2, and B1. Applied Linguistics, 41(4), 506–​532. https://​doi.org/​10.1093/​app​l in/​a my ​064 Khushik G., & Huhta, A. (2022). Syntactic complexity in Finnish-​background EFL learners’ writing at CEFR levels A1–​B2. European Journal of Applied Linguistics, 10(1), 142–​184. https://​doi.org/​10.1515/​eujal-​2021- ​0 011 Kiely, R. (2018). Developing students’ self-​a ssessment skills: The role of the teacher. In J. McE. Davis, J. Norris, M. Malone & T. McKay (Eds.), Useful assessment and evaluation in language education (pp. 3–​19). Georgetown University Press.

References  307

Kim, Y.-​H. (2010). An argument-​based validity inquiry into the empirically-​derived descriptor-​ based diagnostic (EDD) assessment in ESL academic writing [Unpublished doctoral dissertation]. University of Toronto. Kim, Y.-​H. (2011). Diagnosing EAP writing ability using the Reduced Reparameterized Unified Model. Language Testing, 28(4), 509–​541. https://​doi.org/​10.1177/​02655​ 3221​1400​860 Kim, H. R., Bowles, M., Yan, X., & Chung, S. J. (2018). Examining the comparability between paper-​and computer-​based versions of an integrated writing placement test. Assessing Writing, 36, 49–​62. https://​doi.org/​10.1016/​j.asw.2018.03.006 Kim, Y.-​S., Gatlin, B., Al Otaiba, S., & Wanzek, J. (2018). Theorization and an empirical investigation of the component-​based and developmental text writing fluency construct. Journal of Learning Disabilities, 51(4), 320–​335. https://​doi.org/​ 10.1177/​0 0222​1941​7712​016 King, F., Rohani, F., Sanfilippo, C., & White, N. (2008). Effects of handwritten versus computer-​w ritten modes of communication on the quality of student essays. CALA report 208, Center for Advancement of Learning and Assessment, Florida State University. Klimova, B. (2015). Diary writing as a tool for students’ self-​reflection and teacher’s feedback in the course of academic writing. Procedia –​ Social and Behavioral Sciences, 197, 549–​553. https://​doi.org/​10.1016/​j.sbs​pro.2015.07.189 Kluger, A., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-​a nalysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–​284. https://​doi.org/​10.1037/​0 033-​2909.119.2.254 KMK (Standing Conference of the German Cultural Ministries). (2004). Bildungsstandards für die erste Fremdsprache (Englisch/​Französisch) für den Hauptschulabschluss. Luchterhand. KMK (Standing Conference of the German Cultural Ministries). (2006). Gesamtstrategie der Kultusministerkonferenz zum Bildungsmonitoring. Wolters-​K luwe. KMK (Standing Conference of the German Cultural Ministries). (2018). Vereinbarung zur Weiterentwicklung der Vergleichsarbeiten (VERA). www.kmk.org/​fi lead​m in/​ Date​ien/​vero​effe​ntli​chun​gen_ ​besc​h lue​sse/​2 012/​2 012_ ​03_ ​0​8 _​We​iter​entw ​ickl​u ng-​ VERA.pdf Knoch, U. (2007). Diagnostic writing assessment: The development and validation of a rating scale [Unpublished doctoral dissertation]. University of Auckland. Knoch, U. (2009a). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26 (2), 275–​304. https://​doi.org/​10.1177/​02655​3220​8101​0 08 Knoch, U. (2009b). Diagnostic writing assessment: The development and validation of a rating scale. Peter Lang. Knoch, U. (2011a). Investigating the effectiveness of individualized feedback to rating behaviour –​A longitudinal study. Language Testing, 28, 179–​200. https://​doi.org/​ 10.1177/​02655​3221​0384​252 Knoch, U. (2011b). Rating scales for diagnostic assessment of writing: What should they look like and where should the criteria come from? Assessing Writing, 16(2), 81–​96. https://​doi.org/​10.1016/​j.asw.2011.02.003 Knoch, U., & Sitajalabhorn, W. (2013). A closer look at integrated writing tasks: Towards a more focussed definition for assessment purposes. Assessing Writing, 18(4), 300–​ 308. https://​doi.org/​10.1016/​j.asw.2013.09.003 Knorr, D. (2016). ‘Modell Phasen und Handlungen akademischer Textproduktion.’ Eine Visualisierung zur Beschreibung von Textproduktionsprojekten. In S. Ballweg

308 References

(Ed.), Schreibberatung und Schreibtraining. Impulse aus Theorie, Empirie und Praxis (pp. 251–​273). Peter Lang. Kobayashi, H., & Rinnert, C. (2012). Understanding L2 writing from a multicompetence perspective: Dynamic repertoires of knowledge and text construction. In R. Manchón (Ed.), L2 writing: Multiple perspectives (pp. 101–​134). DeGruyter. Koda, K. (1993). Transferred L1 strategies and L2 syntactic structure in L2 sentence comprehension. The Modern Language Journal, 77(4), 490–​500. https://​doi.org/​ 10.1111/​j.1540-​4781.1993.tb01​997.x Kormos, J. (2012). The role of individual differences in L2 writing. Journal of Second Language Writing, 21(4), 390–​403. https://​doi.org/​10.1016/​j.jslw.2012.09.003 Kremmel, B., Eberharter, K., Holzknecht, F., & Konrad, E. (2018). Fostering language assessment literacy through teacher involvement in high-​stakes test development. In D. Xerri & P. Vella Briffa (Eds.), Teacher involvement in high stakes language testing (pp. 173–​194). Springer. Krings, H. (1986). Translation problems and translation strategies of advanced German learners of French (L2). In J. House & S–​Blum-​Kulka (Eds.) Interlingual and intercultural communication. Discourse and cognition in translation and second language acquisition studies (pp. 263–​76). Gunter Narr. Krings, H. (1989). Schreiben in der Fremdsprache –​Prozeßanalysen zum “vierten skill”. In G. Antos & H. Krings (Eds.), Textproduktion. Ein interdisziplinärer Forschungsüberblick (pp. 377–​436). Niemeyer. Kuiken, F., & Vedder, I. (2007). Task complexity and measures of linguistic performance in L2 writing. International Review of Applied Linguistics, 45(3), 261–​284. https://​doi. org/​10.1515/​i ral.2007.012 Kuiken, F., & Vedder, I. (2008). Cognitive task complexity and written output in Italian and French as a foreign language. Journal of Second Language Writing, 17, 48–​ 60. https://​doi.org/​10.1016/​j.jslw.2007.08.003 Kuiken, F. & Vedder, I. (2014). Rating written performance: What do raters do and why? Language Testing, 31(3) 329–​348. https://​doi.org/​10.1177/​02655​3221​4526​174 Kulhavy, R., & Wager, W. (1993). Feedback in programmed instruction: Historical context and implications for practice. In J. Dempsey & G. Sales (Eds.), Interactive Instruction and Feedback. Educational Technology Publications. Kunnan, A. (1995). Test taker characteristics and test performance: A structural modeling approach. Cambridge University Press. Kunnan, A., & Jang, E. (2011). Diagnosing feedback in language assessment. In M. Long & C. Doughty (Eds.), Handbook of language teaching (pp. 610–​627). Wiley-​Blackwell. Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-​ based indices of syntactic sophistication [Doctoral dissertation, Georgia State University]. http://​schol​a rwo​rks.gsu.edu/​a le​sl_ ​d ​iss/​35 Kyle, K., & Crossley, S. (2015), Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–​786. https://​doi.org/​ 10.1002/​tesq.194 Kyle, K., & Crossley, S. (2016). The relationship between lexical sophistication and independent and source-​based writing. Journal of Second Language Writing, 3, 12–​24. https://​doi.org/​10.1016/​j.jslw.2016.10.003 Kyle, K., Crossley, S., & Berger, C. (2018). The Tool for the Automatic Analysis of Lexical Sophistication (TAALES): Version 2.0. Behavior Research Methods, 50, 1030–​ 1046. https://​doi.org/​10.3758/​s13​428- ​017- ​0924- ​4

References  309

Kyle, F., Kujala, J., Richardson, U., Lyytinen, H., & Goswami, U. (2013). Assessing the effectiveness of two theoretically motivated computer-​a ssisted reading interventions in the United Kingdom: GG Rime and GG Phoneme. Reading Research Quarterly, 48(1), 61–​76. https://​doi.org/​10.1002/​r rq.038 Lam, R. (2016). Assessment as learning: Examining a cycle of teaching, learning, and assessment of writing in the portfolio-​based classroom. Studies in Higher Education, 41(11), 1900–​1917. https://​doi.org/​10.1080/​03075​079.2014.999​317 Landauer, T., Laham, D., & Foltz, P. (2003). Automatic essay assessment. Assessment in Education: Principles, Policy & Practice, 10(3), 295–​308. https://​doi.org/​10.1080/​0969​ 5940​3200​0148​154 Landerl, K. (2000). Influences of orthographic consistency and reading instruction on the development of nonword reading skills. European Journal of Psychology of Education, 3, 239–​257. https://​doi.org/​10.1007/ ​bf0​3173​177 Lantolf, J. P. (2000). Sociocultural theory and second language learning. Oxford University Press. Lantolf, J. P., Kurtz, L., & Kisselev, O. (2016). Understanding the revolutionary character of L2 development in the ZPD: Why levels of mediation matter. Language and Sociocultural Theory, 3(2), 177–​196. https://​doi.org/​10.1558/​l st.v3i2.32867 Lantolf, J. P., & Poehner, M. E. (2011). Dynamic assessment in the classroom: Vygotskian praxis for second language development. Language Teaching Research, 15(1), 11–​33. https://​doi.org/​10.1177/​13621​6881​0383​328 Lantolf, J. P., & Poehner, M. E. (2014). Sociocultural theory and the pedagogical imperative in L2 education: Vygotskian praxis and the research/​practice divide. Routledge. Lantolf, J. P., & Thorne, S. (2007). Sociocultural theory and second language learning. In B. van Patten & J. Williams (Eds.), Theories in second language acquisition: An introduction (pp. 197–​219). Lawrence Erlbaum Associates. Larsen-​Freeman, D. (1997). Chaos/​complexity science and second language acquisition. Applied Linguistics, 18(2), 141–​165. https://​doi.org/​10.1093/​app​l in/​18.2.141 Larsen-​ Freeman, D. (2000). Techniques and principles in language teaching. Oxford University Press. Larsen-​Freeman, D. (2002). Language acquisition and language use from a chaos/​ complexity theory perspective. In C. Kramsch (Ed.), Language acquisition and language socialization (pp.33–​46). Continuum. Larsen-​Freeman, D. (2006). The emergence of complexity, fluency, and accuracy in the oral and written production of five Chinese learners of English. Applied Linguistics, 27(4), 590–​619. http://​doi.org/​10.1093/​app​l in/​a ml​029 Larsen-​Freeman, D. (2012). Complex, dynamic systems: A new transdisciplinary theme for applied linguistics? Language Teaching, 45(2), 202–​221. https://​doi.org/​10.1017/​ s02614​4 481​1000​061 Leacock, C., & Chodorow, M. (2003). C-​rater: Automated scoring of short-​answer questions. Computers and the Humanities, 37, 389–​405 https://​doi.org/​10.1023/​ A:102577​9619​903 Leaker, C., & Ostman, H. (2010). Composing knowledge: Writing, rhetoric, and reflection in prior learning assessment. College Composition and Communication, 61(4), 691–​717. Lee, H. K. (2004). A comparative study of ESL writers’ performance in a paper-​based and a computer-​delivered writing test. Assessing Writing, 9(1), 4–​26. https://​doi.org/​ 10.1016/​j.asw.2004.01.001

310 References

Lee, I. (2014). Revisiting teacher feedback in EFL writing from sociocultural perspectives. TESOL Quarterly, 48(1), 201–​213. https://​doi.org/​10.1002/​tesq.153 Lee, I. (2017). Classroom writing assessment and feedback in L2 school contexts. Springer Nature. https://​doi.org/​10.1007/​978-​981-​10-​3924-​9 Lee, I., & Coniam, D. (2013). Introducing assessment for learning for EFL writing in an assessment of learning examination-​d riven system in Hong Kong. Journal of Second Language Writing, 22(1), 34–​50. https://​doi.org/​10.1016/​J.JSLW.2012.11.003 Lee, Y.-​J. (2002). A comparison of composing processes and written products in timed-​ essay tests across paper-​and-​pencil and computer modes. Assessing Writing, 11(2), 135–​157. https://​doi.org/​10.1016/​s1075-​2935(03)00003-​5 Lee, Y.-​W. (2015) Diagnosing diagnostic language assessment. Language Testing, 32(3) 299–​316. https://​doi.org/​10.1177/​02655​3221​4565​387 Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–​392. https://​doi.org/​10.1177%2F0​7410​8831​3491​692 Leontyev, A. (1981). Problems of the development of the mind. Progress Publishers. Leontjev, D. (2014). The effect of automated adaptive corrective feedback: L2 English questions. APPLES: Journal of Applied Language Studies, 8(2), 43–​66. http://​app​les. jyu.fi/​A rti​cleF​i le/​downl​oad/​459 Leontjev, D. (2016a). ICAnDoiT: The impact of computerised adaptive corrective feedback on L2 English learners [Doctoral dissertation, University of Jyväskylä] https://​jyx.jyu.fi/​ han​d le/​123456​789/​49346 Leontjev, D. (2016b). Exploring and reshaping learners’ beliefs about the usefulness of corrective feedback. ITL –​ International Journal of Applied Linguistics, 167(1), 46–​77. https://​doi.org/​10.1075/​itl.167.1.03leo Leontjev, D., Huhta, A., & Mäntylä, K. (2016). Word derivational knowledge and writing proficiency: How do they link? System, 59, 73–​89. https://​doi.org/​10.1016/​ j.sys​tem.2016.03.013 Leucht, M., Harsch, C., Nöth, D. & Köller, O. (2009). Bereitstellung eines normierten Aufgabenpools für kompetenzbasierte Vergleichsarbeiten im Fach Englisch in der 8. Jahrgangsstufe im Schuljahr 2008/​2009. Internal unpublished technical report. IQB: Berlin. Levelt, W. (1989). Speaking: From intention to articulation. MIT Press. Levi, T., & Inbar-​ L ourie, O. (2020). Assessment literacy or language assessment literacy: Learning from the teachers. Language Assessment Quarterly, 17(2), 168–​182. https://​doi.org/​10.1080/​15434​303.2019.1692​347 Levy, C., & Marek, P. (1999). Testing components of Kellogg’s multicomponent model of working memory in writing: The role of phonological loop. In M. Torrance & G. Jeffery (Eds.), The cognitive demands of writing. Processing capacity and working memory in text production (pp. 25‒41). Amsterdam University Press. Li, J. (2006). The mediation of technology in ESL writing and its implications for writing assessment. Assessing Writing, 11(1), 5–​21. https://​doi.org/​10.1016/​j.asw.2005.09.001 Li, J., Link, S., & Hegelheimer, V. (2015). Rethinking the role of automated writing evaluation (AWE) feedback in ESL writing instruction. Journal of Second Language Writing, 27, 1–​18. https://​doi.org/​10.1016/​j.jslw.2014.10.004 Liao, H.-​C. (2016). Enhancing the grammatical accuracy of EFL writing by using an AWE-​a ssisted process approach. System, 62, 77–​92. https://​doi.org/​10.1016/​J.SYS​ TEM.2016.02.007

References  311

Lidz, C. (1991). Practitioner’s guide to dynamic assessment. Guilford. Lidz, C., & Gindis, B. (2003). Dynamic assessment of the evolving cognitive functions in children. In A. Kozulin, B. Gindis, V. Ageyev & M. Miller (Eds.), Vygotsky’s educational theory in cultural context (pp. 99–​116). Cambridge University Press. Lightbown P., & Spada, N. (1999). How languages are learned. Oxford University Press. Lievens, F. (2001). Assessor training strategies and their effects on accuracy, interrater reliability, and discriminant validity. Journal of Applied Psychology, 86(2), 255–​264. https://​doi.org/​10.1037/​0 021-​9010.86.2.255 Lim, G. (2010). Investigating prompt effects in writing performance assessment. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 8, 95–​116. Lim, J. (2019) An investigation of the text features of discrepantly-​ scored ESL essays: A mixed methods study. Assessing Writing, 39, 1–​13. https://​doi.org/​10.1016/​ j.asw.2018.10.003 Lim, H., & Kahng, J. (2012). Review of Criterion for English language learning. Language Learning & Technology, 16(2), 38–​45. http://​d x.doi.org/​10125/​4 4285 Lim, J., & Polio, C. (2020). Multimodal assignments in higher education: Implications for multimodal writing tasks for L2 writers. Journal of Second Language Writing, 47. https://​doi.org/​10.1016/​j.jslw.2020.100​710 Lindgrén, S.-​ A., & Laine, M. (2011). Multilingual dyslexia in university students: Reading and writing patterns in three languages. Clinical Linguistics & Phonetics, 25(9), 753‒766. https://​doi.org/​10.3109/​02699​206.2011.562​594 Link, S., Dursun, A., Karakaya, K., & Hegelheimer, V. (2014). Towards better ESL practices for implementing automated writing evaluation. CALICO Journal, 31(3), 323–​344. https://​doi.org/​10.11139/​cj.31.3.323-​344 Liskin-​Gasparro, J. (2003). The ACTFL proficiency guidelines and Oral Proficiency Interview: A brief history and analysis of their survival. Foreign Language Annals, 36(4), 484–​490. https://​doi.org/​10.1111/​j.1944–​9720.2003.tb02​137.x Little, D. (2005). The Common European Framework and the European Language Portfolio: Involving learners and their judgements in the assessment process. Language Testing, 22(3), 321–​336. https://​doi.org/​10.1191/​026553​2205​lt31​1oa Little, D. (2010). The European Language Portfolio and self-​a ssessment: Using “I can” checklists to plan, monitor and evaluate language learning. In M. Schmidt, N. Naganuma, F. O’Dwyer, A. Imig & K. Sakai (Eds.), Can do statements in language education in Japan and beyond (pp. 157–​166). Asahi Press. Liu, S., & Kunnan, A. J. (2016). Investigating the application of automated writing evaluation to Chinese undergraduate English majors: A case study of WriteToLearn. CALICO Journal, 33(1), 71–​91. https://​doi.org/​10.1558/​cj.v33i1.26380 Liu, F., & Stapleton, P. (2018). Connecting writing assessment with critical thinking: An exploratory study of alternative rhetorical functions and objects of enquiry in writing prompts. Assessing Writing, 38, 10–​20. https://​doi.org/​10.1016/​j.asw.2018.09.001 Llosa, L. (2011). Standards-​based classroom assessments of English proficiency: A review of issues, current developments, and future directions for research. Language Testing, 28(3), 367–​382. https://​doi.org/​10.1177/​02655​3221​1404​188 Llosa, L., Beck, S., & Zhao, C. (2011). An investigation of academic writing in secondary schools to inform the development of diagnostic classroom assessments. Assessing Writing, 16, 256–​273. 
https://​doi.org/​10.1016/​j.asw.2011.07.001

312 References

Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–​496. https://​doi.org/​10.1075/​ ijcl.15.4.02lu Lu, X. (2011). A corpus-​based evaluation of syntactic complexity measures as indices of college-​level ESL writers’ language development. TESOL Quarterly, 45(1), 36–​62. https://​doi.org/​10.5054/​tq.2011.240​859 Lu, X. (2017). Automated measurement of syntactic complexity in corpus-​based L2 writing research and implications for writing assessment. Language Testing, 34(4), 493–​511. https://​doi.org/​10.1177/​02655​3221​7710​675 Lu, X., & Ai, H. (2015). Syntactic complexity in college-​level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing, 29, 16–​27. https://​doi.org/​10.1016/​j.jslw.2015.06.003 Lumley, T. (2002). Assessment criteria in a large-​scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246–​276. https://​doi.org/​10.1191/​ 026553​2202​lt23​0oa Lumley, T. (2005). Assessing second language writing: The rater’s perspective. Peter Lang. Luoma, S., & Tarnanen, M. (2003). Creating a self-​ rating instrument for second language writing: From idea to implementation. Language Testing, 20(4), 440–​465. https://​doi.org/​10.1191/​026553​2203​lt26​7oa MacArthur, C. (2016). Instruction in evaluation and revision. In C. MacArthur, S. Graham & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (pp. 272–​287). Guilford Press. MacArthur, C., & Graham, S. (2016). Writing research from a cognitive perspective. In C. MacArthur, S. Graham & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (pp. 24‒40). Guilford Press. MacCann, R., Eastment, B., & Pickering, S. (2002). Responding to free response examination questions: Computer versus pen and paper. British Journal of Educational Technology, 33(2), 173–​188. https://​doi.org/​10.1111/​1467-​8535.00251 Mahfoodh, O. (2017). “I feel disappointed”: EFL university students’ emotional responses towards teacher written feedback. Assessing Writing, 31, 53–​72. https://​doi. org/​10.1016/​J.ASW.2016.07.001 Manchón, R. (2012). L2 writing development: Multiple perspectives. De Gruyter. Manchón, R. (2013). Writing. In F. Grosjean & P. Li (Eds.), The psycholinguistics of bilingualism (pp. 100‒115). Wiley-​Blackwell. Manchón, R., & Roca de Larios, J. (2007). On the temporal nature of planning in L1 and L2 composing. Language Learning, 57(4), 549‒593. https://​doi.org/​10.1111/​ j.1467-​9922.2007.00428.x Manchón, R., Roca de Larios, J., & Murphy, L. (2009). The temporal dimension and problem-​solving nature of foreign language composing processes. Implications for theory. In R. Manchón (Ed.), Writing in foreign language contexts. Learning, teaching, and research (pp. 102‒192). Multilingual Matters. Manchón, R., & Williams, J. (2016). L2 writing and SLA studies. In R. Manchón & P. Matsuda (Eds.), Handbook of second and foreign language writing (pp. 567‒586). De Gruyter. Mangen, A., & Velay, J. (2012). The haptics of writing: Cross-​d isciplinary explorations of the impact of writing technologies on the cognitive-​ sensorimotor processes involved in writing. In M. Torrance, D. Alamargot, M. Castelló, F. Ganier, O. Kruse,

References  313

A. Mangen, L. Tolchinsky, & L. van Waes (Eds.), Learning to write effectively: Current trends in European research (pp. 405‒407). Brill. Mäntylä, K., & Huhta A. (2013). Knowledge of word parts. In J. Milton & T. Fitzpatrick (Eds.), Dimensions of vocabulary knowledge (pp. 45‒59). Palgrave. Martin, M., Mustonen, S., Reiman, & Seilonen, M. (2010. On becoming an independent user. In I. Bartning, M. Martin & I. Vedder (Eds.), Communicative proficiency and linguistic development: intersections between SLA and language testing research. EUROSLA Monograph Series, 1. (pp. 57–​88). European Second Language Association. http://​euro​sla.org/​mon​ogra​phs/​EM01/​EM01h​ome.html Matsuda, P. (1997). Contrastive rhetoric in context: A dynamic model of L2 writing. Journal of Second Language Writing, 6(1), 45‒60. https://​doi.org/​10.1016/​ s1060-​3743(97)90005-​9 Mawlawi Diab, N. (2015). Effectiveness of written corrective feedback: Does type of error and type of correction matter? Assessing Writing, 24, 16–​34. https://​doi.org/​ 10.1016/​J.ASW.2015.02.001 Maycock, L., & Green, A. (2005) The effects on performance of computer familiarity and attitudes towards CB IELTS, ResearchNotes, 20, 3–​8. McCutchen, D. (1996). A capacity theory of writing: Working memory in composition. Educational Psychology Review, 8(3), 299–​325. https://​doi.org/​10.1007/ ​BF0​1464​076 McCutchen, D., Francis, M., & Kerr, S. (1997). Revising for meaning: Effects of knowledge and strategy. Journal of Educational Psychology, 89(4), 667–​676. https://​doi. org/​10.1037/​0 022-​0663.89.4.667 McNamara, T. (1996). Measuring second language performance. Longman. McNamara, D. S, Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of writing quality. Written Communication, 27(1), 57–​86. https://​doi.org/​10.1177/​07410​ 8830​9351​547 McNamara, D., Crossley, S., Roscoe, R., Allen, L., & Dai, J. (2015). A hierarchical classification approach to automated essay scoring. Assessing Writing, 23, 35–​59. https://​doi.org/​10.1016/​j.asw.2014.09.002 McNamara, D., Graesser, A., McCarthy, P., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-​Metrix. Cambridge University Press. McNamara, D., Louwerse, M., McCarthy, P., & Graesser, A. (2010). Coh-​ Metrix: Capturing linguistic features of cohesion. Discourse Processes, 47, 292–​330. https://​doi.org/​10.1080/​016385​3090​2959​943 Meara, P. (2004). Modeling vocabulary loss. Applied Linguistics, 25(2), 137–​155. https://​ doi.org/​10.1093/​app​l in/​25.2.137 Meara, P., & Milton, J. (2003). X-​L ex: The Swansea Levels Test. (Computer software). Longonstics. Meurers, D. (2019, June 1). Computational linguistic analysis, assessment, and language development –​Considering language and task [Paper presentation]. EALTA. Dublin, Ireland. www.ealta.eu.org/​ Milanovic, M., Saville, N., & Shuhong, S. (1996). A study of the decision-​m aking behaviour of composition markers. In M. Milanovic & N. Saville (Eds.), Studies in language testing 3: Performance, cognition and assessment (pp. 92–​114). Cambridge University Press. Miller, G., & Miller, R. (2017, April). Roxify Online: Helping students Improve their writing through online feedback [paper presentation]. HASALD 2017. Hong Kong. https://​doi.org/​10.22236/​J ER_​Vol3​Issu​e2pp​152-​167

Miller, M., & Crocker, L. (1990). Validation methods for direct writing assessment. Applied Measurement in Education, 3(3), 285–​296. https://​doi.org/​10.1207/​s15​3248​ 18am​e 030​3 _ ​6 Milton, J. (2009). Measuring second language vocabulary acquisition. Multilingual Matters. Milton, J. (2010) The development of vocabulary breadth across the CEFR levels. A common basis for the elaboration of language syllabuses, curriculum guidelines, examinations, and textbooks across Europe. In I. Bartning, M. Martin & I. Vedder (Eds.), Communicative proficiency and linguistic development: intersections between SLA and language testing research. EUROSLA Monograph Series, 1 (pp. 211–​231). European Second Language Association. http://​euro​sla.org/​mon​ogra​phs/​EM01/​EM01h​ ome.html Milton, J. (2013). Measuring the contribution of vocabulary knowledge to proficiency in the four skills. In C. Bardel, C. Lindqvist & B. Laufer (Eds.) L2 vocabulary acquisition, knowledge and use: New perspectives on assessment and corpus analysis. Eurosla Monograph Series, 2 (pp. 57–​78). European Second Language Association. www. euro​sla.org/​mon​ogra​phs/​EM02/​EM02h​ome.php Milton J., & Alexiou, T. (2009) Vocabulary size and the Common European Framework of Reference for Languages. In B. Richards, H. Daller, D. Malvern, P. Meara, J. Milton & J. Treffers-​Daller (Eds.), Vocabulary studies in first and second language acquisition (pp. 194–​211). Palgrave Macmillan. Milton, J., Wade, J., & Hopkins, N. (2010). Aural word recognition and oral competence in English as a foreign language. In R. Chacón-​Beltrán, C. Abello-​Contesse & M. Torreblanca-​L ópez (Eds.), Insights into non-​native vocabulary teaching and learning (pp. 83–​98). Multilingual Matters. Mizumoto, A., Sasao, Y., & Webb, S. A. (2019). Developing and evaluating a computerized adaptive testing version of the Word Part Levels Test. Language Testing, 36(1), 101–​123. https://​doi.org/​10.1177/​02655​3221​7725​776 Mogey, N., Paterson, J., Burk, J., & Purcell, M. (2010). Typing compared with handwriting for essay examinations at university: Letting the students choose. ALT-​ J Research in Learning Technology, 18(1), 29–​47. https://​doi.org/​10.1080/​096877​6100​ 3657​580 Monteiro, K., Crossley, S. A., & Kyle, K. (2018, March). In search of new benchmarks: Using non-​native lexical frequency and contextual diversity indices to assess language learning [Paper presentation]. AAAL, Chicago, USA. Morken, F., & Helland, T. (2013). Writing in dyslexia: Product and process. Dyslexia, 19(3), 131‒148. https://​doi.org/​10.1002/​dys.1455 Morris-​Friehe, M., & Leuenberger, J. (1992) Direct and indirect measures of writing for nonlearning disabled and learning disabled college students. Reading and Writing, 4, 281–​296. https://​doi.org/​10.1007/​BF0​1027​152 Murakami, A. (2016). Modeling systematicity and individuality in nonlinear second language development: The case of English grammatical morphemes. Language Learning, 66(4), 834–​871. https://​doi.org/​10.1111/​lang.12166 Murphy, L., & Roca de Larios, J. (2010). Searching for words: One strategic use of the mother tongue by advanced Spanish EFL writers. Journal of Second Language Writing, 19, 61–​81. https://​doi.org/​10.1016/​j.jslw.2010.02.001 Nassaji, H. (2017). Negotiated oral feedback in response to written errors. In H. Nassaji & E. Kartchava (Eds.), Corrective feedback in second language teaching and learning (pp. 114–​128). Routledge.

Nassaji, H., & Swain, M. (2000). A Vygotskian perspective on corrective feedback in L2: The effect of random versus negotiated help on the learning of English articles. Language Awareness, 9(1), 34–​51. https://​doi.org/​10.1080/​096584​1000​8667​135 Naumann, A., Rieser, S., Musow, S., Hochweber, J., & Hartig, J. (2019). Sensitivity of test items to teaching quality. Learning and Instruction, 60, 41–​53. https://​doi.org/​ 10.1016/​j.learn​i nst​r uc.2018.11.002 Nelson, J, Liu, Y, Fiez, J., & Perfetti, C. (2009) Assimilation and accommodation patterns in ventral occipitotemporal cortex in learning a second writing system. Human Brain Mapping, 30, 819–​820. https://​doi.org/​10.1002/​hbm.20551 Neumann, A. (2012). Advantages and disadvantages of different text coding procedures for research and practice in a school context. In van Steendam, E., Tillema, M., Rijlaarsdam, G. & van den Bergh, H. (Eds.), Measuring writing: Recent insights into theory, methodology and practices (pp. 33–​54). Brill. Newell, A. (1992). Precis of unified theories of cognition. Behavioral and Brain Sciences, 15, 425–​492. https://​doi.org/​10.1017/​S01405​25X0 ​0 069​478 Nikula, T., Dafouz, E., Moore, P., & Smit, U. (Eds.). (2016). Conceptualising integration in CLIL and multilingual education. Multilingual Matters. Norris, J. M., Brown, J. D., Hudson, T. D., & Bonk, W. (2002). Examinee abilities and task difficulty in task-​based second language performance assessment. Language Testing, 19(4), 395–​418. https://​doi.org/​10.1191/​026553​2202​lt23​7oa Norris, J. M., & Manchón, R. (2012). Investigating L2 writing development from multiple perspectives: Issues in theory and research. In R. Manchón (Ed.), L2 writing development: Multiple perspectives (pp. 221–​244). De Gruyter. https://​doi.org/​10.1515/​ 978193​4078​303.221 North, B. (2000). The development of a common framework scale of language proficiency. Peter Lang. North, B., & Schneider, G. (1998). Scaling descriptors for language proficiency scales. Language Testing, 15(2), 217–​263. https://​doi.org/​10.1177/​026​5532​2980​1500​204 OECD (Organisation for Economic Co-​operation and Development). (2018). PISA 2018 results (Volume I). What students know and can do. www.oecd.org/​pisa/​publi​cati​ ons/​pisa-​2018-​resu ​lts-​vol​u me-​i-​5f07c​754-​en.htm Oh, S. (2018) Investigatingtest-​takers’ use of linguistic tools in second language academic writing assessment [Doctoral dissertation, Teachers College, Columbia University]. https://​ doi.org/​10.7916/​D8B00​H DQ Olive, T. (2004). Working memory in writing: Empirical evidence from the dual-​ task technique. European Psychologist, 9(1), 32–​42. https://​doi.org/​10.1027/​ 1016-​9040.9.1.32 Olive, T., & Kellogg, R. T. (2002). Concurrent activation of high-​and low-​level writing processes. Memory & Cognition, 30, 594–​600. https://​doi.org/​10.3758/ ​BF0​ 3194​960 Olive, T., Kellogg, R. T., & Piolat, A. (2008). Verbal, visual, and spatial working memory demands during text composition. Applied Psycholinguistics, 29(4), 669–​687. https://​doi.org/​10.1017/​S01427​1640​8080​284 O’Malley, J. M., & Chamot, A. U. (1990). Learning strategies in second language acquisition. Cambridge University Press. O’Neill, P., Adler-​ K assner, L., Fleischer, C., & Hall, A. M. (2012). Creating the framework for success in postsecondary writing. College English, 76, 520–​524.

Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-​level L2 writing. Applied Linguistics, 24(4), 492–​518. https://​doi.org/​10.1093/​app​l in/​24.4.492 Ortega, L., & Byrnes, H. (2009). The longitudinal study of advanced L2 capacities. Routledge. Ortega, L., & Iberri-​ Shea, G. (2005). Longitudinal research in second language acquisition: Recent trends and future directions. Annual Review of Applied Linguistics, 25, 26–​45. https://​doi.org/​10.1017/​S02671​9050​5000 ​024 Oxford, R. (1993). Research on second language learning strategies. Annual Review of Applied Linguistics, 13, 175–​187. https://​doi.org/​10.1017/​S02671​9050 ​0 002​452 Page, E. (1966). The imminence of grading essays by computer. Phi Delta Kappan, 47, 238–​243. www.jstor.org/​sta​ble/​20371​545 Page, E., & Paulus, D. (1968). The analysis of essays by computer. Final Report to the U.S. Department of Health, Education and Welfare. Office of Education, Bureau of Research. Paek, R. (2005). Recent trends in comparability studies. Pearson Educational Measurement. http://​i ma​g es.pea​r son​a sse​s sme​n ts.com/​i ma​g es/​t mrs/​t mrs ​_ ​r g/​t rends​c omp​s tud​ ies.pdf Pearson (2010). Intelligent Essay Assessor (IEA) fact sheet. https://​m lm.pear​son.com/​g lo​ bal/​a ss​ets/​upl​oad/​I EA-​FactSh​eet.pdf?v143​8887​065 Perkins, D., & Salomon, G. (1989). Are cognitive skills context-​bound? Educational Researcher, 18(1), 16–​25. https://​doi.org/​10.3102/​0 01318​9X01​8001​016 Perry, W., Jr. (1970). Forms of intellectual and ethical development in the college years: A scheme. Holt, Rinehart, and Winston. Piaget, J. (1959). The language and thought of the child (4th Ed.) M. Gabain & R. Gabain (Trans.), Routledge & Kegan Paul. Piaget, J. (1967). Six psychological studies. In A. Tenzer & D. Elkind (Trans.), Random House. (Original work published 1964). Piolat, A., Barbier, M.-​L ., & Royssey, J.-​Y. (2008). Fluency and cognitive effort during first-​and second-​ language notetaking and writing by undergraduate students. European Psychologist, 13(2), 114‒125. https://​doi.org/​10.1027/​1016-​9040.13.2.114 Pirnay-​Dummer, P. (2016). Linguistic analysis tools. In C. MacArthur, S. Graham & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (pp. 427‒442). Guilford Press. Plakans, L., Gebril, A., & Bilki, Z. (2019). Shaping a score: Complexity, accuracy, and fluency in integrated writing performances. Language Testing, 36(2), 161‒179. https://​ doi.org/​10.1177%2F0​2655​3221​6669​537 Plakans, L., & Gebril, A. (2017). Exploring the relationship and organization and connection with scores in integrated writing assessment. Assessing Writing, 31, 98‒112. https://​doi.org/​10.1016/​j.asw.2016.08.005 Ployhart, R. E., & Vandenberg, R. J. (2010). Longitudinal research: The theory, design, and analysis of change. Journal of Management, 36(1), 94–​120. https://​doi.org/​10.1177/​ 01492​0630​9352​110 Poehner, M. E. (2007). Beyond the test: L2 dynamic assessment and the transcendence of mediated learning. The Modern Language Journal, 91(3), 323–​340. https://​doi.org/​ 10.1111/​j.1540-​4781.2007.00583.x Poehner, M.E. (2008). Dynamic assessment: A Vygotskian approach to understanding and promoting L2 development. Springer. Poehner, M.E. (2012). The Zone of Proximal Development and the genesis of self-​ assessment. The Modern Language Journal, 96(4), 610–​622. https://​doi.org/​10.1111/​ j.1540-​4781.2012.01393.x

Poehner, M.E., & Inbar-​L ourie, O. (Eds.) (2020). Towards a reconceptualization of second language classroom assessment. Praxis and researcher-​teacher partnership. Springer. Poehner, M.E., Kinginger, C., van Compernolle, R., & Lantolf, J. (2018). Pursuing Vygotsky’s dialectical approach to pedagogy and development: A response to Kellogg. Applied Linguistics, 39(3), 429–​433. https://​doi.org/​10.1093/​app​l in/​a mx ​033 Poehner, M.E., & Leontjev, D. (2020). To correct or to cooperate: Mediational processes and L2 development. Language Teaching Research, 24(3), 295–​316. https://​doi.org/​ 10.1177/​13621​6881​8783​212 Polikoff, M. (2016). Evaluating the instructional sensitivity of four states’ student achievement tests. Educational Assessment, 21(2), 102–​119. https://​doi.org/​10.1080/​ 10627​197.2016.1166​342 Polio, C., & Williams, J. (2009). Teaching and testing writing. In M. Long & C. Doughty (Eds.), The handbook of language teaching (pp. 476–​517). Blackwell. Pollitt, A. (1991). Giving students a sporting chance: Assessment by counting and by judging. In J. C. Alderson & B. North (Eds.), Language testing in the 1990s: The communicative legacy (pp. 46–​59). Macmillan Education. Pollitt, A., & Murray, N. (1996). What raters really pay attention to. In M. Milanovic & N. Saville (Eds.), Studies in language testing 3: Performance, cognition and assessment (pp. 74–​91). Cambridge University Press. Popham, W. J. (2009). Diagnosing the diagnostic test. Educational Leadership, 66(6), 90–​91. Prabhu, N. (1987). Second language pedagogy. Oxford University Press. Prior, P. (2006). A sociocultural theory of writing. In C. MacArthur, S. Graham & J. Fitzgerald (Eds.), Handbook of writing research (pp. 54–​66). Guilford Press. Pulakos, E. (1986). The development of training programs to increase accuracy with different rating tasks. Organizational Behavior and Human Decision Processes, 38, 76–​91. https://​doi.org/​10.1016/​0749-​5978(86)90027- ​0 Purpura, J. (2004). Assessing grammar. Cambridge University Press. Purpura, J. (2014). Language learner styles and strategies. In M. Celce-​Murcia, D. Brinton & A. Snow (Eds.), Teaching English as a second or foreign language (4th ed.) (pp. 532–​549). National Geographic Learning, Cengage Learning. Purpura, J. (2016). Second and foreign language assessment. The Modern Language Journal, 100(S1), 190–​208. https://​doi.org/​10.1111/​modl.12308 Purpura, J. E. (2021). A rationale for using a scenario-​based assessment to measure competency-​based, situated second and foreign language proficiency. In M. Masperi, C. Cervini, & Y. Bardière (Eds.), Évaluation des acquisitions langagières: du formatif au certificatif, mediAzioni 32 (pp. A54–​A96). www.med​iazi​oni.sit​lec.unibo.it Quartapelle, F. (Ed.) (2012). Assessment and evaluation in CLIL. AECLIL. http://​aec​ lil.alt​ervi​sta.org/​Sito/​w p-​cont​ent/​uplo​ads/​2 013/​02/​A EC​LIL-​A ss​e ssm​ent-​a nd-​eva​ luat​ion-​i n-​CLIL.pdf Quispersaravia, A., Perez, W., Sobrevilla, M., & Alva-​Manchego, F. (2016). Coh-​ Metrix-​ Esp: A complexity analysis tool for documents written in Spanish. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk & S. Piperidis. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16) (pp. 4694–​ 4698). European Language Resources Association. www.acl​web.org/​a nthol​ogy/​ L16-​1745.pdf

Quixal, M., & Meurers, D. (2016). How can writing tasks be characterized in a way serving pedagogical goals and automatic analysis needs? CALICO Journal, 33(1), 19–​48. https://​doi.org/​10.1558/​cj.v33i1.26543 Rahimi, M., Kushki, A., & Nassaji, H. (2015). Diagnostic and developmental potentials of dynamic assessment for L2 writing. Language and Sociocultural Theory, 2(2), 185–​ 208. https://​doi.org/​10.1558/​l st.v2i2.25956 Ranalli, J., Link, S., & Chukharev-​Hudilainen, E. (2017). Automated writing evaluation for formative assessment of second language writing: Investigating the accuracy and usefulness of feedback as part of argument-​based validation. Educational Psychology, 37(1), 8–​25. https://​doi.org/​10.1080/​01443​410.2015.1136​407 Ransdell, S., Arecco M., & Levy, C.(2001). Bilingual long-​ term working memory: The effects of working memory loads on writing quality and fluency. Applied Psycholinguistics, 22(1), 113–​128. https://​doi.org/​10.1017/​S01427​1640​1001​060 Read, J., & von Randow, J. (2016). Extending post-​entry assessment to the doctoral level: New challenges and opportunities. In J. Read (Ed.), Post-​admission language assessment of university students (pp. 137–​156). Springer. Reiff, M., & Bawarshi, A. (2011). Tracing discursive resources: How students use prior genre knowledge to negotiate new writing contexts in first-​ year composition. Written Communication, 28(3), 312–​337. https://​doi.org/​10.1177/​07410​8831​1410​183 Révész, A., Lu, X., & Pellicer-​Sánchez, A. (2022), Directions for future methodologies to capture the processing dimension of L2 writing and written corrective feedback. In R. Manchón & C. Polio, (Eds.), The Routledge handbook of second language acquisition and writing (pp. 339–​355). Routledge. Riazi, A. M. (2016). The Routledge encyclopedia of research methods in applied linguistics: quantitative, qualitative, and mixed-​methods research. Routledge. Richardson, U., & Lyytinen, H. (2014). The GraphoGame method: The theoretical and methodological background of the technology-​enhanced learning environment for learning to read. Human Technology, 10(1), 39–​60. http://​d x.doi.org/​10.17011/​ht/​ urn.20140​5281​859 /​ https://​ht.csr-​pub.eu/​i ndex.php/​ht/​a rti​cle/​v iew/​153 Ringbom, H. (2006). The importance of different types of similarity in transfer studies. In J. Arabski (Ed.), Cross-​linguistic influences in the second language lexicon (pp. 36–​45). Multilingual Matters. Rinnert, C., & Kobayashi, H. (2009). Situated writing practices in foreign language settings: The role of previous experience and instruction. In R. Manchón (Ed.), Writing in foreign language context: Learning, teaching, and research (pp. 23–​ 48). Multilingual Matters. Roberts, L., Pallotti G., & Bettoni, C. (2011). EUROSLA Yearbook: Volume 11. John Benjamins Publishing Company. https://​doi.org/​10.1075/​euro​sla.11 Robinson, P. (2001a). Task complexity, cognitive resources, and syllabus design: A triadic framework for investigating task influences on SLA. In P. Robinson (Ed.), Cognition and second language instruction (pp. 287–​318). Cambridge University Press. Robinson, P. (2001b). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22, 27–​57. https://​doi. org/​10.1093/​app​l in/​22.1.27 Robinson, P. (2005). Cognitive complexity and task sequencing: A review of studies in a Componential Framework for second language task design. 
International Review of Applied Linguistics in Language Teaching, 43(1), 1–33. https://doi.org/10.1515/iral.2005.43.1.1

Robinson, P. (2007). Criteria for grading and sequencing pedagogic tasks. In M. del Pilar & G. Mayo (Ed.), Investigating tasks in formal language learning (pp. 7–​27). Multilingual Matters. Robinson, P., & Gilabert, R. (2007). Task complexity, the Cognition Hypothesis and second language learning and performance. IRAL –​ International Review of Applied Linguistics in Language Teaching, 45(3), 161–​176. https://​doi.org/​10.1515/​ IRAL.2007.007 Robinson, B., & Mervis, C. (1998). Disentangling early language development: Modeling lexical and grammatical acquisition using an extension of case-​study methodology. Developmental Psychology, 34, 363–​375. https://​doi.org/​10.1037/​0 012-​1649.34.2.363 Roca de Larios, J., Marín, J., & Murphy, L. (2001). A temporal analysis of formulation processes in L1 and L2 writing. Language Learning, 51(3), 497–​538. https://​doi.org/​ 10.1111/​0 023-​8333.00163 Roca de Larios, J., Manchón, R., Murphy, L., & Marín, J. (2008). The foreign language writer’s strategic behaviour in the allocation of time to writing processes. Journal of Second Language Writing, 17(1), 30–​47. https://​doi.org/​10.1016/​j.jslw.2007.08.005 Roca de Larios, J., Manchón, R., & Murphy, L. (2006). Generating text in native and foreign language writing: A temporal analysis of problem solving formulation processes. The Modern Language Journal, 90(1), 100–​114. https://​doi.org/​10.1111/​ j.1540-​4781.2006.00387.x Roca de Larios, J., Marín, J., & Murphy, L. (2001). A temporal analysis of formulation processes in L1 and L2 writing. Language Learning, 51(3), 497–​538. https://​doi.org/​ 10.1111/​0 023-​8333.00163 Roca de Larios J., Murphy L., & Marín, J. (2002). Critical examination of L2 writing process research. In S. Ransdell & M. Barbiere (Eds.), New directions for research in L2 writing (pp. 11–​47). Springer. Roca de Larios, J., Nicolás-​Conesa, F., & Coyle, Y. (2016). Focus on writers: Processes and strategies. In R. Manchón & P. Matsuda (Eds.), Handbook of second and foreign language writing (pp. 267‒286). De Gruyter. Romova, Z., & Andrew, M. (2011) Teaching and assessing academic writing via the portfolio: Benefits for learners of English as an additional language. Assessing Writing, 16, 111–​122. https://​doi.org/​10.1016/​j.asw.2011.02.005 Roozen, K. (2010). Tracing trajectories of practice: Repurposing in one student’s developing disciplinary writing processes. Written Communication, 27(3), 318–​354. https://​doi.org/​10.1177/​07410​8831​0373​529 Roscoe, R., & McNamara, D. (2013). Writing Pal: Feasibility of an intelligent writing strategy tutor in the high school classroom. Journal of Educational Psychology, 105(4), 1010–​1025. https://​doi.org/​10.1037/​a0032​340 Roscoe, R., Allen, L., Weston, J., Crossley, S., & McNamara, D. (2014). The Writing Pal intelligent tutoring system: Usability testing and development. Computers and Composition, 34, 39–​59. https://​doi.org/​10.1016/​j.comp​com.2014.09.002 Rosmawati, R. (2016). Dynamic development and interactions of complexity, accuracy, and fluency in ESL academic writing [Doctoral dissertation, The University of Sydney]. http://​hdl.han​d le.net/​2123/​15901 Rupp, A., Vock, M., Harsch, C., & Köller, O. (2008): Developing standards-​based assessment tasks for English as a first foreign language–​Context, processes and outcomes in Germany. Waxmann.

Russell, M. (1999). Testing on computers: A follow-​up study comparing performance on computer and on paper. Education Policy Analysis Archives, 7(20). https://​epaa.asu. edu/​i ndex.php/​epaa/​a rti​cle/​v iew/​555/​678 Russell, M., & Haney, W. (1997). Testing writing on computers. Education Policy Analysis Archives, 5(3). https://​doi.org/​10.14507/​epaa.v5n3.1997 Russell, M., & Plati, T. (2002). Does it matter with what I write? Comparing performance on paper, computer and portable writing devices. Current Issues in Education, 5(4). https://​cie.asu.edu/​ojs/​i ndex.php/​cieat​a su/​a rti​cle/​v iew/​1621/​662 Russell, M., & Tao, W (2004) The influence of computer-​print on rater scores. Practical Assessment, Research, and Evaluation, 9, Article 10. https://​doi.org/​10.7275/​2efe-​t s97 Sadeghi, K., & Rahmati, T. (2017). Integrating assessment as, for, and of learning in a large-​scale exam preparation course. Assessing Writing, 34, 50–​61. https://​doi.org/​ 10.1016/​j.asw.2017.09.003 Sadler, R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–​144. https://​doi.org/​10.1007/ ​BF0 ​0117​714 Sala-​ Bubaré, A., & Castelló, M. (2017). Writing regulation processes in higher education: a review of two decades of empirical research. Reading and Writing, 1–​21. http://​doi.org/​10.1007/​s11​145-​017-​9808-​3 Sasaki, M. (2000). Toward an empirical model of EFL writing processes: An exploratory study. Journal of Second Language Writing, 9(3) 259–​291. https://​doi.org/​10.1016/​ S1060-​3743(00)00028-​X Sasaki, M. (2002). Building an empirically-​ based model of EFL learners’ writing processes. In S. Ransdell & M.-​L . Barbier (Eds.), New directions for research in L2 writing (pp. 49–​80). Springer. Sasaki, M., & Hirose, K. (1996). Explanatory variables for EFL students’ expository writing. Language Learning, 46(1), 137–​168. https://​doi.org/​10.1111/​j.1467-​ 1770.1996.tb00​643.x Scarino, A. (2009). Assessing intercultural capability in learning languages: Some issues and considerations. Language Teaching, 42(1), 67–​80. https://​doi.org/​10.1017/​S02614​ 4480​8005​417 Shermis, M,. & Burstein, J. (Eds.) (2013). Handbook of automated essay evaluation: Current applications and future directions. Routledge. Shermis, M., Burstein, J., & Bursky, S. (2013). Introduction to automated essay evaluation. In M. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and future directions (pp. 1–​15). Routledge. Shrestha, P. (2020). Dynamic assessment of students’ academic writing. Springer Schärer, R. (2000). Final report: European Language Portfolıo pılot project phase 1998–​2000. Council of Europe. Schleppegrell, M. J. (2004). The language of schooling: A functional linguistics perspective. Routledge. Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and Second Language Instruction (pp. 3–​32). Cambridge University Press. Schneider, G., & Lenz, P. (2001): European Language Portfolio. Guide for developers. Council of Europe. https://​r m.coe.int/​168​0459​f a3 Schoonen, R. (2005). Generalizability of writing scores: An application of structural equation modeling, Language Testing, 22(1), 1–​30. https://​doi.org/​10.1191%2F02​ 6553​2205​lt29​5oa

Schoonen, R. (2012). The validity and generalizability of writing scores: The effect of rater, task and language. In E. van Steendam, M. Tillema, G. Rijlaarsdam & H. van den Bergh (Eds.), Measuring writing: Recent insights into theory, methodology and practice (pp. 1–​22). Brill. Schoonen, R., van Gelderen, A., Stoel, R. D., Hulstijn, J., & de Glopper, K. (2011). Modeling the development of L1 and EFL writing proficiency of secondary school students. Language Learning, 61(1), 31–​79. https://​doi.org/​10.1111/​ j.1467-​9922.2010.00590.x Schoonen, R., Snellings, P., Stevenson, M. & van Gelderen, A. (2009). Towards a blueprint of the foreign language writer: The linguistic and cognitive demands of foreign language writing. In R. Manchón (Ed.), Writing in foreign language contexts: Learning, teaching and research (pp. 77–​101). Multilingual Matters. Schoonen, R., van Gelderen, A., de Glopper, K., Hulstijn, J., Simis, A., Snellings, P. & Stevenson, M. (2003). First language and second language writing: the role of linguistic knowledge, speed of processing, and metacognitive knowledge. Language Learning, 53(1), 165–​202. https://​doi.org/​10.1111/​1467-​9922.00213 Schultz, M. (2013). The IntelliMetric automated essay scoring engine–​ A review and an application to Chinese essay scoring. In M. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 89–​ 98). Routledge. Seow, A. (2002). The writing process and process writing. In J. Richards & W. Renandya (Eds.), Methodology in language teaching (pp. 315–​320). Cambridge University Press. Sfard, A. (1989). Transition from operational to structural conception: The notion of function revisited. In G. Vergnaud, J. Rogalski & M. Artigue (Eds.), Proceedings of the 13th PME International Conference, 3, 151–​158. Shale, D. (1986). Essay reliability: Form and meaning. In E. White, W. Lutz & S. Kamusikiri (Eds.), Assessment of writing: Politics, policies, practices (pp. 76–​96). MLAA. Shanahan, T. (2016). Relationship between reading and writing development. In C. A. MacArthur, S. Graham & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (pp. 194‒207). Guilford Press. Shaw, S. (2003). Legibility and the rating of second language writing: the effect on examiners when assessing handwritten and word-​processed scripts. Research Notes 11, 7‒10. Shaw, S. (2005). Evaluating the impact of word processed text on writing quality and rater behaviour. Research Notes 22, 13‒19. Shaw S., & Weir C. (2007). Examining writing: research and practice in assessing second language writing. Cambridge University Press. Shermis, M., Mzumara, H., Olson, J., & Harrington, S. (2001). On-​line grading of student essays: PEG goes on the World Wide Web. Assessment & Evaluation in Higher Education, 26(3), 247–​259. https://​doi.org/​10.1080/​026029​3012​0 052​404 Shermis, M., Burstein, J., & Bursky, S. (2013). Introduction to automated essay evaluation. In M. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 1–​15). Routledge. Shin, D., Cimasco, T., & Yi, Y. (2020) Development of metalanguage for multimodal composing: A case study of an L2 writer’s design of multimedia texts. Journal of Second Language Writing, 47, 100714. https://​d oi.org/​10.1016/​ j.jslw.2020.100​714

Shohamy, E., Gordon, C., & Kraemer, R. (1992). The effect of raters’ background and training on the reliability of direct writing tests. The Modern Language Journal, 76(1), 27–​33. https://​doi.org/​10.2307/​329​895 Shrestha, P., & Coffin, C. (2012). Dynamic assessment, tutor mediation and academic writing development. Assessing Writing, 17(17), 55–​70. https://​doi.org/​10.1016/​ j.asw.2011.11.003 Silva, T. (1993). Toward an understanding of the distinct nature of L2 writing: The ESL research and its implications. TESOL Quarterly, 27(4), 657‒677. https://​doi.org/​ 10.2307/​3587​400 Skehan, P. (1998). A cognitive approach to language learning. Oxford University Press. Skehan P., & Foster, P. (1999).The influence of task structure and processing conditions on narrative retellings. Language Learning, 49(1), 93–​120. https://​doi.org/​10.1111/​ 1467-​9922.00071 Skehan, P., & Foster, P. (2001). Cognition and tasks. In P. Robinson (Ed.), Cognition and second language instruction (pp. 183–​205). Cambridge University Press. https://​doi. org/​10.1017/​CBO97​8113​9524​780.009 Slomp, D. H. (2012). Challenges in assessing the development of writing ability: Theories, constructs and methods. Assessing Writing, 17, 81–​91. https://​doi.org/​10.1016/​ j.asw.2012.02.001 Smit, D. W. (2004). The end of composition studies. Southern Illinois University Press. Smith, D. (2000). Rater judgments in the direct assessment of competency-​ based second language writing ability. In G. Brindley (Ed.), Studies in immigrant English language assessment (pp. 159–​89). National Centre for English Language Teaching and Research, Macquarie University. Snellings, P., van Gelderen, A. & de Glopper, K. (2002). Lexical retrieval: An aspect of fluent second language production that can be enhanced. Language Learning, 52(4), 723–​754. https://​doi.org/​10.1111/​1467-​9922.00202 Snellings, P., van Gelderen, A. & de Glopper, K. (2004a). The effect of enhanced lexical retrieval on second language writing: A classroom experiment. Applied Psycholinguistics, 25(2), 175–​200. https://​doi.org/​10.1017/​S01427​1640​4001​092 Snellings, P., van Gelderen, A., & de Glopper, K. (2004b). Validating a test of second language written lexical retrieval: a new measure of fluency in written language production. Language Testing, 21(2), 174–​201. https://​doi.org/​10.1191/​026553​2204​ lt27​6oa Spandel, V. (2012). Creating young writers: Using the six traits to enrich writing process in primary classrooms. Pearson. Spelman Miller, K. (2000). Academic writers on-​line: Investigating pausing in the production of text. Language Teaching Research, 4(2), 123–​148. https://​doi.org/​ 10.1177/​136​2168​8000​0400​203 Struthers, L., Lapadat, J., & MacMillan, P. (2013) Assessing cohesion in children’s writing: Development of a checklist. Assessing Writing, 18, 187–​201. https://​doi.org/​ 10.1016/​j.asw.2013.05.001 Stevenson, M. (2005) Reading and writing in a foreign language. A comparison of conceptual and linguistic processes in Dutch and English [Unpublished doctoral dissertation]. University of Amsterdam. https://​hdl.han​d le.net/​11245/​1.310​727 Stevenson, M., Schoonen, R., & de Glopper, K. (2006). Revising in two languages: A multi-​d imensional comparison of online writing revisions in L1 and FL. Journal of Second Language Writing, 15(3), 201–​233. https://​doi.org/​10.1016/​j.jslw.2006.06.002

Stevenson, M., & Phakiti, A. (2014). The effects of computer-​ generated feedback on the quality of writing. Assessing Writing, 19, 51–​65. https://​doi.org/​10.1016/​ J.ASW.2013.11.007 Sternberg, R. J., & Grigorenko, E. L. (2002). Dynamic Testing. The nature and measurement of learning potential. Cambridge University Press. Stewart, O. (2013). The influence of limiting working memory resources on contextual facilitation in language processing [Unpublished doctoral dissertation]. University of Edinburgh. Storch, N. (2013). Collaborative writing in L2 classrooms. Multilingual Matters. Strobl, C., Ailhaud, E., Benetos, K., Devitt, A., Kruse, O., Proske, A., & Rapp, C. (2019). Digital support for academic writing: A review of technologies and pedagogies. Computers & Education, 131, 33– ​48. https://​doi.org/​10.1016/​j.comp​ edu.2018.12.005 Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 235–​253). Newbury House. Swales, J., & Feak, C. (2004). Academic writing for graduate students: Essential tasks and skills. Michigan Series in English for Academic & Professional Purposes. Tang, C., & Liu, Y.-​T. (2018). Effects of indirect coded corrective feedback with and without short affective teacher comments on L2 writing performance, learner uptake and motivation. Assessing Writing, 35, 26–​40. https://​doi.org/​10.1016/​ J.ASW.2017.12.002 Taylor, L. (2013). Communicating the theory, practice and principles of language testing to test stakeholders: Some reflections. Language Testing 30(3), 403–​412. https://​doi. org/​10.1177/​02655​3221​3480​338 Tilma, C. (2014). The dynamics of foreign versus second language development in Finnish writing [doctoral dissertation, University of Jyväskylä]. http://​u rn.fi/​ URN:ISBN:978-​951-​39-​5869-​5 Tolchinsky, L. (2016). From text to language and back. The emergence of written language. In C. A. MacArthur, S. Graham & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (pp. 144‒159). Guilford Press. Tomlin, R., & Villa, V. (1994). Attention in cognitive science and second language acquisition. Studies in Second Language Acquisition, 16(2), 183–​203. https://​doi.org/​ 10.1017/​S02722​6310​0 012​870 Torrance, M. (2016). Understanding planning in text production. In C. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (pp. 72–​87). Guilford Press. Torrance, M., & Galbraith, D. (2006). The Processing Demands of Writing. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research. 67–​80. Guilford Press. Torrance, M., & Jeffery, G. (1999). Writing processes and cognitive demands. In M. Torrance & G. C. Jeffery (Eds.), The cognitive demands of writing. Processing capacity and working memory in text production (pp. 1‒11). Amsterdam University Press. Turner, C. E., & Purpura, J. E. (2016). Learning-​oriented assessment in second and foreign language classrooms. In D. Tsagari & J. Banerjee (Eds.), Handbook of Second Language Assessment (pp. 255–​274). De Gruyter.

Turner, C. E., & Upshur, J. A. (2002). Rating scales derived from student samples: Effects of the scale maker and the student sample on scale content and student scores. TESOL Quarterly, 36(1), 49–​70. https://​doi.org/​10.2307/​3588​360 Ünaldi, I. (2016). Self and teacher assessment as predictors of proficiency levels of Turkish EFL learners. Assessment & Evaluation in Higher Education, 41(1), 67–​80. https://​doi.org/​10.1080/​02602​938.2014.980​223 UNESCO (United Nations Educational, Scientific and Cultural Organization). (2006). Education for all global monitoring report 2006: Literacy for life. https://​u nes​doc. une​sco.org/​a rk:/​48223/​pf000 ​0141​639 UNESCO (United Nations Educational, Scientific and Cultural Organization). (2008). The Global Literacy Challenge. https://​u nes​doc.une​sco.org/​a rk:/​48223/​pf000​ 0163​170 University of Auckland. (n.d). DELNA handbook for students. Retrieved from https://​ cdn.auckl​a nd.ac.nz/​a ss​ets/​ece/​for/​curr​ent-​stude​nts/​acade​m ic-​i nfo​r mat​ion/​engl​ish-​ langu​age-​supp​ort/​docume​nts/​delna-​handb​ook.pdf Unsworth, L., & Mills, K. (2020). English language teaching of attitude and emotion in digital multimodal composition. Journal of Second Language Writing, 47, 100712. https://​doi.org/​10.1016/​j.jslw.2020.100​712 Uppstad, P., & Solheim, O. (2007). Aspects of fluency in writing. Journal of Psycholinguistic Research, 36, 79–​87. https://​doi.org/​10.1007/​s10​936-​0 06-​9034-​7 Valdes, G., Haro, P., & Echevarriarza, M. (1992). The development of writing abilities in a foreign language: Contributions toward a general theory of L2 writing. The Modern Language Journal, 76(3), 333–​352. https://​doi.org/​10.1111/​j.1540-​4781.1992. tb07​0 03.x van den Bergh, H., Rijlaarsdam, G., & van Steendam, E. (2016). Writing process theory: A functional dynamic approach. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (p. 57–​71). Guilford Press. Vanderberg, R., & Swanson, H. L. (2007). Which components of working memory are important in the writing process? Reading and Writing: An Interdisciplinary Journal, 20(7), 721‒752. https://​doi.org/​10.1007/​s11​145- ​0 06-​9046- ​6 Van Lier, L. (2006). The ecology and semiotics of language learning: A sociocultural perspective. Springer. Van Waes, L., Leijten, M., Lindgren, E., & Wengelin, Å. (2016). Keystroke logging in writing research: Analyzing online writing processes. I. In C. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed.) (pp. 410–​426). Guilford Press. van Weijen, D. (2009). Writing processes, text quality, and task effects: Empirical studies in first and second language writing. LOT. www.lotp​ubli​cati​ons.nl/​Docume​nts/​201_ ​f ​u llt​ ext.pdf Vaughan, C. (1991). Holistic assessment: What goes on in the rater’s mind? In L. Hamp Lyons (Ed.), Assessing second language writing in academic contexts (pp. 111–​125). Ablex. Verspoor, M. H., & Smiskova, H. (2012). Foreign language writing development from a dynamic usage based perspective. In R. Manchón (Ed.), L2 writing development: Multiple perspectives (pp. 17–​46). De Gruyter Mouton. Verspoor, M., de Bot. K., & Lowie. W. (2004). Dynamic systems theory and variation: a case study in L2-​w riting. In M. Hannay, H. Aertsen, &, R. Lyall (Eds.), Words in their places. A Festschrift for J. Lachlan (pp. 407–​421). Free University Press.

Vögelin, C., Jansen, T., Keller, S., Machts, N., & Möller, J. (2019). The influence of lexical features on teacher judgements of ESL argumentative essays. Assessing Writing, 39, 50–​63. https://​doi.org/​10.1016/​j.asw.2018.12.003 Vogt, K., & Tsagari, D. (2014) Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly. 11(4), 374–​402. https://​doi.org/​ 10.1080/​15434​303.2014.960​046 Vygotsky, L. S. (1978). Mind in society. The development of higher psychological processes. Harvard University Press. Vygotsky, L. S. (1987). The collected works of L. S. Vygotsky, Vol. 1, Problems of general psychology: Including the volume Thinking and speech. In R. Rieber & A. Carton (Eds.). Plenum. (Original Work Published 1934) Wang, W., & Wen, Q. (2002). L1 use in the L2 composing process: An exploratory study of 16 Chinese EFL writers. Journal of Second Language Writing 11, 225–​246. https://​doi.org/​10.1016/​S1060-​3743(02)00084-​X Wardle, E. (2007). Understanding ‘transfer’ from FYC: Preliminary results of a longitudinal study. Writing Program Administration, 31(2), 65–​85. Warschauer, M., & Grimes, D. (2008). Automated Writing Assessment in the Classroom. Pedagogies: An International Journal, 3(1), 22–​36. https://​doi.org/​10.1080/​155448​0 070​ 1771​580 Wechsler, D. (1945). Standardized memory scale for clinical use. Journal of Psychology, 19, 57–​95. https://​doi.org/​10.1080/​0 0223​980.1945.9917​223 Weigle, S. (1994). Effects of training on raters of ESL compositions. Language Testing, 11(2), 197–​223. https://​doi.org/​10.1177/​026​5532​2940​1100​206 Weigle, S. (1999). Investigating rater/​ prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing writing, 6(2), 145–​178. https://​doi.org/​10.1016/​S1075-​2935(00)00010-​6 Weigle, S. (2002). Assessing writing. Cambridge University Press. Weigle S. (2005). Second language writing expertise. In K. Johnson (Ed.), Expertise in second language learning and teaching (pp. 128–​149). Palgrave Macmillan. Weigle, S. (2007). Teaching writing teachers about assessment. Journal of Second Language Writing, 16(3), 194–​209. https://​doi.org/​10.1016/​j.jslw.2007.07.004 Weigle, S. (2013). English as a second language writing and automated essay evaluation. In M. Shermis & J. Burstein, (Eds.), Handbook of automated essay evaluation (pp. 36–​ 54). Routledge. Weir, C., Yan, J., O’Sullivan, B., & Bax, S. (2007). Does the computer make a difference?: The reaction of candidates to a computer-​based versus a traditional hand-​w ritten form of the IELTS writing component: Effects and impact. IELTS Research Reports, 7, 311–​347. Wengelin, Å. (2002). Text production in adults with reading and writing difficulties. Vol. 20 Gothenburg Monographs of Linguistics. University of Gothenburg. Wengelin, Å. (2006). Examining pauses in writing: Theory, methods and empirical data. In K. Sullivan & E. Lindgren (Eds.), Computer key-​stroke logging and writing: Methods and applications (pp. 107–​130). Brill. Wengelin, Å. (2007) The word-​level focus in text production by adults with reading and writing difficulties. In M. Torrance, L. van Waes & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 67–​82). Elsevier.

Wengelin, Å., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., & Johansson, R. (2009). Combined eyetracking and keystroke-​logging methods for studying cognitive processes in text production. Behavior Research Methods, 41, 337–​ 351. https://​doi.org/​10.3758/​BRM.41.2.337 Wengelin, Å., Johanson, V., & Johanson, R. (2014). Expressive writing in Swedish 15-​year-​olds with reading and writing difficulties. In B. Arfé, J. Dockrell & V. Berninger (Eds.), Writing development in children with hearing loss, dyslexia, or oral language problems: Implications for assessment and instruction (pp. 242–​ 269). Oxford University Press. Wengelin, Å., Frid, J., Johansson, R., & Johansson, V. (2019). Combining keystroke logging with other methods. Towards an experimental environment for writing process research. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 30–​49). Brill. Whithaus, C., Harrison, S.B., & Midyette, J. (2008). Keyboarding compared with handwriting on a high-​stakes writing assessment: Student choice of composing medium, raters’ perceptions, and text quality. Assessing Writing, 13(1), 4–​25. https://​ doi.org/​10.1016/​j.asw.2008.03.001 Wertsch, J. V. (1998). Mind as action. Oxford University Press. Westheimer, J. (2008). Learning among colleagues: Teacher community and shared enterprise in education. In M. Cochran-​Smith, S. Feiman-​Nemser, D. J. McIntyre & K. Demers (Eds.), Handbook of research on teacher education (3rd ed.) (pp. 756–​783). Routledge. Wigglesworth, G., & Storch, N. (2012). Feedback and writing development through collaboration: A socio-​ cultural approach. In R. M. Manchon (Ed.), L2 writing development: Multiple perspectives (pp. 69–​99). De Gruyter. Wiliam, D., & Leahy, S. (2015). Embedding formative assessment: Practical techniques for K-​ 12 classrooms. Learning Sciences International. Wilson, J. (2013). The role of social relationships in the writing of multilingual adolescents. In L. De Oliveira & T. Silva (Eds.), L2 writing in secondary classrooms: Student experiences, academic issues, and teacher education (pp. 87–​103). Routledge. Wilson, J., Roscoe, R., & Ahmed, Y. (2017). Automated formative writing assessment using a levels of language framework. Assessing Writing, 34, 16–​36. https://​doi.org/​ 10.1016/​j.asw.2017.08.002 Wind, A. (2013). Second language writing development from a Dynamic Systems Theory perspective. Papers from the Lancaster University Postgraduate Conference in Linguistics & Language Teaching 2013. www.lancas​ter.ac.uk/​f ass/​eve​nts/​laelp​gcon​fere​ nce/​pap​ers/​v 08/​Att​i la-​M-​Wind.pdf Wohlpart, A., Lindsey, C., & Rademacher, C. (2008). The reliability of computer software to score essays: Innovations in a humanities course. Computers and Composition, 25(2), 203–​223. https://​doi.org/​10.1016/​j.comp​com.2008.04.001 Wolf, M. (1986). Rapid alternating stimulus (R.A.S.) naming: A longitudinal study in average and impaired readers. Brain and Language, 27(2), 360–​379. https://​doi.org/​ 10.1016/​0 093-​934X(86)90025-​8 Wolfe, E., Bolton, S., Feltovich, B., & Niday, D. (1996). The influence of student experience with word processors on the quality of essays written for a direct writing assessment. Assessing Writing, 3(2), 123–​147. https://​doi.org/​10.1016/​ S1075-​2935(96)90010-​0

Wolfe, E., & Manalo, J. (2005). An investigation of the impact of composition medium on the quality of TOEFL writing scores. ETS Research Report Series, i–​58. https://​ doi.org/​10.1002/​j.2333-​8504.2004.tb01​956.x Xerri, D. & Vella Briffa, D. (Eds.). (2018). Teacher involvement in high stakes language testing. Springer. Xi, X. (2017). What does corpus linguistics have to offer to language assessment? Language Testing, 34(4), 565–​577. https://​doi.org/​10.1177/​02655​3221​7720​956 Xie, Q. (2017). Diagnosing university students’ academic writing in English: Is cognitive diagnostic modelling the way forward? Educational Psychology, 37(1), 26–​ 47. https://​doi.org/​10.1080/​01443​410.2016.1202​900 Xie, Q., & Lei, Y. (2022). Diagnostic assessment of L2 academic writing product, process and self-​regulatory strategy use with a comparative dimension. Language Assessment Quarterly, 19(3), 231–​263. https://​doi.org/​10.1080/​15434​303.2021.1903​470 Xu, Y. & Brown, G. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education 58, 149–​162. https://​doi.org/​10.1016/​j.tate.2016.05.010 Yan, D., & Bridgeman, B. (2020). Validation of automated scoring systems. In D. Yan, A. Rupp & P. Foltz (Eds.), Handbook of automated scoring. Theory into practice (pp. 297–​318). CRC Press. Yan, D., Rupp, A., & Foltz, P. (Eds.) (2020). Handbook of automated scoring. Theory into practice. CRC Press. Yang, R. (2003). Investigating how test-​ t akers use the DIALANG feedback [Unpublished master’s thesis]. Lancaster University. Yang, W., & Sun, Y. (2015). Dynamic development of complexity, accuracy and fluency in multilingual learners’ L1, L2 and L3 writing. Theory and Practice in Language Studies, 5(2), 298–​308. http://​doi.org/​10.17507/​t pls.0502.09 Yang, Y.-​F. (2018). New language knowledge construction through indirect feedback in web-​based collaborative writing. Computer Assisted Language Learning, 31(4), 459–​ 480. https://​doi.org/​10.1080/​09588​221.2017.1414​852 Yeong, S., & Rickard Liow, S. (2011). Cognitive-​l inguistic foundations of early spelling development in bilinguals. Journal of Educational Psychology, 103(2), 470‒488. https://​ doi.org/​10.1037/​a0022​437 Yi, Y. (2012). Implementing a cognitive diagnostic assessment in an institutional test: a new networking model in language testing and experiment with a new psychometric model and task type [Unpublished doctoral dissertation]. University of Illinois at Urbana Champaign. Yi, Y., Shin, D., & Cimasco, T. (2020) Editorial. Special issue: Multimodal composing in multilingual learning and teaching contexts. Journal of Second Language Writing, 47, https://​doi.org/​10.1016/​j.jslw.2020.100​710 Yoon, H.-​J. (2017). Textual voice elements and voice strength in EFL argumentative writing. Assessing Writing, 32, 72–​84. https://​doi.org/​10.1016/​j.asw.2017.02.002 Zhang, Z. (Victor), & Hyland, K. (2018). Student engagement with teacher and automated feedback on L2 writing. Assessing Writing, 36, 90–​102. https://​doi.org/​ 10.1016/​J.ASW.2018.02.004 Zhao, C. (2013). Measuring authorial voice strength in L2 argumentative writing: The development and validation of an analytic rubric. Language Testing, 30(2), 201–​230. https://​doi.org/​10.1177/​02655​3221​2456​965 Zhao, C. (2017). Voice in timed L2 argumentative essay writing. Assessing Writing, 31, 73–​83. https://​doi.org/​10.1016/​j.asw.2016.08.004

Zhou, Y., & Dai, Z. (2016). Empirical studies on correlations between lexical knowledge and English proficiency of Chinese EFL learners in mainland China over the past two decades. Journal of Education and Practice, 7(26), 152–​158. www.iiste.org/​Journ​ als/​i ndex.php/​J EP/​a rti​cle/​v iew/​33058/​33954 Zimmermann, R. (2000). L2 writing: subprocesses, a model of formulating and empirical findings. Learning and Instruction, 10(1), 73–​99. https://​doi.org/​10.1016/​ S0959-​4752(99)00019-​5

INDEX

Note: Page locators in bold and italics represents tables and figures, respectively. acquisition metaphor 16 adaptive transfer 38–40 agency/agents 226, 241–3, 258–9, 261 agents in diagnosis-as-research 80–2 American Council on the Teaching of Foreign Languages (ACTFL) 19; Proficiency Guidelines 19 associative writing 17 Authorial Voice Analyzer 232 automated analysis of writing 205–6, 266; need for 206–7; strengths of 206–7; use of natural language processing (NLP) technologies 218 automated feedback 224–7; agency/ agents 226; delivery 226; expected response 227; focus 226; timing 226–7 automated scoring systems 208–10 automated text analysis tools for research purposes 214–18, 283; diagnostic usefulness of 217–18 automated writing evaluation systems 210–13; constructs measured by 219–22; diagnostic usefulness of 213–14; feedback 249–51; future developments in 231–4; tasks suitable for 222–4; usefulness of 227–31 automated writing evaluation (AWE) systems 207–8

Bereiter–Scardamalia’s model of writing process 59–60, 60; fixed-topic strategy 59; flexible-focus strategy 59; knowledge-telling to knowledgetransforming strategies 59–60, 60–1, 68; topic-elaboration strategy 60 Borner’s model of SFL writing development 69, 69 brain imaging 83, 84, 154 butterfly effect 44 classroom-based diagnostic assessment 21, 99, 121, 141–2, 200–2, 265, 271–7, 272 coding/encoding skills 15 cognitive views on writing development 9–10, 17–31, 68–72, 69–70, 265; Borner’s model 69, 69; cognitive models of L1 writing 55–64; communicative and linguistic features 18–28; division of resources between working memory 66–8; insights for diagnostic assessment 18, 25–6; in L1 and SFL learners 17–18; L1 influence 69, 71; long-term memory (LTM) and 64–5; relationship between domains of listening and reading 53–5; resources of working memory 65–6; role of

language 68–9; transcription 54; Zimmermann’s model 70, 70–1; see also Common European Framework of Reference for Languages (CEFR) Coh-Metrix 214, 214–15, 218, 228, 232 collaborative professional development (CPD) 277–8 Common European Framework of Reference for Languages (CEFR) 4, 19, 99–101, 129–32, 136, 183, 257; diagnostic potential 25–6, 28; English Profile program 26; illustrative examples of research 26–8; language learning 23–6; levels of proficiency 19, 20, 48, 49; limitations 21–2; linear and nonlinear development 23, 23–4; linguistic characteristics of 26; microand macro-level development 22–3; Profile Deutsch program 26; relevance 21; relevance to diagnosing writing 82, 87; scale systems 19, 20–1, 21–2, 48; syntactic complexity 26–8; vocabulary size, investigation of 27–8; word derivation 28 communicative writing 17, 100, 132 conflict resolution 160–1 constructive planning 158–61; building an initial task representation 158; conflict resolution 160–1; creating how-to elaborations 159–60; generating network of working goals 158–9; goals 159; integrated plans 159 constructs 84–6 contexts in diagnosis-as-research 80–2 Council of Europe 4, 6, 19, 120, 183 Criterion 210, 213, 218; feedback 228–30 delivery of feedback 226, 243–4, 250, 259, 261 diagnosing writing 265; academic writing 283–4; agents and contexts 80–2; in classroom 271–7, 272; constructs and instruments 84–6; development, processes and products 82–4, 83; directness of writing tasks 85–6; direct tasks 86, 281–2; dynamic assessment 85, 283; examples 87–110; future research and practice 282–4; granularity of 279; indirect tasks 86, 281–2; individualization of 270–1; integrated writing tasks 284;

inter-and intrapersonal factors 86–7; learners’ L1, SFL proficiency and prior writing experience, implication of 266–70; longitudinal diagnosis 84, 84; monitoring progress 279–81; principles of diagnostic assessment 81; process-oriented approach 83; relevance of CEFR 87; of SFL writing 115–17, 262–3; strategies 150; tests and instruments 87–114; types of diagnosers 153 diagnostic cycle 4–7, 5, 7, 14, 52, 79–81, 118, 142, 148, 176, 194, 206, 213–4, 235, 271–7, 272 Diagnostic English Language Needs Assessment (DELNA) 10, 97–9, 115; as basis for rating scale development 186, 187; description of 98; diagnostic part 97; large-scale diagnosis 202; rating procedures 99; screening part 97 diagnostic feedback on writing 240–9; agency/agents 241–3; delivery of 243–5; effectiveness of 249; focus of 245–7; implications for diagnosing SFL writing 262–3; requested responses 248; timing of 247–8; understanding of goals 240; wording and presentation 243 diagnostic methods 153–7, 155, 173; brain-imaging or neuroimaging 156–7; checklists 154; diaries 154; discussion 156; eye-tracking 156; feedback tools 156; interview 156; keystroke-logging 156; observation and self-observation 154; portfolio 154; retrospective interviews 156; think-aloud 156; verbal protocols 156; word processing tools 156 diagnostic rating scales 177, 265; assessment criteria 181–2; challenges for human raters 191–3; descriptors 183–4; design principles 182–9, 187, 190; holistic and analytic scales 178–80; implications for diagnosis 193–5; importance of feedback 201–2; language features 188–9; large-scale monitoring vs classroombased diagnosis 200–2; level-specific performance criteria 180–1; limitations 202–3; properties of 177–82; qualitative and quantitative 183; rater training 195–200; task-specific scales 182; see also rater training

diagnostic second or foreign language (SFL) assessment 1–2; basic questions about 7–9; in classroom contexts 5; formative assessment 5; learners’ performances, interpretation of 6; reading ability 2–4; self or peerassessment 5; writing ability 2–4 DIALANG 6, 8, 10, 25, 87, 93–7, 94–6, 113–16, 120, 225; description of 94–5; feedback in 86, 96–7, 248, 258–60; inferencing in 95; users’ perception of 96; writing component of 96 directness of writing tasks 85–6 discourse community knowledge 36, 41 dynamic assessment (DA) 34, 85, 110–13, 283; interactionist 34, 113–14; interventionist 34, 110–12; mediation 251–6; reciprocity in 34; transcendence in 34 Dynamic Systems Theory of writing development 44–7; cause-and-effect relationship between variables 45; core principles 44; examples 46–7; insights for diagnosis 46; nonlinearity 45–6; self-adaptation and self-organization 45; understanding of writing development in 45–6; variation in 45–6 dyslexia 77–8; Jyvaskyla Longitudinal Study 89; in L1 164 EDD-checklist 10 Educational Testing Service (ETS) 208–10, 220 Ekapeli/GraphoLearn 88–91, 90, 116; effectiveness in improving writing 91; learning environment 89 Empirically-derived Descriptor-based Diagnostic (EDD) checklist 115; checklist 104, 106; description of 104–5; diagnostic profile 105–6 epistemic writing 17 e-rater system 208–10, 282; see also automated analysis of writing European Language Portfolio (ELP) 10, 106–10, 115; challenges for learners 109; checklist 108; criteria of positiveness 109; description of 107; self-assessment grid and can-do statements 107–8

expected response in feedback 227
expertise development 35–42; acclimation stage 35; adaptive transfer 39–40; Carter's stages 38; competence stage 35; diagnosis of 37–8; diagnostic assessment 38–40; examples 40–1; proficiency stage 35; see also Model of Domain Learning (MDL)
eye-tracking 3, 11, 83, 84, 154, 156, 163, 202, 273
far transfer 39, 43, 138
feedback 12, 77, 86, 151, 169, 176, 201–2, 205–6, 233, 235–6, 266; automated 224–7; diagnostic feedback on writing 240–9; in diagnostic instruments 257–61; effectiveness of 242–3; error-corrective 245; Hattie and Timperley’s model 237–40, 238, 246–7; implications for diagnosing SFL writing 262–3; learners’ socialization and prior experience, importance of 243; mediation and 251–2; as mediation and diagnosis 252–7; negative 244; peer 241–2; process 224; on proficiency levels 244–5; self 225; self-regulatory (or metacognitive) 224–5; task 224; teacher 241; uptake 243; in written or oral form 244; see also automated feedback; diagnostic feedback on writing
fluency of writing 72; of lexical retrieval in SFL 73
focus of feedback 226, 245–7
formulation in SFL writing 70–1
genre knowledge 36, 40–1, 119, 134
German National Language Assessment Study (DESI) 202–3
Graduate Record Examination (GRE) 209
graphic transcription 53, 57, 71–2; and spelling in SFL writing 74–6
GraphoLearn 10, 88–9, 91, 114, 116
Hayes–Flower’s model of writing process 55–9, 56, 58, 167; Bereiter and Scardamalia’s model 59–60, 60; control level 58; instructional environment 69; Kellogg’s model 60–3, 62; planning and revising processes 56, 58–9; process level 57; resource level 57; translation process 56–7, 61
Hayes et al. Process Model of Revision 167
instruments 84–6
integrated plans 159
Intelligent Academic Discourse Evaluator (IADE) 211, 213, 226, 230
intelligent computer-aided language learning (ICALL) 283
Intelligent Essay Assessor (IEA) 209–10, 218
intelligent tutoring system (ITS) see Writing Pal (W-Pal)
IntelliMetric 209, 218
Journal of Second Language Writing 153
Kellogg’s model of writing process 60–3, 62; construct of working memory 61–2; execution system (execution processes) 61–2; formulation system (formulation processes) 61–2; monitoring system (monitoring processes) 62–3
keystroke-logging 83, 84, 156, 282
knowledge: about universal text attributes 55; discourse community 36, 41; genre 36, 40–1; lexical vs grammatical 29–30; metaknowledge 55; procedural 55; rhetorical 36, 41; of substance and content 55; telling strategy 59; transforming 59; writing process 10, 36
knowledge-driven planning 158
language assessment literacy (LAL) 277–8
language testing 10, 131–2, 222, 284
large-scale diagnostic assessment 6, 8, 21, 30, 94, 99, 120, 125, 130, 138–41, 176, 185, 193, 200–2, 207, 258; see also DIALANG
Latent Semantic Analysis (LSA) 218
L1-based writing 2–4
Learning Oriented Assessment (LOA) 42–3
left embeddedness 27
lexical retrieval in SFL writing 72–4
literacy skills 14–15
longitudinal studies of SFL writing development 282

long-term memory (LTM) 64–5, 76
L2 Lexical Complexity Analyzer 215, 216–17
L2 Syntactic Complexity Analyzer (L2SCA) 215, 216–17
L2 writing, human-mediated dynamic assessment of: approach 113; learners’ performance 113–14
L1 writing models 55–64; Bereiter and Scardamalia’s model 59–60, 60; Hayes-Flower model 55–9, 56, 58; implications for SFL writing 63–4; Kellogg’s model 60–3, 62; Zimmermann’s model 71
mediation 19, 33–5, 84, 110–14, 112, 136, 156, 202, 244, 251–6, 255, 276, 283
mediator 34
memory 9; see also long-term memory (LTM); working memory (WM)
metaknowledge 55
Model of Domain Learning (MDL) 35–6; Alexander’s model 36–7; Beaufort’s model 36; interest aspect in 37
MY Access! 228–9
natural language processing (NLP) 208
natural language processing (NLP) technologies 208–10, 212, 216, 223, 231–2
near transfer 39
non-alphabetic writing systems 15
obuchenie 31–2, 34, 252
participation metaphor 16
Pearson Test of English Academic 208–9
peer feedback 241–2; see also feedback
performative writing 17
phonological awareness 55, 75–6
planning in writing process 158–62; deliberate 163; integrated plans 159; knowledge-driven planning 158; reactive 163; script- or schema-driven planning 157–8; in SFL writing 161–2; see also constructive planning
process writing 149, 165–7, 227, 234, 250, 273
Project Essay Grade (PEG) 208
pseudowords 76

Questions Test 111, 114; computerized dynamic assessment (C-DA) and learners’ performance 111–13; description of 110–11; diagnostic value of 111–12
Rapid Alternating Stimulus (RAS) 73–4
Rapid Automatized Naming (RAN) 73–4
rater training 195–200; behavior-driven training 199; hierarchical approach 199–200; inter-rater differences 200; preferences 200; schema-driven or top-down rater training 199; see also diagnostic rating scales
rating scales see diagnostic rating scales
reading ability, in SFL and L1 context 2–4, 12
rejection 71
requested responses in feedback 248, 260–1
reviewing and revising drafts 165–73, 166–7; definition 168–9; delaying decisions 171; diagnosis 166, 173; evaluation stage 169–70; ignoring problems 171; keystroke-logging and think-aloud protocols 172; in L1 and SFL writing 172–3; paraphrasing 171; process model 167; redrafting 171; responsiveness to feedback 166, 167; searching modification strategies 171; text modification 170–2; writing fluency and 172
rhetorical knowledge 36, 40–1, 220
Roxify 10, 92–3, 226; automatic analysis of vocabulary 93; description of instrument 91–2; feedback in 260–1; participants’ experiences with 93; Text Inspector 93; validation process 92–3
script- or schema-driven planning 157–8
Sentiment Analysis and Cognition Engine (SEANCE) 216
SFL English literacy skill 41
SFL writing 16, 22, 33, 53, 264–5; cognitive models of 68–72, 69–70; contribution of lexical vs grammatical knowledge in 29–30; correlation with reading and listening 30; diagnosis of 9–13, 25, 38–9, 76–9, 173–5; genre awareness 41; graphic transcription and spelling in 74–6; lexical retrieval in 72–4; production of characters 75; proficiency 30, 63–4, 76–7, 173, 175; progress in language learning 23, 23; role of vocabulary 29; theories of 16; transfer of general concepts and skills 38
socially-oriented theories of SFL writing development 31–43; contextual theories of transfer 43; development of expertise 35–42; diagnostic potential 34; ecological approaches to development 41–3, 43; examples 34–5; individuals’ relationship with environment 32; internalization of norms and values of expressing themselves 33; problem with transfer 32–3; self-psychology theories 43; Zone of Proximal Development (ZPD) 32, 34
speech production models 61
subject-matter knowledge 36, 41
syntactic complexity (SC) 26–8, 215; in SFL writing 46
task complexity 127–8, 131–2, 151
task design in diagnosing writing 118–19, 265; analysis of keystroke logging 145; approaches to capturing development 136; cognitive aspects 119, 143; comparability studies 143–4; computer/keyboarding/word processing skills 145; developmental perspective on writing 133–4; development of direct writing tasks 121–4; diagnostic insights 126–7, 136–8, 137, 146–7; direct and indirect task, comparison of 124–6; discourse features 119; language task characteristics 119, 121; large-scale vs classroom assessment contexts 140–2; level-specific and multi-level approaches to 138–40; link between indirect tasks and task complexity 131–2; pedagogical tasks 128–9; personal characteristics of writer 119–20; principles of diagnostic assessment 142; rater effects 144–5; score comparability 144; SFL writing development 134–6, 145–6; social dimension 119; task demands and task complexity 127–8; task difficulty in direct writing tests 129–31; task format 126, 147; test specifications 132–3
teacher feedback 241; see also feedback
teacher training 10, 117, 265
text generation or production 10, 162–5; Cognitive Band operations 163; diagnostic potential 163; Rational Band operations 163; at SFL writing 164
theories of SFL writing development 16; cognitive 17–31; communicative and linguistic stages 18–24; complex dynamic systems view of 44–7; socially-oriented 31–43
timing of feedback 226–7, 247–8
TOEFL iBT 208–9
Tool for the Automatic Analysis of Cohesion (TAACO) 216
Tool for the Automatic Analysis of Lexical Diversity (TAALED) 216
Tool for the Automatic Analysis of Lexical Sophistication (TAALES) 216, 232
Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC) 216
transcription 54
translation process 56–7, 61
unified writing 17
VERA8 (Vergleichsarbeiten Klasse 8) 10, 99–104, 100–1, 101–3, 115–16, 130; as basis for rating scale development 185–6, 187; benchmark text 196, 197–8; criteria used 100; description of 100; diagnostic checklist 102–3; feedback in 257–8; level-specific demands and expectations 100; test packages 99
word forming task 88–91
working memory (WM) 60–1, 76; central executive 61–3; division of resources between 66–8; phonological loop 61–3; resources of 65–6; visuo-spatial sketchpad 61–2
WriteToLearn system 210, 228

writing ability, in SFL and L1 context 2–4, 9–10, 17; development of 14–16; English proficiency 29–30; independent vs source-based writing tasks 29; L1 Dutch writing 30; lexical vs grammatical knowledge 29–30; L1 Japanese writing 30; metacognitive knowledge, role of 30; role of vocabulary 29
writing development: children vs adolescent or adult 17–18; coding/encoding skills 14–15; commonalities in 47–50; implications for diagnosing 50–1; individual’s autonomy and 16; influencing factors 15; initial stage of learning to write 14; learning as participation 16; of multilingual learners 47; non-alphabetic writing systems 15; notion of thresholds in 50; pre-writing stage 14; relationship between accuracy and lexical complexity 47; technical aspects 15–16; see also theories of SFL writing development
Writing Pal (W-Pal) 211, 213, 218, 226, 233–4; conduct of automated textual analyses 212; improvements in feedback 212–13; mnemonics in 212; modules 211–12; revision of 212
writing process 10, 149, 265; brain imaging 83, 84; cognitive aspects of 52; elements in 149; eye-tracking 83, 84; Hayes and Flower’s model 55–9, 56, 58; as interactive and recursive activity 56; keystroke-logging 83, 84; knowledge 36; meanings of 149; planning 157–62; reviewing and revising drafts 165–73, 166–7; stages 173, 174; task characteristics influencing 150–3, 152; text generation 162–5
writing superiority effect 54
Written Productive Translation Task (WPTT) 73–4
Zimmermann’s model of formulation in SFL writing 70, 70–2, 164
Zone of Proximal Development (ZPD) 32, 34, 36