Perspectives on Language Assessment Literacy: Challenges for Improved Student Learning




‘This volume addresses an important and perennial problem in all of education: how teachers and students perceive, respond to, and make use (or not) of assessment. Assessment literacy is universal in that all systems and levels implement assessment to evaluate instruction, learning, or curriculum and, just as frequently, to inform improved teaching and learning; so language education researchers, not just second language researchers, will find the book useful. Language assessment literacy is more complex in that language is not just an end but a tool by which all teaching and learning takes place. Being adept at using assessments to improve instruction and learning of languages is seemingly a neglected aspect of assessment in the world of language teaching, which apparently treats the summative test or exam as the norm for assessment. The chapters in this volume help move language assessment more into multifaceted data collection about competencies for the sake of improving the quality of learning and teaching. It’s nice to see language assessment research catching up with the world of classroom assessment theory and practice. The volume provides access to research and thinking about the topic from some relatively under-represented perspectives, including Turkey, Tunisia, Oman, Ukraine, the UAE, Saudi Arabia, and Japan, as well as Europe and the UK.’

Professor Gavin T. L. Brown, Associate Dean Postgraduate Research (ADPG), The University of Auckland

‘Language assessment literacy (LAL) is a critical topic in the field of language testing and assessment; see, for example, the recently established (April 2019) Language Assessment Literacy Special Interest Group (LALSIG) within the International Language Testing Association. Perspectives on Language Assessment Literacy comprises chapters by authors from traditionally less represented regions of the world and thus represents an important contribution to the field. The volume also helps advance the scholarship of LAL. Authors pay special attention to how language assessment theory and practice can better synergize with teaching to improve students’ language learning and to better document students’ learning outcomes.’

Micheline Chalhoub-Deville, Ph.D., Professor, University of North Carolina at Greensboro

Perspectives on Language Assessment Literacy

Perspectives on Language Assessment Literacy describes how elements of language assessment literacy can help teachers gather information about when and how to assess learners, and about using appropriate assessment tools to interpret results in a fair way. It provides highlights from past and current research, descriptions of assessment processes that enhance LAL, case studies from classrooms, and suggestions for professional dialogue and collaboration. This book will help to foster continuous learning, empower learners and teachers and make them more confident in their assessment tasks, and reassure decision makers that what is going on in assessment meets international benchmarks and standards. It addresses issues such as the concepts and challenges of assessment, the impact of reflective feedback on assessment, the ontogenetic nature of assessment literacy, the reliability of classroom-based assessment, and the interfaces between teaching and assessment. In doing so, it fills a gap in the literature by addressing the current status and future challenges of language assessment literacy. This book will be of great interest to academics, researchers, and post-graduate students in the fields of language assessment literacy and English language teaching.

Sahbi Hidri is Assistant Professor of Applied Linguistics, University of Tunis, and Senior Specialist (Assessment) for the Education Division, Higher Colleges of Technology, UAE. Sahbi is the founder of Tunisia TESOL and the Arab Journal of Applied Linguistics. His research interests include language assessment literacy, test specifications, the interplay between SLA and dynamic assessment, test-taking strategies, and test mapping.

Perspectives on Language Assessment Literacy

Challenges for Improved Student Learning

Edited by Sahbi Hidri

First published 2021
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2021 selection and editorial matter, Sahbi Hidri; individual chapters, the contributors

The right of Sahbi Hidri to be identified as the author of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book

ISBN: 978-0-367-85969-5 (hbk)
ISBN: 978-1-003-01608-3 (ebk)

Typeset in Bembo by Taylor & Francis Books

Contents

List of figures
List of tables
List of contributors
Preface

PART 1: Language assessment literacy: Theoretical foundations

1. Language assessment literacy: Where to go? (Sahbi Hidri)
2. Language assessment literacy: Concepts, challenges, and prospects (Dina Tsagari)
3. Traditional assessment and encouraging alternative assessment that promotes learning: Illustrations from EAP (Lee McCallum)
4. Language assessment literacy: Ontogenetic and phylogenetic perspectives (Mojtaba Mohammadi and Reza Vahdani Sanavi)

PART 2: Students’ language assessment literacy

5. Enhancing assessment literacy through feedback and feedforward: A reflective practice in an EFL classroom (Junifer A. Abatayo)
6. Using checklists for developing student teachers’ language assessment literacy (Olga Ukrayinska)
7. An investigation into the correlation between IELTS test preparation courses and writing scores: Students’ reflective journals (Fatema Al Awadi)

PART 3: Teachers’ language assessment literacy

8. Language assessment literacy of novice EFL teachers: Perceptions, experiences, and training (Aylin Sevimel-Sahin)
9. Teachers’ assessment of academic writing: Implications for language assessment literacy (Zulfiqar Ahmad)
10. Reliability of classroom-based assessment as perceived by university managers, teachers, and students (Olga Kvasova and Vyacheslav Shovkovyi)

PART 4: Language assessment literacy: Interfaces between teaching and assessment

11. To teach speaking or not to teach? Biasing for the interfaces between teaching (Diana Al Jahromi)
12. Planning for positive washback: The case of a listening proficiency test (Caroline Shackleton)
13. Testing abilities to understand, not ignorance or intelligence: Social interactive assessment: Receive, appreciate, summarize, ask! (Tim Murphey)

Conclusion: Language assessment literacy: The way forward (Sahbi Hidri)

Author Index
Index

Figures

3.1 Academic reading multiple-choice task
3.2 First Certificate in English: Use of English task
3.3 Sample collaborative writing task
4.1 Differential AL/LAL profiles for four constituencies: (a) test writers; (b) classroom teachers; (c) university administrators; (d) professional language testers
5.1 Three major feedback questions
7.2 Excerpts representing the use of conjunctions by students B and C
7.3 Excerpts representing the use of determiners in reflections
7.4 Excerpts showing the use of ‘the’ for referencing purposes
7.5 Excerpt from student A’s reflection 6 showing the repetition of lexical terms
7.6 Excerpts representing word repetition used by the participants
8.1 The view of the themes and subthemes in NVivo 11 Pro
8.2 The overall findings for novice EFL teachers
10.1 Training in LTA received by respondent teachers (%)
10.2 Preferred formats of training in LTA
10.3 Comparison of perceptions of summative assessment reliability
11.1 Students’ viewpoints regarding the teaching and assessment of speaking (1)
11.2 Students’ viewpoints regarding the teaching and assessment of speaking (2)
12.1 Proposed model of listening ability
12.2 Highest levels of processing reached in Task 1
12.3 Highest levels of processing reached in Task 2
12.4 Highest levels of processing reached in Task 3
12.5 Highest levels of processing reached in Task 4
13.1 Bottom of each test

Tables

4.1 LAL stages
5.1 WRIT4111 Course Profile: Professional writing course
5.2 Can Do Statements feedback sheet
5.3 Typology of written corrective feedback types
7.1 The participants’ demographic data
7.2 Topic vocabulary used by the participants in all reflections
7.3 Pronouns used in some reflections as anaphoric reference
7.4 Examples of synonyms, antonyms, and collocations in student A’s reflections
7.5 Examples of synonyms, antonyms, and collocations in student B’s reflections
7.6 Examples of synonyms, antonyms, and collocations in student C’s reflections
8.1 The profile of novice EFL teachers (numbers)
9.1 Assessment criteria and rubrics
9.2 Examples of sample paraphrasing
10.1 Perceptions of uniformity of requirements for summative assessments
10.2 Satisfaction with test quality
10.3 Perceptions of test objectivity vs satisfaction with test results/grades (%)
10.4 Ranking of training formats in order of effectiveness
11.1 Mean scores on the Foreign Language Classroom Anxiety Scale (FLCAS)
11.2 Correlation between FLCAS and gender
11.3 Mean scores of students’ satisfaction with the academic curricula and teaching practices
11.4 Correlation between FLCAS and extracurricular exposure to spoken English
11.5 Mean scores of students’ extracurricular exposure to spoken English
11.6 Correlation between FLCAS and extracurricular exposure to spoken English
12.1 Test description
12.2 Frequencies of level of listening process reached for each correct item

Contributors

Fatema Al Awadi is an Education faculty member, and a former Bachelor’s graduate, teaching within the Higher Colleges of Technology system. She holds a Master’s degree in TESOL from the British University in Dubai. Currently, she is a PhD candidate.

Zulfiqar Ahmad has a PhD in Applied Linguistics from De Montfort University, UK. With over 25 years of ELT experience, he is presently an ELI faculty member at the University of Jeddah, KSA. His main research interests include academic writing, academic literacies, discourse and genre analysis, language assessment, and TESOL.

Diana Al Jahromi is an Assistant Professor of Linguistics and the Quality Assurance Director at the University of Bahrain. She has received numerous local and international awards. Her research interests focus on discourse analysis, sociolinguistics, computational linguistics, corpus linguistics, SLA, e-learning, and quality assurance.

Sahbi Hidri is Assistant Professor of Applied Linguistics, University of Tunis, and Senior Specialist (Assessment) for the Education Division, Higher Colleges of Technology, UAE. Sahbi is the founder of Tunisia TESOL and the Arab Journal of Applied Linguistics. His research interests include language assessment literacy, test specifications, the interplay between SLA and dynamic assessment, test-taking strategies, and test mapping.

Olga Kvasova is Associate Professor at Taras Shevchenko University of Kyiv. She lectures in the methodology of building MA students’ academic skills and in language assessment, and she also supervises MA and PhD theses in ELT. Her research interests include classroom-based assessment, teacher training in LTA, and course and materials design.

Lee McCallum is an EdD candidate at the University of Exeter. She has extensive teaching experience in EAP from the Middle East, Europe, and China. Her research interests include language assessment and writing instruction, with a focus on how corpus-based methods can enhance these areas.

Mojtaba Mohammadi is Assistant Professor of TEFL at Islamic Azad University, Roudehen Branch, Iran. Mojtaba is a certified Cambridge trainer with 23 years of teaching experience who has attended and presented at national and international conferences. His research interests include language assessment, CALL, and teacher education.

Tim Murphey is an active part-time Professor at KUIS (RILAE), Wayo Women’s University Graduate School, Nagoya University of Foreign Studies Graduate School, and Aoyama University. He has an MA in TESOL from the University of Florida, and a PhD in Applied Linguistics from the University of Neuchatel, Switzerland.

Aylin Sevimel-Sahin is currently working in the ELT Department, Faculty of Education, Anadolu University, Eskisehir, Turkey. She holds a doctorate in the field of ELT testing and assessment. Her research interests are ELT teacher education, language testing and assessment, practicum/teaching practice, the affective domain of ELT, and research methodology.

Caroline Shackleton is a teacher and language testing professional presently working at the University of Granada’s Modern Language Centre. She holds an MA in Language Testing from the University of Lancaster, and a PhD in Applied Linguistics from the University of Granada.

Vyacheslav Shovkovyi, Professor, holds the degree of Doctor of Pedagogic Sciences (Dr Habil.). He heads the Department of L1 and L2 Teaching Methodology at Taras Shevchenko University of Kyiv, Ukraine. He lectures in the methodology of teaching L1, and supervises PhD theses and research studies conducted in the Department.

Olga Ukrayinska is an Associate Professor at the Subdepartment of English Philology in Kharkiv Skovoroda National Pedagogical University. She has been teaching for 16 years. She holds a PhD in FL Teaching. Her academic interests centre on FL teaching and assessment in tertiary education. She is an individual member of EALTA, ALTE, and UALTA.

Reza Vahdani Sanavi is currently teaching English as a Foreign Language at the Social Sciences University of Ankara as an Assistant Professor. He also served as the Head of Department in the ELT Department at Islamic Azad University, Roudehen Branch. His areas of interest are assessment, attitudinal studies, and feedback.

Preface

Language assessment literacy (LAL) has tremendous potential to enhance student learning and change a whole assessment culture. When LAL is absent, or when it is not well developed among learners, teachers, decision-makers, and other stakeholders, learning does not happen for all students, nor does it improve. However, the presence of well-benchmarked LAL can lead to positive washback and can ultimately help stakeholders to attain international recognition. This entails welding together many variables, such as views of language and language learning, awareness of international assessment benchmarks, and learners’ awareness of all the assessment challenges. As implied in the title, Perspectives on language assessment literacy: Challenges for improved student learning, developing LAL is critical for its effective use in language assessment, testing, and evaluation. This book identifies different purposes for developing LAL and shows how the interfaces between assessment, learning, and teaching can lead to useful, fair, and valid tests, and therefore meet international standards. Some readers may be more familiar with the importance of LAL and how it is implemented in international contexts, whether such exams are low-stakes or high-stakes ones. The category and nature of LAL included in this volume have great potential to empower learners, foster continuous learning, empower teachers and make them more confident in their assessment tasks, and reassure decision makers that what is going on in assessment meets international benchmarks and standards.

The book is intended for learners, teachers, parents, decision-makers, testing organizations, and many others. It provides highlights from past and current research, descriptions of assessment processes that enhance LAL, case studies from classrooms, and suggestions for professional dialogue and collaboration. Readers are invited to consider the book in relation to their contexts, personal conceptions, and practices of assessment, as well as other different micro- and macro-levels of LAL. Reading this book with a contextualization of the studies, as argued for by the contributors, might give readers a more comprehensive picture of how stakeholders are considering issues that have been gaining more and more momentum in second language assessment.

Part 1

Language assessment literacy: Theoretical foundations

Chapter 1

Language assessment literacy: Where to go?

Sahbi Hidri

Introduction

The main concerns in traditional assessment are certifying the reliability and validity of assessment instruments. These aspects are consistent with the goals of quantifying and measuring learning and collecting information. The fact that these aspects are the central concerns in assessing learners establishes the final grade as the product of traditional assessment (Brown, 2004). As pointed out by Alderson (1999), Alderson et al. (1995), and Alderson and Banerjee (2001), once learning is characterized as a number, it is of utmost significance that the number be as reliable and valid as possible; otherwise it has no meaning. The philosophy of traditional assessment allows assessment instruments to rank students against each other, which is termed norm-referenced testing (NRT). According to Dunn et al. (2004), and Dunn and Dunn (1997), norm-referenced assessment can be inaccurate or unreliable in some cases and is consequently regarded as less valid than criterion-referenced testing (CRT). According to Fulcher and Davidson (2007), the shift towards criterion-referenced assessment is indicative of the necessity to provide more precise representations of students’ learning than can be achieved through norm-referenced assessment. Shepard (2000) pointed out that this viewpoint on learning is based on the notion that learning is a procedure of gathering information and knowledge in separate pieces, with restricted application or transfer, and that assessment under this pattern attempts to find out whether learners have retained the information which is basically given to them by their instructors.

Review of the literature

Language education and the role of assessment

Given the different responsibilities teachers face in their daily activities, it is not surprising that assessment becomes a difficult task. Brown (2004) believes that it is quite unusual to question the purpose of assessment, because assessment has become a part of our everyday life. However, assessment professionals are reflecting on the reasons behind specific practices of assessment in society. For example, McNamara (2000) and McNamara and Hill (2011) explain that we assess with the purpose of gaining insights into students’ level of understanding or capability. One might imagine that the knowledge obtained through the procedures of assessment would be welcomed and seen as an essential constituent of good instruction. Nevertheless, the currency of expressions like tasking teachers to teach to the test, or relegating the teaching of the curriculum to the assessment level, means that assessment is regarded as an activity that is divorced from teaching goals (McNamara, 2001; McNamara & Roever, 2006). Moreover, Rea-Dickins (2004) argues that educators most often feel obliged to select which role to adhere to, be it organizer, observer-evaluator, or judge of language assessment and tests. As pointed out by Poehner (2008), the view that assessment stands in contrast to teaching might be attributed to an increasing awareness of the political appeal of numerous assessment approaches. Such a scenario is particularly applicable to high-stakes exams and quizzes that are approved by school administrators and policy makers and imposed upon instructors and students (Shohamy, 2001). The outcomes of high-stakes examinations carry substantial weight for learner knowledge, instructor responsibility, and state principles. Therefore, as pointed out by Poehner (2008), test preparation not only becomes an end in itself, but it can even become more important than the curricular objectives and learning purposes. According to Poehner (2008), another significant issue affecting the association between teaching and assessment is teachers’ unfamiliarity with the principles and theory underlying various assessment conceptions (Hidri, 2016). As pointed out by Torrance and Pryor (2002), teachers and test designers often go into classrooms without being well prepared to take up the challenges of designing valid, reliable, and fair assessment tools that would reflect the actual language ability of test-takers. Instead, they are prepared to employ several practices and tasks, such as cloze tests, group assignments, and tests, but without a theoretical understanding to guide their application.

Linking assessment and teaching and learning

As pointed out by Cheng et al. (2015), the first way of theorizing an association between teaching and assessment is within the scope of the influence of official testing on learning and instruction, which is known as the ‘washback effect’. According to Alderson and Wall (1993a, 1993b), Wall (2013), and Wall and Alderson (1996), washback displays itself mainly in high-stakes assessment contexts, where high scores become the instructional target of teachers and test designers. At this level, scores are meant to show the knowledge gain, skills, and strategies that test-takers have acquired through a period of instruction. Similarly, the test results should show how well learners were prepared for the test. Frederiksen and Collins (1989) state that tests might have positive or negative impacts that are directly linked to the construct validity of the test per se.


Although washback investigations explore the influence of assessment on teaching, researchers have also addressed the impacts of teaching on assessment. In this approach, in order to connect assessment and teaching, the assessment process should emanate from the instructional materials that are being taught in class (Poehner, 2008). This approach enables classroom instructors to undertake a more active role in determining assessment practices, selecting assessment tools and methods that are congruent with the course learning outcomes and objectives. Another approach elucidated by Rea-Dickins (2004) for linking assessment and teaching is a further development of curriculum-driven evaluation, which lends itself well to assessments of program effectiveness. That is, because the assessment tasks emanate from the objectives of a given curriculum, learners’ assessment tasks can be perceived as a gauge of how well they have met these objectives. The next approach to bringing assessment and teaching together, according to Poehner (2008), includes creating educational objectives and then devising corresponding teaching and assessment activities. Instead of imposing an assessment on an existing educational milieu, or using classroom activities and practices to create assessment processes, assessment and teaching from this outlook should be developed alongside each other. The concluding approach to the association between instruction and assessment, which is discussed by Rea-Dickins (2004), tries to achieve this by embedding assessment throughout educational performances and activities. This kind of assessment is typically implemented by classroom instructors with the purpose of having teaching fine-tuned to students’ requirements, which indicates a type of formative assessment.

Language assessment literacy

The success of testing is related to the manner and technique in which it is administered to learners, yet much more depends upon the objectives of the assessment and the assessor. This becomes even more complex in the realm of second language teaching and learning, because language assessment is required to measure elements that are in flux. Therefore, it seems logical that assessors need to possess comprehensive assessment literacy in order to learn about the strengths and weaknesses of the learners through the process. Unfortunately, underdeveloped educational systems lack sophisticated assessment parameters, which negatively affects second language learning. This is reflected in the performance of learners, as discussed by Alderson (1996). According to him, testing reveals not only the strengths and weaknesses of the learners but also the strengths and weaknesses of the quality of teaching, assessment, and the program. Today, assessment literacy among assessors stands as a significant hallmark of any successful language teaching program.

Many experts in the educational field have argued that assessment literacy can be considered a fashionable cliché in language teaching; however, its absence has the potential to cripple the quality of a program as well as the purpose of teaching. It is for this reason that assessment literacy is seen ‘as a sine qua non for today’s competent educator’ (Popham, 2006, 2009). In addition, Nawab (2012) argues that second language teachers must be trained exclusively under separate programs, because language teaching and assessment demand something different from other disciplines: diligence and sensitivity towards the learning needs of the stakeholders. It is through the phenomenon of language that learners have to reach the stock of knowledge for other subjects. This throws light on the significance of addressing the issues of assessment literacy in second language programs.

In the first part of the 21st century, concerns regarding assessment literacy started to appear all around the globe, and an effective definition of the term was needed in order to relate it to second language curriculum design. This points towards the significance of the knowledge and practice of assessment in the context of its utility in updating language learning materials (Fulcher, 2012). In this way, assessment training among language teachers is considered of prime importance in the delineation of successful language teaching courses. However, this calls into question any language teaching and assessment criteria that have been unsuccessful in yielding acceptable results from learners’ performance. In addition, it lightens the burden of responsibility and accountability on the shoulders of learners, because language learning contexts are now packages whose foundation is built on the assessor’s assessment skills; the traits of other stakeholders unfold later. Nevertheless, educational organizations are still responsible for arranging pre- and in-service courses and workshops to train assessors in order to remedy the weak standards of assessment around them. A study carried out by Vogt and Tsagari (2014) reveals that language teachers, overall, expressed the need for training in issues of assessment because they continuously have to come to terms with standardized testing. Moreover, more and more people are getting involved in analyzing scores and deciding about assessment issues. In such a scenario, systems that do not move ahead with updates are bound to barely meet, or completely fail in bringing about, desired learning outcomes among students. It is time to reengineer the whole system of assessment and its objectives (Taylor, 2009), seeking to answer the most significant question: Why do we need to assess our students? The answer takes into account professional assessment practices; effective, assessment-literate teachers; and conscientious organizations that are capable of providing concrete opportunities for their assessors to learn how to effectively assess, what to assess, and what not to assess. Too much assessment points towards the rote learning present in language learning programs in many educational systems.

Growing professionalization has lent more meanings to the term ‘assessment literacy’, and in the field of applied linguistics, it is now related to something more than setting standards for developing and administering tests using certain techniques. It is something that goes beyond the traits of tests to include the ethics of testing in policy and practice. It engages assessors at a level which is much more than just the technical characteristics of tests (Taylor, 2009), encapsulating the philosophy and ethics of testing. It is noteworthy that, due to globalization, an ethical milieu has appeared in the realm of assessment literacy that functions on the standards of accountability and responsibility. An assessor is not merely a stakeholder whose responsibility starts on the day of a test and finishes after an assigned duration. The modern assessor is a significant learning partner whose effective techniques and conceptions of assessment can produce effective learning outcomes among learners. Today’s assessor has to exercise, on an almost daily basis, analytical and formative assessment in theory and practice. The primary reason behind such a significant shift in assessment literacy is that the 21st century has so far been all about promoting global connections among different populations of the world. This purpose can only be achieved through understanding and expressing ideas, for which languages are the pivotal tools.

Another significant factor that can make or break assessment is the assessors’ understanding of the validity and reliability of tests. Assessors have to be fully vigilant in looking into the standardization of testing. This also includes the ability to spot fake standardization in any testing context. This is too critical to be the work of a naïve teacher without any prior training in assessment. It is for this reason that the expert language teaching community has been working to develop explicit standards in testing while expanding the concepts of validity and reliability (Messick, 1996). This has updated the components of assessment literacy. There has been a paradigmatic shift from merely relying upon assessment material appropriate for testing a certain level, to increasing accountability and analytically understanding the learners’ needs in a particular context. Therefore, the shape of the present assessment paradigm is, according to Davies (2003, 2008, 2013, 2014), ‘Knowledge + Skills + Principles’. The updated assessment frameworks rest on the shoulders of teachers who bring the relevant ethical involvement to the process of assessment. This further indicates a major change in the definition of assessment literacy in the age of information. Davies (2008, 2014) goes on to discuss what constitutes the elements of Knowledge, Skills, and Principles in assessment. Knowledge is the relevant background that is ready to function at the time of practice; Skills are the abilities of an assessor to standardize tests and use related methodologies to conduct meaningful assessment; whereas Principles are concerned with the suitable use of language tests with fairness and professional expertise. We frequently come across reports of training and workshops arranged by stakeholder organizations for updating teachers’ assessment literacy. We need to approach these with much caution, as there is a big difference between arrangements that bring about positive changes in teachers’ assessment conceptions, and counterfeit arrangements that exist for the sake of publicity.


Rigorous assessment literacy training pertains to practical and beneficial approaches delineated for assessors. These are aligned with a truthful understanding of the amount of assessment training required for assessors. Well-rounded training not only includes practices in measurement and psychometrics but also promotes among assessors an understanding of the role of assessment in a particular social environment. McNamara and Roever (2006) advocate a broad view of this role. They put forth that a competent assessor takes into account the social consequences of an assessment as well as its appropriate utility. The days of testing for testing’s sake no longer satisfy organizations and markets, which require properly trained professionals who have passed through genuine and socially sensitive education. A purely technical assessment that fulfils all the credentials of a properly designed test may not be the only factor in measuring learners’ performance. Assessors should possess the ability to employ analytics in a manner that is capable of capturing the true learning that has taken place in the minds of the learners. External practices of assessment must be founded on an internal understanding of its objectives and purposes. Studies on LAL are still being released, and perhaps the most challenging study, one that has given LAL another dimension, is Xu and Brown’s (2016), in which they linked LAL to teacher education and educational assessment. The authors also suggested a framework of teacher assessment literacy in practice.

Perspectives on Language Assessment Literacy: Challenges for Improved Student Learning is a book about LAL as perceived by authors in their ELT contexts. The book is divided into four parts: a) Language Assessment Literacy: Theoretical Foundations, b) Students’ Language Assessment Literacy, c) Teachers’ Language Assessment Literacy, and d) Language Assessment Literacy: Interfaces between Teaching and Assessment, and these parts debate different cases of LAL as investigated in these contexts.

Part One, Language Assessment Literacy: Theoretical Foundations, situates LAL in its theoretical underpinnings by linking LAL to teachers, learning, and assessment. In Hidri’s chapter, ‘Language Assessment Literacy: Where to Go?’, the author addresses the broad scope of LAL by highlighting the fact that LAL is always shaped by views of language and language learning, mainly those of teachers. LAL can also be shaped by the views of policy makers, materials designers, and official program writers. Such views sometimes stand in sharp contrast with the views of teachers, and this can end in a ‘gloomy and shaky’ assessment policy. Chapter Two, ‘Language Assessment Literacy: Concepts, Challenges and Prospects’, addresses LAL theoretically. Tsagari hails the role of language testing and assessment at the societal as well as the educational levels. The author also highlights the point that having a standard and unified definition of LAL has been an ongoing issue of debate that language assessment and testing practitioners need to agree on. To build a more comprehensive idea of LAL, Tsagari reviews the studies of LAL with the purpose of establishing a clear, current status of LAL that will serve in drawing clear pedagogical implications for learners and teachers. In Chapter Three, ‘Traditional Assessment and Encouraging Alternative Assessment that Promotes Learning: Illustrations from EAP’, McCallum probes the traditional forms of assessment while at the same time calling for the use of alternative forms of assessment in an EAP context. Contrary to what the literature on LAL has defined as two opposing directions of traditional vs. alternative forms of assessment, the author argues that these two opposing views of LAL are in fact complementary the moment they are placed on a continuum that fuses ‘traditional type assessment tasks to critical type assessments which represent the different assessment goals that exist within the specialized field of EAP [English for Academic Purposes]’. The chapter offers the opportunity to develop this continuum in different contexts. Sanavi and Mohammadi, in the last chapter of this part, ‘Language Assessment Literacy: Ontogenetic and Phylogenetic Perspectives’, investigate the ontogenetic and phylogenetic aspects of LAL over the last two decades while probing the views of different stakeholders. The authors claim that the term LAL was conceptually defined through ‘two concepts of contextual and experiential factors through mediation’.

Part Two of the book, Students’ Language Assessment Literacy, deals with students’ conceptions and practices of assessment. In the first chapter of this part, Abatayo accentuates promoting LAL through the use of feedback. This chapter reflects on such an implementation in an EFL classroom, since it relates to student and teacher professional development, classroom practices, and instructional materials pertaining to students’ learning. Defining LAL in a comprehensive way requires having a detailed idea of classroom practices of teaching and learning. Based on his personal experience, the author exemplifies his feedback on students’ work to effectively enhance their learning in an EFL context. In the next chapter, ‘Using Checklists for Developing Student Teachers’ Language Assessment Literacy’, Ukrayinska probes the use of checklists as learning and assessment tools in a context where, for example, teachers have to develop their own teaching materials. For teachers, checklists are used to standardize the test design phase among test designers. With almost the same purpose, checklists are employed by students for self- and peer-assessment. This multifunctional nature of checklists can contribute to developing students’ language assessment literacy. In the chapter ‘An Investigation into the Correlation between IELTS Test Preparation Courses and Writing Scores: Students’ Reflective Journals’, Al Awadi, in a study of the United Arab Emirates (UAE) context, studies the impact of test preparation courses on obtaining a high score in the International English Language Testing System (IELTS) exam and how students reflect on their use of vocabulary to master such a sub-skill in this international standardized exam. The author maintains that knowledge and use of the different parts of the exam can develop students’ language assessment literacy. However, before targeting this literacy, students have to be involved in test preparation courses. Test scores, according to the author, cannot form a comprehensive view of the actual language ability of learners, hence the necessity of involving students in IELTS preparation courses so they perform better in the IELTS exams.


Part Three of the book, Teachers’ Language Assessment Literacy, tackles teachers’ LAL. In the first chapter of this part, Sevimel-Sahin stresses the necessity for language teachers to be assessment literate so that they can assess students in an objective way as befits their instructional context. The study was carried out with 22 Turkish EFL teachers and was aimed at investigating their conceptions and practices of assessment. Results of the study revealed that teachers still needed to work more on their assessment beliefs so that they can test students in a fair way. In the second chapter, Ahmad accentuates the relevance for teachers of developing their language assessment literacy. Based on an analysis of students’ writing, the author affirms that both the assessment standards and teachers’ ratings should be revisited, as they did not meet the expectations of fair and valid writing assessment tasks. The study is important in signposting the relevance of standardizing international benchmarks for the assessment of writing. In the third chapter, Kvasova and Shovkovyi tackle some stakeholders’ perceptions of the reliability of classroom-based summative assessments in Ukraine. The authors stress the fact that teachers most often lack important assessment literacy skills needed to be operational in their academic contexts. This lack of assessment expertise, as perceived by university managers, teachers, and students, might lead to harmful effects and therefore fail to meet international assessment standards. Implications of the study call for biasing for a better quality of constructed tests whose major purpose is to guarantee assessment reliability.

Part Four of the book, Language Assessment Literacy: Interfaces between Teaching and Assessment, includes studies from Bahrain, Spain, and Japan. In the first chapter of this part, Al Jahromi highlights the fact that the assessment of speaking still poses some major problems to students, practitioners, and assessors. Based on empirical data from the Bahraini context, the author critically reviews the assessment of speaking and calls for a reconsideration of the assessment of this skill. In the next chapter, Shackleton addresses the positive ‘washback’ effect of a listening proficiency test in the Spanish context. The author used a B2 listening exam, a think-aloud protocol, and a retrospective interview to examine the construct validity of the test, that is, whether the test measures what it is supposed to measure. Based on the analysis of planning and prediction strategies as well as the other research instruments, the author maintains that the listening construct was fuzzy and that there are serious threats to its construct validity. The author calls for the use of authentic listening input to raise teachers’ awareness in order to target the construct validity of listening assessment. The last chapter of this part deals with understanding the assessment of abilities in a socially interactive environment. The author maintains that this understanding necessitates the presence of language assessment literacy among test designers. The author also questions the capacity of standardized assessment to make our students creative in their socially interactive contexts. Developing LAL is important, but there needs to be much more work on training students and teachers so that they can contribute to useful test development.


Conclusion

The main aim of assessment is to determine whether or not learning has occurred. It is believed that the main objective of assessment is to find out how far the learning experiences are actually generating the anticipated outcomes. Assessment primarily refers to the systematic way of collecting information with the aim of making judgments or decisions about people. Shepard (2000) pointed out that this viewpoint on learning is based on the notion that learning is a procedure of gathering information and knowledge in separate pieces. The main concerns in traditional assessment are certifying the reliability and validity of assessment instruments. These aspects are consistent with the goals of quantifying and measuring learning, and collecting information. The fact that these aspects are the central concerns in assessing learners establishes the final grade as the product of traditional assessment (Brown, 2004). As pointed out by Alderson (2000), once learning is characterized as a number, it is of utmost significance that the number be as reliable and valid as possible; otherwise it has no meaning. The philosophy of traditional assessment allows assessment instruments to rank students against each other, which is termed NRT. However, NRT can be inaccurate or unreliable in some cases, and it is consequently regarded as less valid than CRT. The shift towards CRT is indicative of the necessity to provide more precise representations of students’ learning than can be achieved through NRT.

References

Alderson, J. C. (2000). Assessing reading. Cambridge University Press.
Alderson, J. C. (1999, May). Testing is too important to be left to testers [Plenary address]. The Third Annual Conference on Current Trends in English Language Testing. United Arab Emirates University.
Alderson, J. C. (1996). The testing of reading. In C. Nuttall (Ed.), Teaching reading skill in a foreign language (pp. 212–228). Heinemann.
Alderson, J. C., & Banerjee, J. (2001). Language testing and assessment (Part 1). Language Teaching, 34(4), 213–236.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge University Press.
Alderson, J. C., & Wall, D. (1993a). Does washback exist? Applied Linguistics, 14(2), 115–129.
Alderson, J. C., & Wall, D. (1993b). Examining washback: The Sri Lankan impact study. Language Testing, 10(1), 41–69.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. Pearson Education.
Cheng, L., Sun, Y., & Ma, J. (2015). Review of washback research literature within Kane’s argument-based validation framework. Language Teaching, 48(4), 436–470.
Davies, A. (2014). 50 years of language assessment. In A. J. Kunnan (Ed.), The companion to language assessment: Abilities, contexts and learners (pp. 3–21). Wiley Blackwell.
Davies, A. (2013). Native speakers and native users: Loss and gain. Cambridge University Press.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3), 327–347.
Davies, A. (2003). Three heresies of language testing research. Language Testing, 20(4), 355–368.
Dunn, L. M., & Dunn, L. M. (1997). Peabody picture vocabulary test—III. American Guidance Service.
Dunn, L., Morgan, C., O’Reilly, M., & Parry, S. (2004). The student assessment handbook. Routledge Falmer.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27–32.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–132.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. Routledge.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
McNamara, T. (2001). Language assessment as social practice: Challenges for research. Language Testing, 18(4), 333–349.
McNamara, T. (2000). Language testing. Oxford University Press.
McNamara, T., & Hill, K. (2011). Developing a comprehensive, empirically based research framework for classroom-based assessment. Language Testing, 29(3), 395–420.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Blackwell.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3), 241–256.
Nawab, A. (2012). Is it the way to teach language the way we teach language? English language teaching in rural Pakistan. Academic Research International, 2(2), 696–705.
Poehner, M. (2008). Dynamic assessment: A Vygotskian approach to understanding and promoting L2 development. Springer Science + Business Media.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory Into Practice, 48(1), 4–11.
Popham, W. J. (2006). All about accountability: A dose of assessment literacy. Improving Professional Practice, 63(6), 84–85.
Rea-Dickins, P. (2004). Understanding teachers as agents of assessment. Language Testing, 21(3), 249–258.
Shepard, L. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.
Shohamy, I. (2001). The power of tests: A critical perspective on the uses of language tests. Longman.
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics, 29, 21–36. doi:10.1017/S0267190509090035
Torrance, H., & Pryor, J. (2002). Investigating formative assessment: Teaching, learning, and assessment in the classroom. Open University Press.
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Wall, D. (2013). Washback in language assessment. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics. Blackwell Publishing Ltd.
Wall, D., & Alderson, J. C. (1996). Examining washback: The Sri Lankan impact study. In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 194–221). Multilingual Matters.
Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58, 149–162. doi:10.1016/j.tate.2016.05.010

Chapter 2

Language assessment literacy: Concepts, challenges, and prospects

Dina Tsagari

The conceptualizations of language assessment literacy (LAL)

The term ‘assessment literacy’ was coined by Stiggins (1991) and emerged in an era of multiplication of literacies (e.g. technological literacy, computer literacy, biblical literacy). As Taylor observes, when used as the functional head of compound words, the term ‘literacy’ denotes the ability to understand and engage in practices related to the denotation of the preceding word (2013, p. 405). Overall, it can be said that ‘assessment literacy’ refers to the general educational ability to understand and engage in practices related to assessment (Engelsen & Smith, 2014). However, the term ‘language assessment literacy’ (LAL) is not as straightforward. Davies (2008, p. 335) provided a very schematic and general account of LAL which consists of skills, knowledge, and principles, as follows:

Skills provide the training in necessary and appropriate methodology, including item writing, statistics, test analysis and increasingly software programmes for test delivery, analysis, and reportage. Knowledge offers relevant background in measurement and language description as well as in context setting. Principles concern the proper use of language tests, their fairness and impact, including questions of ethics and professionalism.

Davies’ model suggests that language assessment literate professionals should know how to perform effective assessment activities, be able to ground their assessment practices on the basis of solid (descriptive) knowledge, and be in a position to adopt a critical view questioning available assessment practices (see also Davies 2008, pp. 335–336). The central role of skills, knowledge, and principles in establishing LAL has been underlined by several authors after Davies. For example, Inbar-Lourie (2008) considers LAL to be a concept that can be represented in the form of a triplet, namely ‘the reason or rationale for assessment (the “why”), the description for the trait to be assessed (the “what”), and the assessment process (the “how”)’ (p. 390). Adopting a post-structuralist approach, Inbar-Lourie (2008) argued that assessment practices and assessment literacy have been developed in the context of two conflicting conceptions of assessment. On the one hand, there is the traditional, cognitivist approach, which reflects the principles of positivism. This conception is materialised in high-stakes, standardized tests and requires practitioners to have knowledge and skills which are largely psychometric. For Inbar-Lourie, this conception, and the respective practices, form the ‘testing culture’ (2008, p. 387). On the other hand, there is a socio-cultural conception which adopts an interpretative, constructivist approach to knowledge and assessment. According to the latter conception, knowledge and assessment are not value-free (see also Taylor 2013, p. 411). They are socially constructed under the influence of more or less dominant epistemological assumptions, educational preconceptions, and social, political, and cultural beliefs (2008, p. 387). This socially oriented conception, which Inbar-Lourie calls ‘assessment culture’, requires stakeholders to be aware of the contextual considerations and the social consequences of assessment and to focus on practices that promote learning (i.e., Assessment for Learning).

Insisting on the social aspect of assessment, and broadening the account of LAL to include all possible stakeholders, Taylor (2009) suggested an extended and more contextualized conceptualization of the term. As she points out, ‘training for assessment literacy entails an appropriate balance of technical know-how, practical skills, theoretical knowledge, and understanding of principles, but all firmly contextualized within a sound understanding of the role and function of assessment within education and society’ (Taylor 2009, p. 27). The contribution of Taylor’s provisional framework is that it includes considerations of context but, most importantly, it evokes the key notion of balance. According to Taylor (2009, p. 27), the context and the role of the stakeholder in the assessment process will determine the balance of knowledge in specific areas and the levels of LAL that should be achieved. Therefore, although LAL should be developed for all stakeholders, each stakeholder should acquire the amount of knowledge that fits their role.

Fulcher (2012) addressed the contextualization of skills, knowledge, and principles in his investigation of teachers’ levels of LAL. Drawing on previous accounts, but also on his empirical findings on teachers’ perceived needs, Fulcher (2012, p. 125) provides the following definition of assessment literacy:

The knowledge, skills and abilities required to design, develop, maintain or evaluate, large-scale standardized and/or classroom-based tests, familiarity with test processes, and awareness of principles and concepts that guide and underpin practice, including ethics and codes of practice. The ability to place knowledge, skills, processes, principles and concepts within wider historical, social, political and philosophical frameworks in order to understand why practices have arisen as they have, and to evaluate the role and impact of testing on society, institutions, and individuals.


The innovative aspect of Fulcher’s definition is that he considers that skills, knowledge, and principles are not sufficient on their own for the development of LAL. According to Fulcher’s conceptualization, skills, knowledge, and principles should be integrated in the context of assessment in a dialectical scheme between assessment constructs and contextual factors (e.g. historical, social, political, and philosophical). This means that practitioners, in particular, cannot be considered assessment literate if they are not in a position to fully understand how assessment practices and contextual factors shape each other. Interestingly enough, Fulcher mentions ‘awareness’ in his definition and argues for a critical understanding of assessment, requiring teachers to be able to evaluate both assessment practices and the social implications of assessment.

Scarino (2013), building on the notion of assessment culture, argues that the awareness that individuals have of their own beliefs and preconceptions about assessment (whether these are based on personal experience or contextual factors) is not only a fundamental component of LAL, but also an indispensable process in the individual’s development of LAL (2013, p. 316). Assuming that social, cultural, educational, and political beliefs shape our understanding of assessment constructs and practices, Scarino considers that providing training in skills, knowledge, and principles is not sufficient for developing language assessment literate practitioners. As she points out (2013, p. 322):

Assessment literacy needs to be considered in relation to the theoretical knowledge base as an essential source of input, as well as to teachers’ interpretive frameworks which are shaped through their particular situated personal experiences, knowledge, understandings and beliefs. They are obliged to integrate simultaneously the complex theoretical, practical and institutional dimensions of the assessment act and an understanding of self in relation to these.

On this view, self-awareness becomes an aim and measure of success in LAL development (Scarino 2013, p. 324).

Other authors have tried to provide more flexible models allowing for variation in the development of LAL components. Pill and Harding (2013) presented a developmental model in which LAL is represented in the form of a continuum (from illiterate to literate) with five stages:

a) Illiteracy, i.e. the state of ignorance of language assessment concepts and methods.
b) Nominal literacy, i.e. understanding that a specific term relates to assessment, although with possible misconceptions.
c) Functional literacy, i.e. sound understanding of basic terms and concepts.
d) Procedural and conceptual literacy, i.e. understanding central concepts of the field and using knowledge in practice.
e) Multidimensional literacy, i.e. knowledge that extends beyond ordinary concepts, including philosophical, historical, and social dimensions of assessment. (Pill & Harding 2013, p. 383)

The advantage of this type of conceptualization is that it can reflect the fact that LAL cannot be acquired as a block of knowledge all at once. The continuum approach seems to describe more accurately the development of LAL. Moreover, Pill and Harding’s model captures the fact that stakeholders in assessment have different needs which in turn define the levels of assessment literacy that they should reach. For instance, policy makers could fulfil their role by reaching the ‘functional’ level in Pill and Harding’s model, whilst teachers would most likely need to acquire ‘multidimensional’ or, at least, ‘procedural and conceptual’ literacy in order to engage in effective assessment practices.

Although innovative in design and insightful in distinguishing assessment literacy levels according to stakeholders’ needs, Pill and Harding’s model presents some critical problems. As the authors seem to acknowledge (2013, p. 383), the exact meaning and content of the proposed levels are rather vague (see also Harding & Kremmel, 2016). Also, historical and social dimensions of assessment appear peripheral in Pill and Harding’s model, unlike in Brindley (2001a, 2001b), Inbar-Lourie (2008), and Scarino (2013), who address contextual considerations and awareness of social consequences as fundamental for the development of LAL.

An attempt to bridge component- and levels-based conceptualizations of LAL was presented in Taylor (2013). Taylor acknowledges that LAL involves various stakeholders, not only teachers. She also takes into account that the levels and areas of literacy vary with stakeholders’ roles and needs. However, instead of matching areas of knowledge with levels of LAL (see Pill & Harding, 2013), Taylor hypothesizes eight core dimensions of knowledge, skills, and principles, and five degrees of literacy. These eight dimensions of LAL are: knowledge of theory, technical skills, principles and concepts, language pedagogy, sociocultural values, local practices, personal beliefs/attitudes, and scores and decision making. For the definition of the levels of literacy, Taylor follows Pill and Harding’s (2013) model and assumes the possibilities of illiteracy, nominal literacy, functional literacy, procedural and conceptual literacy, and multidimensional literacy. According to Taylor’s model, stakeholders are expected to acquire a specific level of literacy in each key dimension depending on their context and needs (Taylor 2013, pp. 409–410). The proposed conceptualization allows us to capture the fact that abilities in a knowledge area can be more or less developed depending on each stakeholder’s specific needs. Of course, there might be objections or modifications with respect to the content of the key dimensions or the exact levels of literacy that are necessary for each group of stakeholders. The point is that Taylor’s conceptualization provides a powerful tool for creating LAL profiles for each stakeholder, which, in turn, enables more focused pedagogical efforts.

The relation of personal conceptions, beliefs, and attitudes to developing LAL was examined by Scarino (2013). Drawing on evidence collected from her own previous projects, Scarino pointed out that teachers’ preconceptions and beliefs about assessment form an interpretive framework that shapes teachers’ assessment practices and understanding of language assessment constructs. On the basis of this observation, Scarino suggests that the conceptualization of LAL should not be restricted to a core knowledge base, but should also include some component for the teachers’ interpretive framework.

It is worth mentioning that, against the aforementioned attempts to conceptualize or define LAL, Inbar-Lourie has recently introduced the idea of ‘language assessment literacies’ (see Inbar-Lourie 2016). On this view, multiple amalgamated understandings of assessment are created by the dynamic merging of expertise in language assessment with expertise in the local context (educational, social, etc.). This idea suggests that research should drop the pursuit of a monolithic prescriptive framework of LAL in favour of a dynamic, loose descriptive model which ‘sets general guiding principles for different assessment literacies but is aware of local needs and is loose enough to contain them’ (Inbar-Lourie 2016, p. 268). Inbar-Lourie (2017) also argued against insulation in defining the knowledge base of LAL, claiming that a pluralistic, descriptive framework of localized language assessment literacies is preferable to a prescriptive monolithic literacy approach.

More recently, research and discussions have contributed to the efforts towards defining the construct of assessment literacy. For example, Kremmel, Eberharter, and Harding (2017) pointed out that the role of ‘language’ in the existing LAL models is not clearly conceptualized. According to Kremmel et al., a clear understanding and description of the language component should be at the heart of LAL. Insisting on the variety of stakeholders’ needs, Cooke, Barnett, and Rossi (2017) argued that existing LAL models offer only hypothetical assessment literacy profiles for certain stakeholder groups. In their research, Cooke et al. (2017) try to fill that gap by presenting an empirical model to operationalize the definition of the LAL construct across different stakeholder groups. Interestingly, Cooke et al. offer some worked examples of their model, e.g. assessment literacy profiles of language teachers, language assessors, etc. in the East Asian educational context. Kremmel and Harding (2017, 2019) have also called for more evidence on the conceptualization of stakeholders’ literacy profiles and presented their own empirical model of LAL across different contexts. Supporting their theoretical suggestions, Kremmel and Harding drew on data from web-based questionnaires and offered assessment literacy profiles for language teachers, language testing professionals, and language testing researchers.


Expanding the circle of stakeholders, Malone (2017) stressed the centrality of including students’ perspectives in defining the construct and suggested relevant methodologies. Adopting a constructivist and interpretive perspective, Tsagari (2017) argues that the development of LAL is a situated activity located in particular contexts and underlines the role of teachers’ perceptions and knowledge in the process of assessment literacy acquisition.

Empirical research in language assessment literacy

Against the theoretical background presented above, scholars conducted empirical investigations of various designs in an effort to understand the constructs and the processes pertaining to LAL.

LAL research with various stakeholders

In an attempt to identify the testing and assessment needs of language testing practitioners across Europe, Hasselgreen, Carlsen, and Helness (2004) conducted a large-scale survey in 37 European countries. A total of 914 language teachers, teacher trainers, and testing experts participated in a web-based closed-item questionnaire devised to identify their assessment practices, training, and perceived needs. Hasselgreen et al. (2004) found evidence that all practitioners have significant needs in LAL and that they urgently required training in alternative (or less traditional) forms of assessment, such as portfolios and self- and peer-assessment. Language teachers and teacher trainers, in particular, showed a greater need for training in the aforementioned areas, as well as in interpreting test results, using informal continuous assessment, giving feedback, establishing reliability and validity, and using statistics. Language teachers additionally expressed a need for training in designing tests. Test experts revealed a special interest in non-teaching-related areas, such as statistical analysis, validation research, and item writing and reviewing.

A widely overlooked group of stakeholders, that of non-practitioners (i.e. test users), was the research focus of O’Loughlin’s study (2013). Using a survey and semi-structured interviews, O’Loughlin investigated the LAL levels and needs of language proficiency test users. Collecting data from the staff members of two Australian universities that used International English Language Testing System (IELTS) scores as student entry requirements, O’Loughlin found that non-practitioner test users have an oversimplified view of assessment processes. As the findings showed, IELTS users in Australian universities were only concerned with surface information on the test procedures that they used, predominantly with minimum scores required for entry to particular courses (2013, p. 371). Participants showed a much lower degree of knowledge about issues of validity and reliability. Underlining the risk of misjudgements with crucial impact on students’ choices and academic development, O’Loughlin suggests the provision of more language assessment information to test users (through workshops, discussion groups, and web resources), and the adoption of a more holistic, contextual, and ethically grounded approach to the interpretation of language proficiency test scores.

Yan, Fan, and Zhang (2017) drew on data from semi-structured interviews in order to provide LAL profiles for language teachers, language testers, and graduate students in language studies programs in China. As their findings suggest, assessment practices and training needs in China are highly contextualized and shaped by experiential factors which are different for each stakeholder group. Kim, Chapman, Wilmes, Cranley, and Boals (2017) illustrated a case of collaboration between educators and test developers for the creation of formative language assessment tools in the US educational context. Their findings revealed the benefits of dynamic collaboration with stakeholders – particularly with educators and parents – in the development of valid language assessment. The effects of dynamic collaboration were also presented by Harsch, Seyferth, and Brandt (2017). In particular, Harsch et al. (2017) presented insights from the first eighteen months of a long-term project in which teachers, coordinators, and researchers developed their assessment literacy together. The aim of the project was to investigate how the aforementioned stakeholder groups bring their abilities, skills, and knowledge together, and how they learn with and from each other.

A study that examined both teachers’ and language assessment specialists’ LAL development is that of Baker and Riches (2017). The study was carried out over a series of workshops on language assessment in 2013, where Haitian teachers offered feedback to assessment specialists about draft examinations. The outcome of these workshops was a revision of national English examinations, which were then presented to the Haitian Ministry of Education and Professional Training (MENFP). Interestingly, the study found that teachers’ and specialists’ expertise complement each other and that challenges remain in collaborative decision making and consensus building among these stakeholders. Finally, Kim, Chapman, and Wilmes (2017) studied various resources created to enhance parents’ and educators’ assessment literacy and, more specifically, their ability to interpret and use score reports.

Aspects of LAL and English language teachers

There are some interesting findings regarding the LAL of teachers of English as a foreign language (EFL). Researchers who investigated the educational contexts of the Middle East and North Africa provided evidence of low LAL levels and stressed the urgent need for training. For example, in an attempt to explore the interaction of low assessment literacy with negative washback, Kiomrs, Abdolmehdi, and Naser (2011) replicated Plake and James’ (1993) test in a small survey (N=53) in the Iranian EFL context. Apart from a significant correlation between low LAL levels and negative washback effects, Kiomrs et al. found that Iranian EFL teachers performed poorly in assessment practices and had major misunderstandings about assessment, but felt well prepared for teaching and assessing (see also Badia, 2015).

At the European level, the Hasselgreen et al. (2004) questionnaire was replicated by Vogt and Tsagari in their 2014 study on the assessment literacy of EFL teachers in seven European countries. The researchers recruited 853 participants via questionnaires and added a qualitative component to their research by conducting follow-up interviews with 63 of them. Findings from Vogt and Tsagari’s study confirmed teachers’ lack of assessment literacy and training (see also Tsagari & Vogt, 2017), particularly in less traditional areas of assessment, as well as a need for development in the areas of reliability, validity, and statistics, and in the ability to critically evaluate the tests they used. Additionally, Vogt and Tsagari (2014) point out that teachers, in their effort to meet the requirements of their role, tend to resort to compensation strategies, such as learning on the job (by observing colleagues and mentors) or testing as they were tested (pp. 390–391). With the addition of a follow-up interview section, Hasselgreen et al.’s (2004) questionnaire was also replicated by Kvasova and Kavytska (2014), who conducted research on the LAL levels of Ukrainian EFL teachers. Interestingly enough, Kvasova and Kavytska observed that Ukrainian teachers also use the compensation strategies reported in Vogt and Tsagari (2014) (i.e. learning on the job and assessing as they themselves were assessed). In his investigation of assessment conceptions of Tunisian English language teachers, Hidri (2016) found evidence of wrong and conflicting conceptions about assessment.

More recently, Berry, Munro, and Sheehan have presented the results of a project which aimed to investigate the training needs, practices, and beliefs of English language teachers in a wide variety of countries (see also Berry & Sheehan, 2017; Berry, Sheehan, & Munro, 2017a; Sheehan & Munro, 2017). Drawing on semi-structured interviews, classroom observations, and teachers’ written feedback on a LAL workshop, the researchers largely confirmed previous studies on teachers’ LAL levels and beliefs. More specifically, as their data showed, English language teachers expressed a lack of knowledge in assessment literacy, as well as a need for training in practical elements of assessment and for clear assessment criteria. Supporting previous investigations on the issue, Berry et al. (2017a) pointed out that teachers in their sample had a testing-oriented conception of assessment. Interestingly enough, the majority of the participants in the project were not confident about assessment.

Findings from a qualitative part of a large-scale study on EFL teachers’ assessment literacy levels, training, and needs were presented in Tsagari and Vogt (2017). Using data from semi-structured interviews with primary and secondary state school EFL teachers in Greece, Cyprus, and Germany, Tsagari and Vogt investigated teachers’ perceptions of their own professional preparation, as well as teachers’ perceived training needs. Findings showed that EFL teachers in the aforementioned educational contexts have low levels of LAL. The majority of them said that they had learnt nothing (or very little) about language testing and assessment during their pre-service training.


Teachers also held fuzzy concepts about assessment, tended to revert to traditional assessment procedures, and gave feedback that reflected a deficit-oriented approach. Supporting findings from similar studies (see Vogt & Tsagari, 2014; Kvasova & Kavytska, 2014), participants in Tsagari and Vogt’s (2017) study followed the strategy of learning on the job, relying on mentor colleagues and published materials. A very important finding in Tsagari and Vogt’s research is that teachers were not able to clearly formulate their training and professional needs.

The role of contextual factors in developing LAL, but also in exploring LAL levels, was examined in Xu and Brown’s (2017) study, which used an adapted version of the Teacher Assessment Literacy Questionnaire to explore the LAL levels of university English language teachers in China. They also explored the possible interaction between LAL levels and demographic characteristics, such as age, gender, professional title, qualification, and others. Drawing on data collected from 891 participants, Xu and Brown’s study revealed that teachers in Chinese universities have a very basic level of LAL, while demographic factors do not have a significant impact on teachers’ assessment literacy. The only factor that seemed to influence LAL levels was the institution in which teachers worked. However, the authors stress that these findings should not be considered clear evidence of a lack of interplay between contextual factors and LAL, as this might be due to the methodological design employed: the questionnaire, originally intended for a U.S. context 30 years earlier, may not have been appropriate for capturing the particular contextual parameters of Chinese universities.

LAL levels in South America, more specifically in the Colombian educational context, were also the focus of the work of Hernández Ocampo (2017), Restrepo and Jaramillo (2017), and Giraldo (2018). Villa Larenas (2017) also explored the LAL levels of Chilean EFL teacher trainers. More recently, Berry, Sheehan, and Munro (2017b) collected data from interviews and classroom observations in the course of a study aimed at exploring UK teachers’ attitudes towards assessment as well as teachers’ perceived training needs. Focusing on the Canadian educational context, Valeo and Barkaoui’s (2017) research explored how English as a Second Language (ESL) teachers conceptualize and conduct assessment in the ESL classroom and how teachers’ conceptions influence their decisions in designing and using writing assessment tasks (see also Barkaoui & Valeo, 2017). Valeo and Barkaoui’s findings suggest that teachers hold varying conceptualizations about how to design and select writing tasks. Using an expansion of Fulcher’s (2012) questionnaire, Kremmel et al. (2017) presented a case of how teacher involvement in high-stakes test development can contribute to the development of their LAL.

Focusing on the Turkish educational context, Mede and Atay (2017) examined the LAL levels of English-language university teachers in Turkey. Data were collected from 350 participants who completed an adapted version of Vogt and Tsagari’s (2014) questionnaire and from follow-up group interviews with 34 participants. As Mede and Atay’s research showed, English language teachers in Turkish universities have limited language testing and assessment literacy and exhibit a significant lack of knowledge of assessment concepts. Participants in the study were not familiar with notions such as validity and reliability, and could not use statistics. Testing micro-linguistic aspects of language, such as grammar and vocabulary, was the only domain in which participants seemed to feel comfortable. Mede and Atay’s study showed that English language teachers in Turkish higher education have a compelling need for training in classroom-based assessment practices and in the contents and concepts of assessment. Another qualitative study on language teachers’ LAL in Turkey (Yastıbaş & Takkaç, 2018) revealed that the language assessments teachers develop for instructional purposes are focused on the students and their coursebooks. The findings of the study also showed that such a language assessment structure contributes to the validity of exams in terms of content and construct, and to positive washback effects on students.

Finally, apart from evidence on LAL levels and needs, recent research has made significant contributions in exploring how LAL relates to factors such as practitioners’ professional background, practitioners’ beliefs, and the surrounding socio-political context. For example, Yan, Zhang, and Fan (2018) conducted a qualitative study to explore the way contextual and experiential factors mediate LAL development. Data involved semi-structured retrospective interviews with three secondary-level EFL teachers in China. Findings showed that the teachers’ experience of assessment, the network of stakeholders, the assessment policies implemented, the assessment training resources available, and practical constraints have all contributed to teachers’ distinct language assessment profiles and to a need for more training in assessment practice.

Methodological considerations

Methodological designs

The majority of research on LAL draws on quantitative and qualitative analytical methods, with an evident recent increase in the use of the latter. Mixed methods are also frequently used by authors who attempt to combine the validity of quantitative data analysis with the illuminating and clarifying force of qualitative analytical tools (see Jeong, 2013).

The most popular instruments of quantitative approaches in LAL research are questionnaires and surveys. In some works, authors designed, developed, and administered original questionnaires (e.g. Brown & Bailey 2008; Hasselgreen et al. 2004; O’Loughlin 2013). In other works, well-established questionnaires were replicated and adapted to the context and needs of particular research projects (e.g. Jin, 2010; Kiomrs et al. 2011; Vogt & Tsagari 2014; Kvasova & Kavytska 2014). The questionnaires used in the literature commonly consist of closed-response items (e.g. Hasselgreen et al. 2004; Jin, 2010; Fulcher 2012; O’Loughlin 2013), although combinations of both open- and closed-response questions are not rare (e.g. Brown & Bailey 2008; Mazandarani & Troudi, 2017). Apart from cases in which questionnaires are the sole source of data gathering and analysis (such as Hasselgreen et al. 2004; Fulcher 2012), scholars often recruit subsets of questionnaire respondents for follow-up – usually semi-structured – interviews (O’Loughlin, 2013; Jeong, 2013; Vogt & Tsagari, 2014). In the latter case, the qualitative analysis of the information gathered by interviews is meant to elaborate and clarify the quantitative data offered by the questionnaire. Less frequently, the purpose of collecting both quantitative and qualitative data is materialized by the use of questionnaires that include sections for question elaboration and other written comments (see, for instance, Malone, 2013).

In general, scholars have made extensive use of interviews, but often as a supplement to other types of data. While in most cases these interviews were conducted on an individual basis (see, for instance, O’Loughlin, 2013), group interviews were also used in some methodological designs (e.g. Malone, 2013). Compared to individual interviews, group interviews are considered to maximize interactions between participants. However, they always entail the risk of failing to capture the interviewees’ actual beliefs, since participants in group interviews might influence one another or conform their discourse to the group’s (see Malone, 2013, pp. 334–335). These limitations might explain the choice of some scholars to use private conversations instead of interviews as a means to collect qualitative evidence for their research (e.g., Arkoudis & O’Loughlin, 2004). In the recent literature on LAL, there are few studies that drew exclusively on data gathered from interviews; Deneen and Brown (2016) is one of them. In most cases, interviews are used in addition to other data collection tools (e.g. Gu, 2014; Tsagari & Vogt, 2017).

Empirical investigations on practitioners and stakeholders have offered important contributions to the field, but they do not monopolize the relevant research. Significant findings and insights have also been presented by literature and textbook reviews (Davies 2008; Allen & Negueruela-Azarola, 2010). There are also other studies, such as position papers (Boud, 2000; Carless, 2007; Popham, 2009; Stiggins, 2006, 2012; Scarino, 2017, among many others), and assessment literacy course surveys and overviews (Brown & Bailey, 2008; Jin, 2010; Lam, 2015). Papers discussing processes of language assessment in practice (see, for instance, Rea-Dickins, 2001, 2006; Gu, 2014) and the implementation of state-wide assessment reforms (e.g. Davison, 2013; Hamp-Lyons, 2016) have also played a significant role. Case studies constitute the vast majority of research (see, among others, Pill & Harding, 2013; Kvasova & Kavytska, 2014; Hidri, 2016; Gu, 2014). Large-scale investigations across different countries are also common (Hasselgreen et al., 2004; Brown & Bailey, 2008; Vogt & Tsagari, 2014), while comparative studies are significantly fewer (e.g., Davison, 2004; Cheng, Rogers & Hu, 2004; East, 2015).

In general, research in LAL has been developed in isolation from other disciplines. According to Davies, this is the cost of the growing professionalism in the field (2008, p. 341). The overall tendency in LAL research is the non-inclusion of other disciplines’ frameworks (e.g. Xu & Brown, 2016) and methods (e.g. Antoniou & James, 2014). Nevertheless, exceptions do exist, and some authors have presented interdisciplinary works that combine LAL interests with ethnographic methods (see Hill & McNamara, 2012) or discourse analysis principles (see Leung & Mohan, 2004).

Participants

The vast majority of empirical investigations on LAL elicited data from informants who were either foreign/second language teachers or testing and assessment instructors. However, at times the participants’ professional identity is not clearly specified – especially in large-scale investigations. Thus, many empirical studies drew on data from participants who were teaching foreign languages at both secondary and university level (Fulcher, 2012; Vogt & Tsagari, 2014; Mazandarani & Troudi, 2017). Similarly, some investigations recruited assessment and testing instructors as participants, who were also language instructors (e.g. Hasselgreen et al., 2004). In his work, East (2015) clearly distinguishes participant groups (i.e. New Zealand foreign language teachers teaching in secondary schools only, grouped by subject language), but this is rather an exception to the overall tendency.

It is very common for researchers to use the whole classroom context in order to collect information about actual, classroom-based assessment practices. The vast majority of these works focus on the teacher’s role in the assessment process (e.g. Rea-Dickins, 2001, 2006; Gu, 2014), while the students’ role gains some attention only in investigations that focus on teacher–student interaction (see Leung & Mohan, 2004). Remarkably, research on the assessment literacy of other stakeholders is limited. An exception is Pill and Harding (2013), who used the transcripts of thirteen hearings of the Australian House Standing Committee on Health in order to investigate the misconceptions of policy makers about language testing. O’Loughlin’s work (2013) also explored the assessment literacy levels of university staff in two Australian universities.

Participants in LAL research were largely self-identified and self-volunteered. In most cases, they were contacted or recruited through professional lists, mailing lists, and social networks (e.g. Fulcher, 2012; Jeong, 2013; Kvasova & Kavytska, 2014). Cases of classroom- or course-observation analyses can be considered an exception to this tendency; it is reasonable to assume that the participation of a classroom (or any other educational unit) in scientific investigations entails some degree of cooperation at some level of the educational administration. Since a large amount of research has been conducted on the basis of web-collected information (e.g. Kremmel & Harding, 2017, 2019), the exact geographical distribution of participants cannot be determined. Nevertheless, the available data and the contexts of research suggest that participants in LAL investigations were predominantly from Australia, the United States, the United Kingdom, Europe, Canada, Hong Kong, and China. Fewer studies have focused on the educational contexts of the Middle East (Kiomrs et al., 2011; Mazandarani & Troudi, 2017) and North Africa (Hidri, 2016). Assessment practitioners and stakeholders from the Middle East, Africa, and South America have been generally overlooked by research so far; at best, these geographical regions are underrepresented in large-scale questionnaire samples.

Conclusions

Based on the findings and the points of agreement in, or divergence from, the claims and research outcomes presented above, the literature on LAL does not reflect an entirely optimistic view. Authors have repeatedly observed a gap between the theoretical standards of LAL and actual language assessment practices. Other scholars have explicitly expressed doubts about whether the field of LAL has really evolved in recent research. Essential components of the LAL framework still need further research, and the promotion of LAL remains under investigation.

There is no doubt that research on LAL will improve our theoretical designs, and new frameworks are likely to appear in the future. These future models will probably remedy the defects and problems of previous conceptualizations. However, as the overview of conceptualizations shows, some crucial aspects of LAL should be prioritized in future investigations because LAL components and practices are not yet definitively or clearly articulated. As Inbar-Lourie (2016, 2017) observes, the field is characterised by the absence of the language trait from some of the definitions offered in the literature. Therefore, further research should clarify the relation between assessment literacy and language assessment literacy (see Kremmel et al., 2017), as the two are commonly treated as freely interchangeable, which seems to be due to the existing theoretical vagueness. Moreover, even when the language trait is addressed, the field seems to lack a clear definition of language (see also Kremmel et al., 2017). In her studies, Scarino (2008, 2017) observes that the conception of the language construct has changed over time. Language assessment has shifted from a cognitivist approach to language to a more communicative approach and, recently, to an intercultural approach. Of course, language can be approached from different perspectives, and language teaching and assessment can focus on different dimensions of language use, especially within multilingual contexts, which have become a challenge for most educational contexts today (Schissel, Leung, & Chalhoub-Deville, 2019). Future research should provide a clear definition of language in a more holistic framework that incorporates useful insights from all approaches to language.


Scholars also need to converge on a clear and generally acceptable definition of LAL. A major trend in the field suggests that LAL is made up of certain components. However, both the number and the classification of these components are debatable. Taylor (2013), for instance, hypothesizes that LAL consists of eight components, including principles and concepts, sociocultural values, local practices, and personal beliefs. An obvious problem with such a classification is that it is not entirely clear where the line between the different components should be drawn. On a theoretical level, it seems reasonable to distinguish between, for instance, socio-cultural values and personal beliefs. Nevertheless, when it comes to examinations of actual attitudes and practices, the distinction between sociocultural patterns and personal behaviour does not seem that straightforward and requires clear, supportive evidence. The point is that LAL conceptualizations should not focus entirely on addressing theoretical requirements but should also provide some framework that can incorporate real assessment performances.

Another challenge in the conceptualization of LAL concerns the role of context. Increasingly, authors acknowledge that contextual factors have a significant effect on the development of LAL and the implementation of assessment practices (e.g. Hill 2017a, 2017b; Tsagari, 2017). In the literature, authors claim that language assessment can be affected by parameters such as class size and administrative requirements (e.g. Cheng et al., 2004). Although it is undeniable that assessment practices as well as LAL development are affected by contextual considerations, there is still a need for a clear definition of these considerations.

In addition to the need for a more systematic and general study of context, the literature also reveals a need for a more intensive investigation of contextual factors that until now have been widely overlooked. It is not until very recently that scholars have started to investigate parameters such as the demographics of language teachers (e.g. experienced vs novice teachers; see Hildén & Fröjdendahl, 2018). Similarly, authors who have concentrated on studying LAL in the context of teaching English overlook assessment practices and needs in the context of teaching other languages. The educational level factor has also been somewhat neglected in studies. Authors tend to collect data and suggest theoretical models without distinguishing among educational levels. This seems to imply that teaching, learning, and assessment can be treated in a uniform way whether they refer to primary, secondary, or tertiary educational levels. It is obvious, though, that each educational level has different aims and needs and suggests a different context. Thus, future research should investigate these contexts separately and suggest ways to promote LAL according to the context and needs of each educational level. On this view, the field of LAL should also investigate the ways in which LAL could be better communicated to all levels involved.

Another aspect of context that needs careful consideration is the professional context of non-practitioner stakeholders, such as policy makers or administration officers. Lack of LAL might be a sign of limited professionalism for language teachers, but this is not the case for administration officers or policy makers. Nevertheless, general conceptualizations of dimensions such as professional ethics, decision making, and attitudes are provided for non-practitioners on the basis of the practitioner’s context, as if policy makers, for instance, shared the same professional ethics as teachers. Likewise, while the findings of relevant research revealed misconceptions of LAL among practitioners as well as limitations in the implementation of assessment practices, these findings are usually explained on the basis of practitioners’ training and professional development. LAL should therefore not be promoted to non-practitioner groups solely from the practitioner’s professional perspective, and non-practitioners should not be expected to adapt to practitioners’ needs and intentions.

The aforementioned considerations of suggested models and conceptualizations should be equally addressed in future efforts to promote LAL. For instance, the importance of contextual factors should concern not only research projects but also the training provided. Training in LAL should not be designed and delivered according to some general conception of assessment literacy. Instead, it should be formed with respect to the contextual parameters and the needs of the target stakeholder group. Thus, teachers of primary education should receive different training from university teachers. Similarly, training programmes should be designed according to the educational context of each country. A training programme designed for English language teachers in China cannot be transferred and applied as is in the context of French language teaching in Greece.

In addition to carefully designed training programmes, LAL should be promoted through other means, such as web resources (e.g. the TALE project, http://taleproject.eu), online tutorials, seminars, and workshops. Again, these materials should reflect the contextual considerations that apply to each stakeholder group and each educational environment. Ideally, practitioners and other test users should be given the chance to practice and experience testing and assessment processes in structured educational opportunities. Research shows that language teachers tend to adopt assessment practices through experience and by observing others (see Vogt & Tsagari, 2014). Similarly, O’Loughlin (2013) suggests that the LAL of university administrative staff could be raised if the latter could actually take the test they use. O’Loughlin’s proposal is worth examining in practice and could even be generalized to other stakeholders.

Moreover, there is a lot yet to be learnt about the protagonists of assessment – students and teachers – and how they enact assessment policy mandates in their daily practices. Research should shed light on students’ perspectives on assessment practices (see Rea-Dickins 2001; Malone 2016, 2017; Tsagari, 2013). Also, if assessment is meant to be for learning, then LAL should provide some account of what constitutes evidence of language learning (see Rea-Dickins, 2001).


Of course, the most powerful factor in assessment practices is the wider educational and cultural conception of assessment. Training and experience can do very little to change society’s views of testing and assessment. In general, people tend to require certifications as proof of knowledge or expertise. Interestingly enough, people do not seem so obsessed with external, standardized examinations when it comes to other important qualifications, such as the ability to drive a car. Thus, providing appropriate information to interested parties and applying good assessment practices can be a timely and effective way to change assessment cultures.

References

Allen, H. W., & Negueruela-Azarola, E. (2010). The professional development of future professors of foreign languages: Looking back, looking forward. The Modern Language Journal, 94(3), 377–395.
Antoniou, P., & James, M. (2014). Exploring formative assessment in primary school classrooms: Developing a framework for actions and strategies. Educational Assessment, Evaluation and Accountability, 26(2), 153–176.
Arkoudis, S., & O’Loughlin, K. (2004). Tensions between validity and outcomes: Teacher assessment of written work of recently arrived immigrant ESL students. Language Testing, 21(3), 284–304.
Badia, H. (2015). English language teachers’ ideology of ELT assessment literacy. International Journal of Education & Literacy Studies, 3(4), 42–48.
Baker, B. A., & Riches, C. (2017). The development of EFL examinations in Haiti: Collaboration and language assessment literacy development. Language Testing, 35(4), 557–581.
Barkaoui, K., & Valeo, A. (2017). Designing L2 writing assessment tasks for the ESL classroom: Teachers’ conceptions and practices [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Berry, V., & Sheehan, S. (2017). Exploring teachers’ language assessment literacy: A social constructivist approach to understanding effective practice. In Learning and assessment: Making the connections: Proceedings of the ALTE 6th International Conference. http://events.cambridgeenglish.org/alte2017-test/perch/resources/alte-2017-proceedings-final.pdf
Berry, V., Sheehan, S., & Munro, S. (2017a). What do teachers really want to know about assessment? [Conference paper]. The 51st Annual International IATEFL Conference, Glasgow, United Kingdom.
Berry, V., Sheehan, S., & Munro, S. (2017b). Mind the gap: Bringing teachers into the language literacy debate [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Boud, D. (2000). Sustainable assessment: Rethinking assessment for the learning society. Studies in Continuing Education, 22(2), 151–167.
Brindley, G. (2001a). Language assessment and professional development. In C. Elder, A. Brown, K. Hill, N. Iwashita & T. Lumley (Eds.), Experimenting with uncertainty: Essays in honour of Alan Davies (pp. 126–136). Cambridge University Press.
Brindley, G. (2001b). Outcomes-based assessment in practice: Some examples and emerging insights. Language Testing, 18(4), 393–407.

Brown, J. D., & Bailey, K. M. (2008). Language testing courses: What are they in 2007? Language Testing, 25(3), 349–383.
Carless, D. (2007). Learning-oriented assessment: Conceptual basis and practical implications. Innovations in Education and Teaching International, 44(1), 57–66.
Cheng, L., Rogers, T., & Hu, H. (2004). ESL/EFL instructors’ classroom assessment practices: Purposes, methods, and procedures. Language Testing, 21(3), 360–389.
Cooke, S., Barnett, C., & Rossi, O. (2017). An evidence-based approach to generating the language assessment literacy profiles of diverse stakeholder groups [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3), 327–347.
Davison, C. (2004). The contradictory culture of teacher-based assessment: ESL teacher assessment practices in Australian and Hong Kong secondary schools. Language Testing, 21(3), 305–334.
Davison, C. (2013). Innovation in assessment: Common misconceptions and problems. In K. Hyland & L. L. C. Wong (Eds.), Innovation and change in English language education (pp. 263–275). Routledge.
Deneen, C. C., & Brown, G. T. L. (2016). The impact of conceptions of assessment on assessment literacy in a teacher education program. Cogent Education, 3(1). doi:10.1080/2331186X.2016.1225380.
East, M. (2015). Coming to terms with innovative high-stakes assessment practice: Teachers’ viewpoints on assessment reform. Language Testing, 32(1), 101–120.
Engelsen, K. S., & Smith, K. (2014). Assessment literacy. In C. Wyatt-Smith, V. Klenowski & P. Colbert (Eds.), Designing assessment for quality learning (pp. 91–107). Springer Netherlands.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers. Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Gu, P. Y. (2014). The unbearable lightness of the curriculum: What drives the assessment practices of a teacher of English as a foreign language in a Chinese secondary school? Assessment in Education: Principles, Policy & Practice, 21(3), 286–305.
Hamp-Lyons, L. (2016). Implementing a learning-oriented approach within English language assessment in Hong Kong schools: Practices, issues and complexities. In G. Yu & Y. Jin (Eds.), Assessing Chinese learners of English (pp. 17–37). Palgrave Macmillan.
Harding, L., & Kremmel, B. (2016). Teacher assessment literacy and professional development. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment (pp. 413–428). Mouton De Gruyter.
Harsch, C., Seyferth, S., & Brandt, A. (2017). Developing assessment literacy in a dynamic collaborative project: What teachers, assessment coordinators, and assessment researchers can learn from and with each other [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and assessment needs: Report: Part one – general findings. European Association for Language Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-report-pt1.pdf

Hernández Ocampo, S. P. (2017). How literate in language assessment should English teachers be? [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Hildén, R., & Fröjdendahl, B. (2018). The dawn of assessment literacy – exploring the conceptions of Finnish student teachers in foreign languages. Apples – Journal of Applied Language Studies, 12(1), 1–24.
Hill, K. (2017a). Language teacher assessment literacy – scoping the territory. Papers in Language Testing and Assessment, 6(1), iv–vii.
Hill, K. (2017b). Understanding classroom-based assessment practices: A precondition for teacher assessment literacy. Papers in Language Testing and Assessment, 6(1), 1–17.
Hill, K., & McNamara, T. (2012). Developing a comprehensive, empirically based research framework for classroom-based assessment. Language Testing, 29(3), 395–420.
Inbar-Lourie, O. (2008). Constructing a language assessment knowledge base: A focus on language assessment courses. Language Testing, 25(3), 385–402.
Inbar-Lourie, O. (2016). Language assessment literacy. In E. Shohamy, I. Or & S. May (Eds.), Language testing and assessment (pp. 257–270). Springer International Publishing.
Inbar-Lourie, O. (2017). Language assessment literacies and the language testing communities: A mid-life identity crisis? [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Jeong, H. (2013). Defining assessment literacy: Is it different for language testers and non-language testers? Language Testing, 30(3), 345–362.
Jin, Y. (2010). The place of language testing and assessment in the professional preparation of foreign language teachers in China. Language Testing, 27(4), 555–584.
Kim, A. A., Chapman, M., & Wilmes, C. (2017). Developing materials to enhance the assessment literacy of parents and educators of K-12 English language learners [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kim, A. A., Chapman, M., Wilmes, C., Cranley, M. E., & Boals, T. (2017). Validation research of preschool language assessment for dual language learners: Collaboration between educators and test developers [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kiomrs, R., Abdolmehdi, R., & Naser, R. (2011). On the interaction of test washback and teacher assessment literacy: The case of Iranian EFL secondary school teachers. English Language Testing, 4(1), 156–160.
Kremmel, B., & Harding, L. (2017). Towards a comprehensive, empirical model of language assessment literacy across different contexts [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kremmel, B., & Harding, L. (2019). Towards a comprehensive, empirical model of language assessment literacy across stakeholder groups: Developing the language assessment literacy survey. Language Assessment Quarterly, 17(1), 100–120.
Kremmel, B., Eberharter, K., & Harding, L. (2017). Putting the ‘language’ into language assessment literacy [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kvasova, O., & Kavytska, T. (2014). The assessment competence of university language teachers: A Ukrainian perspective. Language Learning in Higher Education: Journal of the European Confederation of Language Centres in Higher Education (CercleS), 4(1), 159–177.

Lam, R. (2015). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32(2), 169–197.
Leung, C., & Mohan, B. (2004). Teacher formative assessment and talk in classroom contexts: Assessment as discourse and assessment of discourse. Language Testing, 21(3), 335–359.
Malone, M. E. (2013). The essentials of assessment literacy: Contrasts between testers and users. Language Testing, 30(3), 329–344.
Malone, M. E. (2016). Training in language assessment. In E. Shohamy, I. Or & S. May (Eds.), Language testing and assessment (pp. 225–239). Springer International Publishing.
Malone, M. E. (2017). Including student perspectives in language assessment literacy [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Mazandarani, O., & Troudi, S. (2017). Teacher evaluation: What counts as an effective teacher? In S. Hidri & C. Coombe (Eds.), Evaluation in foreign language education in the Middle East and North Africa (pp. 3–28). Springer International Publishing.
Mede, E., & Atay, D. (2017). English language teachers’ assessment literacy: The Turkish context. Dil Dergisi, 168(1), 43–60.
O’Loughlin, K. (2013). Developing the assessment literacy of university proficiency test users. Language Testing, 30(3), 363–380.
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence from a parliamentary inquiry. Language Testing, 30(3), 381–402.
Plake, B. S., & James, C. I. (1993). Teacher assessment literacy questionnaire. University of Nebraska-Lincoln.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory into Practice, 48(1), 4–11.
Rea-Dickins, P. (2001). Mirror, mirror on the wall: Identifying processes of classroom assessment. Language Testing, 18(4), 429–462.
Rea-Dickins, P. (2006). Currents and eddies in the discourse of assessment: A learning-focused interpretation. International Journal of Applied Linguistics, 16(2), 163–188.
Restrepo, E., & Jaramillo, D. (2017). Preservice teachers’ language assessment literacy development [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Scarino, A. (2008). The role of assessment in policy-making for languages education in Australian schools: A struggle for legitimacy and diversity. Current Issues in Language Planning, 9(3), 344–362.
Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and teacher learning. Language Testing, 30(3), 309–327.
Scarino, A. (2017). Developing assessment literacy of teachers of languages: A conceptual and interpretive challenge. Papers in Language Testing and Assessment, 6(1), 18–40.
Schissel, J. L., Leung, C., & Chalhoub-Deville, M. (2019). The construct of multilingualism in language testing. Language Assessment Quarterly, 16(4–5), 373–378.
Sheehan, S., & Munro, S. (2017). Assessment: Attitudes, practices and needs: Project report. British Council. https://www.teachingenglish.org.uk/sites/teacheng/files/pub_G239_ELTRA_Sheenan%20and%20Munro_FINAL_web%20v2.pdf
Stiggins, R. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
Stiggins, R. (2006). Assessment for learning: A key to motivation and achievement. Edge: The Latest Information for the Education Practitioner, 2(2), 1–19.

Stiggins, R. (2012). Classroom assessment competence: The foundation of good teaching. http://images.pearsonassessments.com/images/NES_Publications/2012_04Stiggins.pdf
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics, 29, 21–36.
Taylor, L. (2013). Communicating the theory, practice and principles of language testing to test stakeholders: Some reflections. Language Testing, 30(3), 403–412.
Tsagari, D. (2013). EFL students’ perceptions of assessment in higher education. In D. Tsagari, S. Papadima-Sophocleous & S. Ioannou-Georgiou (Eds.), International experiences in language testing and assessment (pp. 117–143). Peter Lang.
Tsagari, D. (2017). The importance of contextualizing language assessment literacy [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Tsagari, D., & Vogt, K. (2017). Assessment literacy of foreign language teachers around Europe: Research, challenges and future prospects. Papers in Language Testing and Assessment, 6(1), 41–63.
Valeo, A., & Barkaoui, K. (2017). How teachers’ conceptions mediate their L2 writing assessment practices: Case studies of ESL teachers across three contexts [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Villa Larenas, S. (2017). Language assessment literacy of EFL teacher trainers [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58, 149–162.
Xu, Y., & Brown, G. T. L. (2017). University English teacher assessment literacy: A survey-test report from China. Papers in Language Testing and Assessment, 6(1), 133–158.
Yan, X., Fan, J. J., & Zhang, C. (2017). Understanding language assessment literacy profiles of different stakeholder groups in China: The importance of contextual and experiential factors [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Yan, X., Zhang, C., & Fan, J. J. (2018). ‘Assessment knowledge is important, but…’: How contextual and experiential factors mediate assessment practice and training needs of language teachers. System, 74, 158–168. doi:10.1016/j.system.2018.03.003.
Yastıbaş, A. E., & Takkaç, M. (2018). Understanding language assessment literacy: Developing language assessment. Journal of Language and Linguistic Studies, 14(1), 178–193.

Chapter 3

Traditional assessment and encouraging alternative assessment that promotes learning Illustrations from EAP Lee McCallum Introduction The field of language testing (also termed language assessment in this chapter and other work [e.g. O’Sullivan, 2011]) involves the process of designing language tests, testing students, and using this data for evaluation and decision-making purposes (Davies, Brown, Elder & Hill, 1999). Language testing enjoys a rich, complex, and often misinterpreted history, with tests simply defined as instruments that elicit certain behaviour from candidates whereby this behaviour is used to make inferences about a candidate’s language ability (Carroll, cited in Bachman, 1990). These inferences are often reflected in a numerical score which is used against a benchmark level to set entry into higher education, training or employment opportunities, and to govern immigration to an often-English-speaking country (Shohamy, 1998, 2001a). These inferences are facilitated by using standardized tests where test administration, content, format, language, and scoring procedures are equal for all test takers. This allows scores across test populations to be easily compared (Popham, cited in Menken, 2008). The history of language testing has been mapped according to different time periods and ‘waves of scholarship’ including Spolsky (1976) and Davies’ (1978) three stages: pre-scientific, psychometric–structural, and psycholinguistic–sociolinguistic as well as Morrow’s (1979) time period classification of the same eras: the ‘Garden of Eden’, the ‘Vale of Tears’, and the ‘Promised Land’. Shohamy’s (1996) distinct fivestage categorization is partly guided by test task typologies: discrete-point, integrative, communicative, performance testing, and alternative assessment, which span more than a century of testing practices. These waves are steeped in economic, social, and political influences that steer the direction of testing in mainstream education. Weir, Vidakovic, and Galaczi (2013) summarize how tests have always been gatekeeping tools to prevent mass immigration such as the US immigration that took place in the post-world war years, while Spolsky (2008) highlights how the Chinese first introduced formal selection testing for elite government positions and


this practice later spread to education in Europe, with France, Italy, and the UK using tests to decide entry into higher education. Given these uses, it is important to recognize that the last two waves of scholarship, the psychometric–structural and the psycholinguistic–sociolinguistic, play a key role in the understanding, promotion, and use of tests, and in the desire to change them (Fulcher, 2000). The field of psychometrics is viewed as the cornerstone of 'traditional' testing, with its focus on objectively measuring mental traits such as language ability, whereas the increasingly influential psycholinguistic–sociolinguistic wave champions the need for fairer communicative testing that is more socially aware, ethical, and grounded in 're-humanizing' the testing process (Fulcher, 2000).

This chapter acknowledges the vast history of testing, yet does not strive simply to remap it. Instead, it follows other theoretically motivated work, such as Alderson and Banerjee (2001), by presenting an overview of the landscape of traditional testing under two broad principled sections that cover pertinent issues in testing. The two principled sections, the reliance on statistically robust psychometric scoring practices to meet the aim of selecting the highest-scoring students for entry into higher education, and the belief that 'Standard English' is the testing model to be followed, help outline traditional testing's key tenets and shape the chapter in a logical manner. These sections will also refer to task types and how they facilitate testing goals. The chapter will then examine the same two principled sections through the lens of Critical Language Testing (CLT) to illuminate how, by situating itself in critical theory and critical applied linguistics, CLT offers alternative views that are more concerned with prioritizing test takers than with the scientific musings that traditional testing offers. CLT recognizes the power that tests wield over test takers and aims to promote more interpretive, open scoring procedures, which call for the inclusion of local varieties of English in language testing.

In examining these tenets, the chapter focuses on providing theoretical and empirical research evidence from the narrow context of English for Academic Purposes (EAP), which involves the teaching and testing of the academic English needed for tertiary-level study. It is hoped that such an analysis can contribute to the wider literature on language assessment literacy and help illuminate task type options for teachers and test designers who need to include a range of task types in their assessments to capture the range of skills they need to test.

Review of the literature

Traditional language testing

The paradigmatic stance and aims of traditional testing

Traditional language testing has largely been shaped by its middle stage of psychometric–structuralist approaches, which embody the influence of psychology and psychometrics. These influences conjure up the traditional notions of language testing as being characterized by objectively scored items and final results that are


easily quantifiable and form the basis of key decision-making processes (Spolsky, 1976). Psychometrics, with its reliance on statistics, has played a pivotal role in language assessment since the late 1950s, coinciding with and being influenced by structural linguistics. This influence continues today, with standardized tests operating at all levels of education in different countries, including China (see Jenkins (2014) for an overview of China's 'Gaokao' high school exit exam that determines entry to study opportunities), and in the UK, US, and Australia, where proficiency exams such as IELTS (International English Language Testing System) and TOEFL (Test of English as a Foreign Language) are required to gain entry into higher education (Weir et al., 2013).

Fulcher (2010, 2014) places this historical reliance on psychometrics in the wider sphere of testing being viewed as a natural science, having its roots in hard, generalizable, objective positivism and realism, whereby language ability was measurable and able to be isolated from the person producing it. This means objective tests were supported for their purity in measuring a single construct well, for achieving high levels of validity and reliability, and for being capable of being objectively scored and administered to large populations (Spolsky, 1994). Moss, Pullin, Gee, and Haertel (2005) further outline the underlying goal of psychometrics, and thus of traditional testing: to develop interpretations that are generalizable across individuals and contexts, and to understand the limits of those generalizations. In seeking out these generalizations, interpretations characterize groups of individuals who score the same in the trait being tested. This stance also highlights how test scores are interpreted, with increasing test scores symbolizing proof of educational gains in knowledge (Moss et al., 2005).

However, this interpretation of scores remains dubious because it disregards how learners are taught or prepared to answer questions and perform tasks on the knowledge appearing in the test (Miller & Legg, 1993). This condenses curricula and means that while learners 'appear' to gain knowledge, it is somewhat artificially gained from a narrow knowledge base that is decided by the test, which has in turn been decided by those tasked with making selection decisions (Shohamy, 1996). This base consists largely of receptive knowledge because tasks are designed to be uniform and easy to assess, and to require a single answer through the mediums of gap-fill, multiple-choice questions (MCQs), and true–false or matching exercises (Linn, 2000, cited in Moss et al., 2005). Figure 3.1 below shows a typical multiple-choice task whereby students are expected to choose a single correct answer.

The task in Figure 3.1 closely matches the tenets of the traditional paradigm in language testing: it has a limited number of possibilities and strengthens the focus on a particular kind of knowledge by asking students to choose only one correct answer. The task achieves strong reliability in grading, as answers are prescribed in the shape of an answer key. These types of tasks are a frequent occurrence in large-scale proficiency examinations.

Questions 10–12
Choose the appropriate letters A, B, C or D. Write your answers in boxes 10–12 on your answer sheet.

10. Research completed in 1982 found that in the United States soil erosion
A reduced the productivity of farmland by 20 per cent.
B was almost as severe as in India and China.
C was causing significant damage to 20 per cent of farmland.
D could be reduced by converting cultivated land to meadow or forest.

11. By the mid-1980s, farmers in Denmark
A used 50 per cent less fertiliser than Dutch farmers.
B used twice as much fertiliser as they had in 1960.
C applied fertiliser much more frequently than in 1960.
D more than doubled the amount of pesticide they used in just 3 years.

12. Which one of the following increased in New Zealand after 1984?
A farm incomes
B use of fertiliser
C over-stocking
D farm diversification

Figure 3.1: Academic Reading Multiple Choice Task. Task taken from: https://www.ielts.org/-/media/pdfs/academic_reading_sample_task_multiple_choice.ashx?la=en
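Part of what makes such a task 'objective' is that scoring reduces to mechanical key matching, so any marker, human or machine, applying the same key produces the same score. The short Python sketch below illustrates this logic; it is a minimal illustration only, and the answer key and candidate responses are invented rather than taken from the published key for the task above.

# Minimal sketch of objective MCQ scoring: a prescribed key removes all
# marker judgement, which is what gives the task its reliability in grading.
ANSWER_KEY = {10: "C", 11: "B", 12: "A"}  # hypothetical key, not the real one

def score(responses):
    """Count the responses that match the prescribed answer key."""
    return sum(1 for q, choice in responses.items() if ANSWER_KEY.get(q) == choice)

print(score({10: "C", 11: "D", 12: "A"}))  # -> 2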

Such examinations serve large numbers of students and institutions, and are often marketed as determining a candidate's proficiency and suitability to undertake academic study or skilled employment.

A further paradigmatic tenet of this reliance on psychometrics stems from the belief that scores adhere to a normal distribution, where the majority of scores pool at the mean and other scores deviate from the mean to create a bell-shaped curve (Douglas, 2010). This normal distribution pattern is used to determine access to resources such as higher education, with scores benchmarked against a cut-off score that determines pass or fail decisions and, ultimately, the benefits available to the test taker (Spolsky, 1995).
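The interaction between a normal score distribution and a cut-off score can be made concrete in a few lines of Python. The sketch below is purely illustrative and assumes invented values: real testing programmes estimate the mean, standard deviation, and cut-off from live data rather than stipulating them.

from statistics import NormalDist

# Hypothetical band-score distribution: mean 6.0, standard deviation 0.75,
# with a cut-off of 6.5 required for university entry.
scores = NormalDist(mu=6.0, sigma=0.75)
cut_off = 6.5

# The expected pass rate is the area of the bell curve above the cut-off.
pass_rate = 1 - scores.cdf(cut_off)
print(f"Expected pass rate: {pass_rate:.1%}")  # roughly 25%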


Shohamy (2001a) explains that assessment stakeholders perceive these scores as objective, legitimate, and a mark of achievement; however, several scholars have appreciated that scores can have serious consequences for test takers, with Cattell, cited in Spolsky (1995), recognizing the serious decisions being made from these scores and the testing community's responsibility to ensure scores are reliable. This stance was also taken by Edgeworth (1888); however, Edgeworth's view of ensuring reliability centred on rater reliability, with the concept of 'reliability' chiefly concerned with the measurement consistency of scoring practices (Carr, 2011). These concerns, whether socially guided or purely statistically investigated, highlight issues that were also recognized and investigated by Thorndike (1904) in the early 20th century, and in Cambridge and Oxford University and UCLES (University of Cambridge Local Examinations Syndicate) circles at much later dates (Weir et al., 2013). Thorndike (1904, cited in Spolsky, 1995) also recognized that fairness to test takers meant scoring questions to reflect their level of difficulty; under weighted scoring, candidates answering more challenging questions received more marks than those answering simpler questions.

In keeping with a focus on the test and its target inferences, traditional testing also supports a narrow conceptualization of validity, with Messick's (1989) content and construct validity receiving extensive attention from testers. Content validity addresses how representative the test is as a sample of a course's syllabus, whereas construct validity is an evaluation of how well the test's scores reflect what the test claims to be measuring (Davies et al., 1999). In EAP, content and construct validity have traditionally been governed and driven by a needs analysis of learners whereby, as Fulcher (1999) indicates, 'content validity' is taken to mean representing the course students were taught, and 'construct validity' ensures that the test examines the skills it claims to be designed to test, while also ensuring that it tests the skills the course aimed to develop. It is of utmost importance that the testing community, including instructors, realize that reliability and validity are not absolute qualities: the two tenets operate on a continuum, much like the discrete–integrative/communicative continuum, and neither will ever be an absolute property. It is equally important to realize that the political, social, economic, and cultural terrains under which tests operate contribute to balancing this continuum (McNamara, 2001).

This narrow, traditional view of validity does not consider the social consequences of the test. Thus, in combination with reliability, a narrow range of single-answer discrete test items, and a narrow scoring scale, traditional testing views language proficiency as a single unitary construct that can be isolated from human, social, and test administration factors. In achieving a single measurable construct, traditional testing supports validity and reliability practices that are geared towards ensuring that the test document allows testers to make the types of inferences they aim to make. This approach, Shohamy (2004) argues, is still embedded in a measurement lens because it further distinguishes between those with lower and higher levels of predetermined knowledge. These practices, including the weighting of items, still rely on the design and selection of items that elicit a single correct answer, signalling that, as a unidimensional construct, language proficiency can be isolated and measured as a single trait with the right tools (Bachman, 2000).
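Returning to Thorndike's weighting principle above, the idea that harder questions should earn more marks is simple to express computationally. In the hypothetical sketch below, the items, their weights, and the candidate's results are all invented; the point is only that a weighted total rewards success on difficult items more than success on easy ones.

# Hypothetical weighted items: (item, marks available, answered correctly?).
# Following Thorndike, more difficult items carry more marks.
items = [
    ("q1", 1, True),   # easy item
    ("q2", 2, True),   # moderately difficult item
    ("q3", 3, False),  # difficult item
]

weighted_score = sum(marks for _, marks, correct in items if correct)
maximum = sum(marks for _, marks, _ in items)
print(f"{weighted_score}/{maximum}")  # -> 3/6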


The quest for uniform, single measurement is facilitated by an interconnected belief in structural linguistics (Davies, 2003). Structural linguistics began to play a significant role in language testing in the 1920s, with its importance peaking in the 1960s under the influence of Lado (1961) and discrete-point testing. While this influence has diminished with the introduction of integrative, communicative task-based testing, it is important to note, as Davies (1978) does, that it remains balanced in modern times with later, more communicative approaches. This trend can be seen throughout the history of UK testing, with UCLES Cambridge English exams such as the FCE (Cambridge English: First), an exam accepted for study and employment opportunities, still including discrete-point items. These tests, alongside IELTS, examine word and phrase knowledge with topics that reflect real-life interests and contexts.

Davies (2003) highlights the role of structuralist approaches by describing two camps, or opposing ends of a continuum, in communicative and discrete-point approaches, and points out that tests must be balanced between both extremes: a test which is too structural lacks context, while a test which is too integrative/communicative is too local, subjective, and ungeneralizable. A sample task that aims to balance these two perspectives is presented below in Figure 3.2. The task in Figure 3.2 balances some aspects of discrete-point items with real-life interests. In taking such an approach, there is little scope for negotiating an answer: students are required to produce grammatically correct choices which are limited by the surrounding co-text.

A key tool in balancing the continua of reliability and validity and of discrete–communicative testing is the use of rigorously tested measurement scales. Traditional testing has used measurement scales since Thorndike's (1904) early scales of the 20th century; however, tests now use more pluralistic scales which measure several interconnected micro-skills and features. This is illustrated by the development of TOEFL since its creation in the 1960s (Spolsky, 1995). TOEFL has moved from the indirect testing of writing to directly testing writing through essay composition and, more recently, integrated tasks, reflecting changing trends in applied linguistics as it moves from a structural to an integrated, communicative focus. Indeed, in practice, this has meant that the inferences being made about language ability have also changed to reflect the task types students are required to complete in higher education. The early work by Bridgeman and Carlson (1983) was influential in redefining the scope and aims of TOEFL: their study provided an essay task inventory from US universities to ensure TOEFL mirrored these tasks by assessing the writing skills needed at tertiary level. The required timed written piece is often argumentative in nature and, on the one hand, such essay tasks allow writing competence to be elicited in a communicative manner that partly mirrors future academic work; on the other hand, topic control, time limits, and prescriptive scoring scales combine to limit response types and scoring ranges, and to tighten perceptions of successful writing (Fulcher, cited in Menken, 2008).

This task tests candidates' knowledge of word class, with candidates required to change the word given on the left to fit the passage.

For questions 17–24, read the text below. Use the word given in capitals at the end of some of the lines to form a word that fits in the gap in the same line. There is an example at the beginning (0). Write your answers IN CAPITAL LETTERS on the separate answer sheet.

Example: 0 M E M O R A B L E

Family bike fun
National Bike Week was celebrated last week in a (0) … … …. way with a Family Fun Day in Larkside Park. (MEMORY)
The event (17) … … …. to be highly successful with over five hundred people attending. (PROOF)
Larkside Cycling Club brought along a (18) … … …. of different bikes to (VARY)
demonstrate the (19) … … …. that family members of all ages can get from (ENJOY)
group cycling. Basic cycling (20) … … …. was taught using conventional bikes. (SAFE)
There were also some rather (21) … … …. bikes on display. (USUAL)
One-wheelers, five-wheelers and even one which could carry up to six (22) … … (RIDE)
were used for fun. The club also gave information on how cycling can help to reduce (23) … … … damage. (ENVIRONMENT)
They also provided (24) … … …. as to how people could substitute the bike for the car for daily journeys. (SUGGEST)
The overall message was that cycling is great family fun and an excellent alternative to driving. By the end of the day over a hundred people had signed up for membership.

Figure 3.2: First Certificate in English: Use of English Task. Adapted from: http://www.cambridgeenglish.org/exams/first/preparation/

In this respect, Fulcher (cited in Menken, 2008) notes how a scale such as the TOEFL scale or the Common European Framework of Reference (CEFR) can be misused: teachers come to view the scale as a resource for prescriptively judging learners, as well as for influencing curriculum development to reflect what the scale treats as signalling a higher proficiency grade. The proficiency scales also help shape knowledge by indicating, for example, the linguistic features, organization patterns, expected discourse markers, and development of ideas at each proficiency level, meaning the writer changes or molds their response to match these criteria (Hawkins & Filipovic, 2012). In an international context, this means learners' writing is forced to shift from the rhetoric and discourse style of their L1 to the discourse of the L2, and learners are often prepared for these changes in the form of IELTS exam preparation classes or freshman composition classes at university (Kachru, 2006).


Traditional testing can thus be seen to balance on the edges of discrete-objective and communicative and/or integrative testing, and of reliability and validity, which has meant a more prominent stance being taken on test ethics as testing continues to become more socially aware of test takers. The influence of sociolinguistics on current communicative testing has increased professional testers' social responsibilities and led to a wider call for tests to meet the needs of test takers and test users (McNamara & Roever, 2006). One such response to this protection of local needs is the International Language Testing Association's (ILTA) (2000) ethical guidelines, which were written to avoid local bias and Western hegemonic influence on testing. Despite this assumed awareness of local needs, Davies (2014) and McNamara and Roever (2006) believe a response to local needs is only undertaken by international testing bodies when negative social consequences are empirically linked to the physical test document itself. In practice, this means test ethics are addressed through appropriate topic choices that appear to accept or meet local cultural realities. McNamara and Roever (2006) question these efforts, holding the view that this acceptance of local context is superficial, with the Educational Testing Service (ETS) opting for culturally appropriate topic choices because doing so increases its potential market value across a wider range of cultures. The next section of the chapter outlines how traditional testing's objective aims are realized by following a 'Standard English' testing model.

Language standards: Adhering to standard Inner Circle English

Traditional testing, in its views of objectivity, believes in using 'Standard English' (henceforth SE) as the testing model, which matches or informs the same SE model used in pedagogy. The SE model is defined by Trudgill and Hannah (2008) and Davies (2013) as the variety of English normally used by educated native speakers, with these speakers belonging to Kachru's (1986) Inner Circle countries. Inner Circle countries are those that hold the traditional cultural and linguistic bases of English and have traditionally had 'ownership' of the language (e.g. the UK, US, and Australia), with US and UK English favoured as pedagogic and testing models because of their wide dispersion and robust codification over time (Hickey, 2015). Testing models that favour these standard varieties facilitate objectivity and allow test aims to be standardized across populations, with test takers assessed against these prescriptive Inner Circle norms.

The SE referred to in Trudgill and Hannah (2008) emanates from the traditional prescriptive grammar of 18th-century grammarians, such as the work of Lowth, and is firmly guided by what people ought to say or write, disregarding what people actually say (descriptivism) as erroneous and, crucially, of a lower standard or status (Freeborn, 2006). This view of SE is seen as 'correct' English and a marker of being highly educated. Seargeant (2012) also refers to SE as being rule-governed (these rules set out what the 'proper' form of the language is) and raises awareness


that SE is set and maintained by authorities of the standard, such as prescriptive grammarians, and by institutions including the Oxford English Dictionary. This view of SE also assumes that a dependent relationship exists in teaching and testing, with countries outside the politically powerful Inner Circle being dependent on the native norms of SE for teaching English as a foreign language (Kachru, 1986). Kachru (1986) characterizes this relationship as Inner Circle countries being 'norm-providing' and those outside the Inner Circle being 'norm-dependent'. These norms and this dependency link back to structuralism and to Chomskyan (1965) and Quirkian (1990) views of linguistics, in which the only model to be promoted as reliable and valid is the native speaker model, with little serious credit given to alternative models.

Criticisms levelled at traditional testing

The main tenets of traditional testing can be summarized as adhering to epistemological and ontological pillars that promote objectivity, uniformity, and an adherence to standardization across learner language production. Standardization in scoring procedures, in interpreting these scores, and in using tests to classify and sort second language learners into groups that signal levels of ability at a single point in time are all characteristics of traditional testing that influence the testing we see today in EAP contexts. In outlining these characteristics, several tensions arise from the literature. Spolsky (1994, 1995) tracks the tensions between seeking psychometric rigour, objectivity, and reliable measurement, and the realization that the human trait of language proficiency is variable, multidimensional, and complex. Shohamy (2001a) also indicates that the preference for objective measurement creates class differences in society and feelings of inferiority, and that tests have become control devices because of a faith in numbers that only an elite group of people can interpret. Shohamy (2001a) furthers this argument by acknowledging that objectivity also means tests are isolated from people and society, allowing the test to facilitate injustice and create a culture in TESOL in which current and future human worth and opportunities are determined by a test that examines a narrow base of knowledge and skills. The following section of the chapter presents CLT's response to the practices, issues, and beliefs of traditional testing.

Critical language testing

CLT is a response to what scholars believe to be unjust practices that take place in the uses, design, implementation, and scoring of traditional tests (Lynch, 2001). This response is guided by the belief that the uses and consequences of tests have the power to open and close doors for people and create winners and losers simultaneously, and therefore to shape test takers' lives (Shohamy, 2001b). Shohamy (2001a) outlines how traditional testing results in those denied access to resources being marginalized and forced to conform to the expected


behaviour that the test elicits in order to gain access. It is these beliefs that drive the call for change in the testing world, towards a fairer system that equalizes and better distributes the currently stratified sharing of resources and access to benefits (Lynch, 2001). It is important to realize that, in forming a response to traditional testing, CLT strives for fair and equal testing opportunities under a critical theory framework, yet it does not advocate eradicating traditional testing. A fundamental consideration is that traditional test approaches have an appropriate use; CLT's objective is to highlight that neglect of other approaches is undemocratic, and it thus calls for testing to be dialogic in nature, where all parties involved in testing have a voice (Shohamy, 2001a). It is also important to clarify that CLT recognizes that, through dialogue, a harmonized medium that balances interests can be achieved, and change can take place (Trede & Higgs, 2010). This section of the chapter further explores how the tenets of traditional testing are viewed under this framework.

The paradigmatic stance and aims of CLT

CLT emerges from critical applied linguistics and the ontological and epistemological foundations of critical theory. Critical theory is embedded in several branches of critical applied linguistics, which aim to redress bias and hegemonic practices in testing and teaching (Pennycook, 1994). CLT is grounded in the critical theory positions of postmodernism and an ever-present need to redress injustice and transform the practice under scrutiny (Trede & Higgs, 2010). Critical theory has its roots in postmodernism's response to 19th-century modernism, which championed the careful separation of language from other domains such as politics (Canagarajah, 2016). Critical theory also considers the language learner not as an object following the L1 speaker (responding to Chomskyan linguistics), but instead values their language production as unique and worthy of study in its own right (Canagarajah, 2016). Critical theory appreciates that all knowledge is equally valuable: instead of disregarding technical, objectifying knowledge or practical, interpretive knowledge, it values both as equal to critical-interest knowledge, which seeks to transform the current status quo (Trede & Higgs, 2010).

It is also important to point out that critical theory, and indeed critical testing bodies, reject the traditional model of education which Freire, cited in Raddaoui and Troudi (2013), terms 'banking' education, whereby teachers are the 'depositors' of knowledge and the students are the 'depositories'. Under critical views, this model transfers teacher-dictated knowledge to passive learners who reiterate and transfer this knowledge to others, continuing the cycle. Raddaoui and Troudi (2013) highlight how, in valuing all knowledge types, approaches to critical theory, including that of testing, call for education to reflect not traditional 'banking' education but a shared partnership where students have a voice in shaping knowledge and collaborate with those in power to help create an appropriate and specific curriculum that enhances learning and ethically tests that learning.


Shohamy (2001b) similarly outlines how tests originally aimed to provide access to services for all regardless of entitlement. However, Shohamy (2001b) argues that these tests have failed to shake off their overarching selection purpose, meaning classroom teaching is forced, consciously or unconsciously by the school's management, teachers, and would-be selected students, to centre on ensuring students pass the test and receive the associated pass-grade benefits. In this sense, Menken (2008) highlights how tests become the centralized language policy that dictates, from the top down, what content is taught, how and by whom it can be taught, and in what language it is best taught. This is also indicated in the work of Hamp-Lyons (1998) at the international level, with the TOEFL test dictating curriculum in schools, universities, and academies.

Shohamy (2001b) explains how CLT seeks to change the top-down practice that traditional testing has created. This change places test takers at the heart of the testing process, seeks to give them an active role in that process, and calls for power to be redistributed more fairly to reflect this new balance. Shohamy (2001b) discusses how CLT invites stakeholders, including teachers and test takers, to debate and confront the roles tests play in shaping instruction, access to education, and the creation of 'new' knowledge. These views are echoed by Darling-Hammond (1994), who sees a need for testing to move from a sorting tool to a developmental aid that supports learners and has greater appreciation for individuals' unique knowledge, which is often disregarded in traditional testing in favour of the dominant knowledge that those in power have deemed important and therefore testable (Shohamy, 2001b).

Shohamy (1998) highlights how critical perspectives view the testing process as a non-neutral practice that is ideologically laden with the values of those in power, while Messick (1989) and Alderson and Banerjee (2001) similarly note that tests contain values that are psychologically, socially, economically, and politically guided, with testing decisions reflecting all of these values through the physical test document. Noam, cited in Shohamy (1998), also clarifies how these factors merge to shape learners' beliefs about knowledge, learning, and success, with learners believing that success equals mastering test knowledge.

The stance of critical theory is further influenced by constructivism, whereby there is a need to peel away the surface and uncover power. Constructivists believe that those in power decide what knowledge is valuable and whether it will be tested. These issues of the powerful deciding knowledge are explained by Foucault, cited in Benesch (2001), as being ever present. In testing, the power imbalance between stakeholders (teachers, test designers, and test users such as institutions and test takers) is a 'self-sustaining' system in which test designers have total, non-negotiable control over the knowledge input (Shohamy, 2001a). A fundamental concept in understanding the power relations that exist in testing is Bourdieu's (1991) symbolic power, which specifies that power relations continue to exist and are maintained because the party granting the power believes that the power exists: it is willing to give the other party power, and to allow the other party to exercise its dominance.

44 Lee McCallum

In applying this notion of symbolic power to testing, Shohamy (2001a) shows that the power of those who introduce tests derives from the trust that those affected by the tests place in them. This means that the perpetuation of a power imbalance can only flourish with the agreement of both parties; there must be a strong willingness on the part of the test takers to be dominated for the existing testing culture to prevail. Foucault (1980) explains that power and resistance coexist, and that human agency means both power and resistance are needed to maintain the situation of one party appearing to exercise power over the other. Lynch (2001) expands on this by explaining how test takers and parents favour tests because they confirm how good they are, and allow those who pass to maintain their dominance and uphold the view that education is for 'only our people', meaning those who also pass the test.

The power balance is, as Shohamy (2001a) explains, maintained by group collaboration, and key collaborators in language testing are the many fee-demanding test preparation schools, academies, and related institutions that thrive on molding candidates into test takers who master the test's single type of knowledge. This is best seen in EAP, with many IELTS and TOEFL preparation courses and academies set up to train students to achieve the score required to enter higher education (Spolsky, 1995). These institutions aid the perpetuation and maintenance of this singular view of knowledge by preparing the compliant test takers who need to pass the test and gain access to the benefits that come with a pass grade. Jenkins (2014) and Mauranen, Llantada, and Swales (2010) note the damage these practices do, because test takers then enter a higher education environment that requires them to demonstrate knowledge mastery collaboratively in both EAP and content degree module group tasks, albeit, to an extent, still in a restricted manner, through adhering to academic writing and speaking conventions (see Omoniyi (2010) for a critical review of these conventions).

The paradigmatic aims of CLT are realized through a repertoire of tasks that are 'alternatives' to the objective, single-answer tasks that traditional tests support. These alternative tasks fall under the domain of 'alternative assessment', which is grounded in critical theory and makes use of multiple assessment measures that assess a wide range of skills and knowledge types. Under the umbrella term of 'alternative assessment' there are several types of alternatives, including democratic, dynamic, formative, and diagnostic assessment (Huhta, 2007). This chapter will not address the subtle differences in these terminologies; it will merely provide task examples that operate as alternatives to the unidimensional traditional approach (see Leung (2007) for an elaboration of these types). Core examples of these tasks include writing portfolios, journals, and peer and self-assessment (Brown & Hudson, 1998). These tasks are not one-off events; assessment takes place over an extended period of time and focuses on learning and on how learners can manage weaknesses in particular areas (Lynch & Shaw, 2005). Davies (2003) considers these tasks less formal and more suited to formative rather than summative assessment, although, as Teasdale and Leung (2000) and Moss et al. (2005) note, some of


these tasks are robustly designed and developed to ensure fairness, illustrate face validity, and guard against grades that are too subjective and cannot be justified. A sample alternative task is presented below in Figure 3.3:

Overview of task
In groups, learners design a brochure for the Hong Kong Tourism Board describing 4 attractions in Hong Kong which would appeal to young people of their own age.

Task guidelines for learners
Writing a Tourist Brochure
Imagine the Hong Kong Tourism Board has asked your class to design a brochure that would be of interest to young visitors of your own age. In groups of 4, design the brochure describing FOUR sites suitable for young people of your age coming to Hong Kong. Complete this task by following the steps below:

Step 1: Group task
Discuss in your groups which sites young people would want to visit in Hong Kong. Choose one site each to investigate. For homework find out as much as you can about the site: where the site is, when it is open, what one can see/do there, what the facilities are, how one gets there etc. Bring this information to the next class.

Step 2: Group task
Exchange information with your group members. Tell them about the site you have found out about. Then decide how you are going to present the information in your brochure, what order you want to put your sites in, what illustrations you need, what title you want to give the brochure, etc.

Step 3: Individual task
Write a description of your chosen site (120 words). Remember to say why it is interesting. Proofread it carefully, then hand it to your teacher.

Step 4: Group task
In your groups, edit your work based on your teacher's comments. Then put together your brochure.

Your brochures will be assessed on the following basis:
(a) Task fulfilment: would your selected sites appeal to young people?
(b) Accuracy of language and information provided: is the brochure written in good English? Is the information provided accurate?
(c) Attractiveness of the final written submission

Figure 3.3: Sample collaborative writing task (Adapted from the Curriculum and Development Institute (2005) and Douglas (2010)).

The sample task in Figure 3.3 represents a possible task suitable for a pre-university EAP course: it focuses on a specific genre, raises awareness of audience, and integrates interpersonal and affective skills such as communication and critical thinking. The task also requires prolonged engagement with these skills and provides the opportunity to respond and react to teacher feedback.


Brown and Hudson (1998) suggest that these tasks can become fairer, and allow learners a louder voice, by negotiating assessment criteria and giving learners a say in the important elements of the task; a sketch of what negotiable criteria might look like follows below. In this respect, other collaborative tasks that can form the basis for assessment include jigsaw reading and writing tasks (e.g. Esnawy, 2016), as well as tasks that allow students to compare their experiences of individual, pair, and group work (e.g. Bhowmik, Hillman, & Roy, 2018). Unlike the tasks in Figures 3.1 and 3.2, the task in Figure 3.3 allows negotiation of meaning and lets students produce freer examples of written language, as opposed to the restricted output of Figure 3.2 and the focus on recognizing the correct answer in Figure 3.1.
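One way to operationalize such negotiation is to treat the rubric's criterion weights as open parameters that the class agrees on before the task begins. The sketch below is a hypothetical illustration only: the criteria echo points (a) to (c) in Figure 3.3, but the weights and ratings are invented and carry no official status.

# Hypothetical analytic rubric for the brochure task: the criterion weights
# are negotiable parameters rather than fixed, prescribed values.
weights = {"task_fulfilment": 0.4, "accuracy": 0.4, "attractiveness": 0.2}

def rubric_score(ratings):
    """Weighted average of criterion ratings, each on a 0-10 scale."""
    return sum(weights[criterion] * rating for criterion, rating in ratings.items())

print(rubric_score({"task_fulfilment": 8, "accuracy": 6, "attractiveness": 9}))  # -> 7.4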

Language standards: Appreciating difference

CLT takes issue with the use of native norms in language tests, with Tomlinson (2010) strongly expressing that learners are being tested on a Standard English they never use, will have very little exposure to, and have no future use for outside possible academic circles. However, in modern times, the spread of English and globalization create test populations that use English chiefly to communicate with other non-native speakers; therefore, while traditional testing seeks to group test takers together for a specific purpose, there is a growing need to do so with consideration for localized language use (Canagarajah, 2006). Tomlinson (2010) further highlights the marginalization that learners experience when they are forced to learn norms that their context does not use, and Hamid (2014) and Zheng (2014) claim that this situation forces learners to chase an impossible standard, a 'phantom' construct of native speaker proficiency, which is not their attainment goal. Davies (2013) also indicates that justifying this standard is problematic when there is wide variation between native speakers in different Inner Circle countries. On an international level, Jenkins (2006) points to the claims made by international testing bodies that their tests are internationally appropriate when, in reality, candidates are penalized for using internationally communicative forms of the language.

The 'World Englishes' paradigm, which champions pluralistic views of language proficiency, includes work from Canagarajah (2006) and Kirkpatrick (2006), who argue alongside Kachru (1982) that proficiency involves test takers being proficient enough to communicate in the context of use, and that local varieties of English and their established norms are therefore worthy targets for inclusion in tests. This is relevant for both Outer and Expanding Circle countries, which were traditionally thought to err on the side of norm dependency; however, neither always follows native norms, contrary to the assumptions of Kachru's (1986) earlier work on the Expanding Circle. Norms are said to be social constructs generated by a complex set of ideological, socio-political, socio-economic, and cultural forces that help to establish boundaries in practice and are guided by prescriptivism (Tupas, 2010).


Their socially constructed nature means they are labelled with connotations of power: native norms are seen as 'legitimate' and non-native norms as 'illegitimate' or, in the words of Quirk (1990), 'quackery'. Hamid and Baldauf (2013) note that non-native norms are also seen as 'deficit forms' or, in some cases, 'interlanguage', and are not recognized as legitimate under traditional Second Language Acquisition (SLA) theories. However, Groves (2010) refutes suggestions that these norms are 'interlanguage' by reminding us that the interlanguage concept arose from application to individual learners and not to whole social communities, and, alongside Kirkpatrick and Deterding (2011), he points out that non-native norms, such as the practice of placing the topic at the front of the sentence, are far-reaching and can be spread and shared across more than one geographical area. Kirkpatrick and Deterding (2011) and Kim (2006) cement this point by arguing that, since more non-native speakers than native speakers now shape the language, their use of the language should be considered in testing language use and ability.

Conclusion

This chapter has outlined the key theoretical tenets of traditional and critical language testing and provided examples of EAP-relevant assessment tasks. Within this broad discussion, a number of historical and contemporary assessment terms and trajectories were set out to engage with the key understandings of language assessment we need as practitioners. The chapter presented and discussed tasks that typify these different understandings, and it seeks to encourage these discussions to continue at both global and local levels. At a global level, there is a need to examine differences across international tests to find common and divergent task features, and to consider how these relate back to the skills we perceive as fundamental to study in higher education. At a local level, there is also a need to examine the tasks that play a role in shaping local assessment practices and how these tasks align with EAP curriculum goals (e.g. Rauf & McCallum, in press).

References

Alderson, J. C., & Banerjee, J. (2001). Language testing and assessment (Part 1). Language Teaching, 34, 213–236.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford University Press.
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17(1), 1–42.
Benesch, S. (2001). Critical English for academic purposes: Theory, politics and practice. Lawrence Erlbaum Associates.
Bhowmik, S. K., Hillman, B., & Roy, S. (2018). Peer collaborative writing in the EAP classroom: Insights from a Canadian postsecondary context. TESOL Journal. doi:10.1002/tesj.393
Bourdieu, P. (1991). Language and symbolic power. Polity.

Bridgeman, B., & Carlson, S. (1983). A survey of academic writing tasks required of graduate and undergraduate foreign students. TOEFL Research Report 15. Educational Testing Service.
Brown, J. D., & Hudson, T. D. (1998). The alternatives in language assessment. TESOL Quarterly, 32, 653–675.
Cambridge English. First Certificate in English: Use of English paper: Part 3 task. https://www.gettinenglish.com/wp-content/uploads/2014/07/cambridge-english-first-handbook-2015.pdf
Canagarajah, A. S. (2006). Changing communicative needs, revised assessment objectives: Testing English as an international language. Language Assessment Quarterly, 3(3), 229–242.
Canagarajah, A. S. (2016). TESOL as a professional community: A half-century of pedagogy, research and theory. TESOL Quarterly, 50(1), 7–41.
Carr, N. T. (2011). Designing and analysing language tests. Oxford University Press.
Carroll, J. B. (1968). The psychology of language testing. In A. Davies (Ed.), Language testing symposium: A psycholinguistic approach (pp. 46–69). Oxford University Press.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–381.
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Curriculum and Development Institute (2005). Task-based assessment for English language learning at secondary level. Education and Manpower Bureau. https://cd1.edb.hkedcity.net/cd/eng/TBA_Eng_Sec/pdf/part2_Task1.pdf
Darling-Hammond, L. (1994). Performance-based assessment and educational equity. Harvard Educational Review, 64, 5–30.
Davies, A. (1978). Language testing survey article. Part 1. Language Teaching and Linguistics Abstracts, 113(4), 145–159.
Davies, A. (2003). Three heresies of language testing research. Language Testing, 20(4), 355–368.
Davies, A. (2013). Native speakers and native users: Loss and gain. Cambridge University Press.
Davies, A. (2014). 50 years of language assessment. In A. J. Kunnan (Ed.), The companion to language assessment: Abilities, contexts and learners (pp. 3–21). Wiley Blackwell.
Davies, A., Brown, A., Elder, C., & Hill, K. (1999). Dictionary of language testing. Cambridge University Press.
Douglas, D. (2010). Understanding language testing. Hodder Education.
Edgeworth, F. Y. (1888). The statistics of examinations. Journal of the Royal Statistical Society, LI, 599–635.
Esnawy, S. (2016). EFL/EAP reading and research essay writing using jigsaw. Procedia – Social and Behavioral Sciences, 232, 98–101.
Foucault, M. (1980). Power/knowledge: Selected interviews and other writings: 1972–1977. Pantheon.
Freeborn, D. (2006). From Old English to Standard English: A course book in language variation across time (3rd ed.). Palgrave Macmillan.
Freire, P. (1996). Pedagogy of the oppressed. Penguin.
Fulcher, G. (1999). Assessment in English for Academic Purposes: Putting content validity in its place. Applied Linguistics, 20(2), 221–236.
Fulcher, G. (2000). The communicative legacy in language testing. System, 28, 483–497.

Fulcher, G. (2010). Practical language testing. Hodder Education/Routledge.
Fulcher, G. (2014). Philosophy and language testing. In A. J. Kunnan (Ed.), The companion to language assessment: Evaluation, methodology, and interdisciplinary themes (pp. 1434–1451). Wiley Blackwell.
Groves, J. (2010). Error or feature? The issue of interlanguage and deviations in non-native varieties of English. HKBU Papers in Applied Language Studies, 14, 108–129.
Hamid, O. M. (2014). World Englishes in international proficiency tests. World Englishes, 33(2), 263–277.
Hamid, O. M., & Baldauf, R. B. (2013). Second language errors and features of world Englishes. World Englishes, 32(4), 476–494.
Hamp-Lyons, L. (1998). Ethical test preparation practice: The case of the TOEFL. TESOL Quarterly, 32, 329–337.
Hawkins, J. A., & Filipovic, L. (2012). Criterial features in L2 English: Specifying the reference levels of the Common European Framework. Cambridge University Press.
Hickey, R. (Ed.). (2015). Standards of English: Codified varieties around the world. Cambridge University Press.
Huhta, A. (2007). Diagnostic and formative assessment. In B. Spolsky & F. M. Hult (Eds.), The handbook of educational linguistics (pp. 469–482). Wiley-Blackwell.
International English Language Testing System (IELTS). (2017). IELTS Academic reading sample task. https://www.ielts.org/-/media/pdfs/academic_reading_sample_task_multiple_choice.ashx?la=en
International Language Testing Association (ILTA). (2000). ILTA Code of Ethics. http://www.iltaonline.com/page/CodeofEthics
Jenkins, J. (2006). Current perspectives on teaching world Englishes and English as a lingua franca. TESOL Quarterly, 40(1), 157–181.
Jenkins, J. (2014). English as a lingua franca in the international university: The politics of academic English language policy. Routledge.
Kachru, B. B. (Ed.). (1982). The other tongue: English across cultures. University of Illinois Press.
Kachru, B. B. (1986). The alchemy of English: The spread, functions and models of non-native Englishes. Pergamon Press.
Kachru, Y. (2006). Culture and argumentative writing in World Englishes. In K. Bolton & B. B. Kachru (Eds.), World Englishes: Critical concepts in linguistics (Vol. V, pp. 19–39). Routledge.
Kim, H. J. (2006). World Englishes in language testing: A call for research. English Today, 22(4), 32–39.
Kirkpatrick, A. (2006). Which model of English: Native-speaker, nativised or lingua franca? In R. Rubdy & M. Saraceni (Eds.), English in the world: Global rules, global roles (pp. 71–83). Continuum.
Kirkpatrick, A., & Deterding, D. (2011). World Englishes. In J. Simpson (Ed.), The Routledge handbook of applied linguistics (pp. 373–388). Routledge.
Lado, R. (1961). Language testing. McGraw-Hill.
Leung, C. (2007). Dynamic assessment: Assessment for and as teaching? Language Assessment Quarterly, 4(3), 257–278.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing, 18(4), 351–372.

Lynch, B., & Shaw, P. (2005). Portfolios, power and ethics. TESOL Quarterly, 39(2), 263–297.
Mauranen, A., Llantada, C. P., & Swales, J. M. (2010). Academic Englishes: A standardized knowledge? In A. Kirkpatrick (Ed.), The Routledge handbook of world Englishes (pp. 634–653). Routledge.
McNamara, T. (2001). Language assessment as social practice: Challenges for research. Language Testing, 18(4), 333–349.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Language Learning Monograph Series. Blackwell Publishing.
Menken, K. (2008). High-stakes tests as de facto language education policies. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (Vol. 7, pp. 401–413). Springer.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103). Macmillan.
Miller, D. M., & Legg, S. M. (1993). Alternative assessment in a high-stakes environment. Educational Measurement: Issues and Practice, 12(2), 9–15.
Morrow, K. (1979). Communicative language testing: Revolution or evolution? In C. K. Brumfit & K. Johnson (Eds.), The communicative approach to language teaching (pp. 143–159). Oxford University Press.
Moss, P. A., Pullin, D., Gee, J. P., & Haertel, E. H. (2005). The idea of testing: Psychometric and sociocultural perspectives. Measurement: Interdisciplinary Research and Perspectives, 3, 63–83.
Noam, G. (1996). Assessment at a crossroads: Conversation. Harvard Educational Review, 66, 631–657.
O'Sullivan, B. (2011). Language testing. In J. Simpson (Ed.), The Routledge handbook of applied linguistics (pp. 259–274). Routledge.
Omoniyi, T. (2010). Writing in English(es). In A. Kirkpatrick (Ed.), The Routledge handbook of world Englishes (pp. 471–490). Routledge.
Pennycook, A. (1994). The cultural politics of English as an international language. Routledge.
Popham, J. W. (1999). Why standardized test scores don't measure educational quality. Educational Leadership, 56(6), 8–15.
Quirk, R. (1990). Language varieties and standard language. English Today, 21, 3–10.
Raddaoui, R., & Troudi, S. (2013). Three elements of critical pedagogy in ELT: An overview. In P. Davidson, M. Al-Hamly, C. Coombe, S. Troudi, & C. Gunn (Eds.), Achieving excellence through life skills education: Proceedings of the 18th TESOL Arabia Conference (pp. 73–82). TESOL Arabia Publications.
Rauf, M., & McCallum, L. (in press). Language assessment literacy: Task analysis in Saudi universities. In L. McCallum & C. Coombe (Eds.), The assessment of L2 written English across the MENA region: A synthesis of practice. Palgrave Macmillan.
Seargeant, P. (2012). Exploring world Englishes: Language in a global context. Routledge.

Shohamy, E. (1996). Language testing: Matching assessment procedures with language knowledge. In M. Birenbaum & F. J. R. C. Dochy (Eds.), Alternatives in assessment of achievements, learning processes and prior knowledge (pp. 143–160). Kluwer Academic.
Shohamy, E. (1998). Critical language testing and beyond. Studies in Educational Evaluation, 24(4), 331–345.
Shohamy, E. (2001a). The power of tests: A critical perspective on the use of language tests. Longman.
Shohamy, E. (2001b). Democratic assessment as an alternative. Language Testing, 18(4), 373–391.
Shohamy, E. (2004). Assessment in multicultural societies: Applying democratic principles and practices to language testing. In B. Norton & K. Toohey (Eds.), Critical pedagogies and language learning (pp. 72–93). Cambridge University Press.
Spolsky, B. (1976). Language testing: Art or science? Paper read at the 4th International Congress of Applied Linguistics, Stuttgart, Germany.
Spolsky, B. (1994). Policy issues in testing and evaluation. The Annals of the American Academy of Political and Social Sciences, 532, 226–237.
Spolsky, B. (1995). Measured words. Oxford University Press.
Spolsky, B. (2008). Language assessment in historical and future perspectives. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (Vol. 7, pp. 445–454). Springer.
Teasdale, A., & Leung, C. (2000). Teacher assessment and psychometric theory: A case of paradigm crossing? Language Testing, 17(2), 163–184.
Thorndike, E. L. (1904). An introduction to the theory of mental and social measurements. The Science Press.
Tomlinson, B. (2010). Which test of which English and why? In A. Kirkpatrick (Ed.), The Routledge handbook of world Englishes (pp. 599–617). Routledge.
Trede, F., & Higgs, J. (2010). Critical inquiry. In J. Higgs, N. Cherry, R. Macklin, & R. Ajjawi (Eds.), Researching practice: A discourse on qualitative methodologies (pp. 247–255). Sense Publishers.
Trudgill, P., & Hannah, J. (2008). International English: A guide to varieties of Standard English (5th ed.). Hodder Arnold.
Tupas, F. R. T. (2010). Which norms in everyday practice and why? In A. Kirkpatrick (Ed.), The Routledge handbook of world Englishes (pp. 567–580). Routledge.
Weir, C. J., Vidakovic, I., & Galaczi, E. D. (2013). Measured constructs: A history of Cambridge English language examinations 1913–2012. Cambridge University Press.
Zheng, Y. (2014). A phantom to kill: The challenges for Chinese learners to use English as a global language. English Today, 30(4), 34–39.

Chapter 4

Language assessment literacy: Ontogenetic and phylogenetic perspectives

Mojtaba Mohammadi and Reza Vahdani Sanavi

Introduction

With the tenets of a sociocultural perspective widely recognized in the field of language education, scholars have attempted to expand these tenets into different aspects of learning and teaching. Assessment was not excluded: 'assessment of learning' was critically questioned and the concept of 'assessment for learning' was introduced. Out of the need to document learners' outcomes, and to establish standards and benchmarks to measure their knowledge of language learning, came the concept of language assessment literacy (LAL). In its short lifespan, beginning in 1991 in general education and in 2001 in language education, the definition of LAL has been expanded and reconceptualized, and a number of different models have been proposed. In the light of the prominence LAL has recently gained, and given the depth and breadth of the concept, this chapter intends to trace its genesis, examining the development of 'literacy' into 'literacies', and of 'assessment literacy' into 'language assessment literacy'. It also explores theoretical frameworks and proposed models. In conclusion, the concept of LAL is problematized and future directions for the field are suggested.

From literacy to literacies

A quick review of the definition of 'literacy' in different disciplines might lead one to conclude that, beyond the core definition (i.e., being able to read and write), there is no unanimous definition among scholars in the humanities. According to Hillerich (1976), the meaning of 'literacy' may be argued to be ambiguous or, as more recently stated by UNESCO (2004), "not uniform" (p. 13). The birth of the concept can be traced back to the time when sociologists started to study man as a talking and writing animal (Goody & Watt, 1963), but the benchmark used to define literacy varies across cultural, social, economic, political, and even religious contexts. In more modern contexts, the meaning has been metaphorically expanded to cover other domains such as computing, art, drama, math, and science. For UNESCO, 'literacy' is defined as:


the ability to identify, understand, interpret, create, communicate and compute, using printed and written materials associated with varying contexts. Literacy involves a continuum of learning in enabling individuals to achieve their goals, to develop their knowledge and potential, and to participate fully in their community and wider society. (UNESCO, 2004)

This conceptualization leads us to delineate literacy as a complex, multifaceted issue in the modern era. Taylor (2013) even viewed it as 'literacies' or 'multiple literacies' and claimed it to be the result of 'pressure to acquire an ever expanding body of knowledge, skills and competence relating to a growing number of domains in everyday life' (p. 404), which includes the language and discourse patterns of each of these disciplines. The expanding trend of acquiring knowledge and skills, and the ability to use them efficiently in daily life, was the consequence of sociocultural and functional approaches to education. This is what Hyland and Hamp-Lyons (2002) named 'academic literacy', defining it as 'the complex set of skills (not necessarily only those relating to the mastery of reading and writing) which are increasingly argued to be vital underpinnings or cultural knowledge required for success in academic communities' (p. 4). With the emergence of multifarious interpretations of literacy in the last few decades, and the introduction of notions such as 'media literacy', 'emotional literacy', 'health literacy', 'vernacular literacy', 'cultural literacy', 'moral literacy', and 'critical literacy', it comes as no surprise to see 'assessment literacy' (Stiggins, 1991) appear in the field of education.

The genesis of language assessment literacy

Assessment, as a component of any educational curriculum, is believed to be the section which is 'the least amenable to change' (Scarino, 2013, p. 310). Yet, with the accelerating pace of reform in the approaches, methods, and techniques of teaching and learning in the last decades of the twentieth century, change in the testing and assessment of the products of teaching processes was inevitable. As viewed by Stiggins (1991), who first coined the term 'assessment literacy', educational agencies were required to administer tests to measure and document the outcomes of their programs; moreover, major educational and policy decisions depended on the results of those tests. For Stiggins (1995), assessment literate individuals have 'a basic understanding of the meaning of high- and low-quality assessment and are able to apply that knowledge to various measures of student achievement' (p. 535). They

come to any assessment knowing what they are assessing, why they are doing so, how best to assess the achievement of interest, how to generate sound samples of performance, what can go wrong, and how to prevent those problems before they occur. (p. 240)


The need for assessment to move from the periphery to centre stage was keenly felt. The reason might be that testing has turned into 'a big business' (Spolsky, 2008, p. 297), both commercially and non-commercially, and that there is 'the societal role that language tests play, the power that they hold, and their central functions in education, politics and society' (Shohamy & Or, 2013, p. x). After Stiggins, other scholars started to conceptualize assessment literacy, such as Falsgraf (2006), who defined it as '… the ability to understand, analyze and apply information on student performance to improve instruction' (p. 6). Inbar-Lourie (2013) viewed it as 'the knowledge base required for performing assessment tasks' (p. 2924).

The concept of 'language assessment literacy' entered the field of language assessment at the beginning of the 21st century, when Brindley (2001) stated that, unlike general education, language teaching programs lack a sizable body of research on 'teacher's assessment practices, levels, and training, and professional development needs' (p. 126). Some scholars defined language assessment literacy without specifying its stakeholders. For Inbar-Lourie (2008a), it is 'having the capacity to ask and answer critical questions about the purpose for assessment, about the fitness of the tool being used, about testing conditions, and about what is going to happen on the basis of the test results' (p. 389). In O'Loughlin's (2013) view, it encompasses 'the acquisition of a range of skills related to test production, test-score interpretation and use, and test evaluation in conjunction with the development of a critical understanding about the roles and functions of assessment within society' (p. 363). Vogt and Tsagari (2014) saw LAL as 'the ability to design, develop and critically evaluate tests and other assessment procedures, as well as the ability to monitor, evaluate, grade and score assessments on the basis of theoretical knowledge' (p. 377). Another definition, quite general in terms of stakeholders, is Pill and Harding's (2013), which states that LAL 'may be understood as indicating a repertoire of competences that enable an individual to understand, evaluate and, in some cases, create language tests and analyse test data' (p. 382). Malone (2013) laid the accountability on teachers' shoulders and remarked that LAL 'refers to language instructors' familiarity with testing definitions and the application of this knowledge to classroom practices in general and specifically to issues related to assessing language' (p. 329). Fulcher (2012) offered a definition of LAL with a wider scope, one which embraces different assessment competencies. He argued that LAL is:

the knowledge, skills and abilities required to design, develop, maintain or evaluate, large-scale standardized and/or classroom based tests, familiarity with test processes, and awareness of principles and concepts that guide and underpin practice, including ethics and codes of practice. The ability to place knowledge, skills, processes, principles and concepts within wider historical, social, political and philosophical frameworks in order to understand why practices have arisen as they have, and to evaluate the role and impact of testing on society, institutions, and individuals. (p. 125)

The above conceptualizations of LAL have one thing in common: they pinpoint micro- and/or macro-level components of language assessment practice. The micro-level components concern classroom assessment, such as designing, developing, grading, and analysing tests, and the related theoretical issues. The macro-level components concern assessment practices that involve taking a critical perspective on the purpose of assessment, considering the societal roles and consequences of assessment, and situating assessment within wider historical, social, political, and philosophical frameworks.

LAL competencies

In line with the growing attention to language assessment literacy, there have been attempts to demystify the concept and elaborate on what exactly is meant by attaining this kind of literacy. The earliest attempt in education to describe 'assessment literacy', though it was not yet so named, was the list of seven standards proposed by the American Federation of Teachers, the National Council on Measurement in Education, and the National Education Association in 1990, summarized by Stabler-Havener (2018, p. 3). It includes skills such as:

• Choosing and developing assessment methods appropriate for making instructional decisions.
• Administering, scoring, and interpreting the results of teacher-produced and externally produced assessments.
• Appropriately using assessment results to inform decisions regarding student and curriculum development.
• Devising valid grading procedures for student assessments.
• Communicating assessment results to stakeholders.
• Identifying unethical and inappropriate assessment methods and use of assessment data.

The other major proposed competencies of LAL are presented in four different models. The first is Brindley's (2001) professional development program model, which introduced five competencies for a language teacher. He criticized the standards presented by the American Federation of Teachers, the National Council on Measurement in Education, and the National Education Association as not being 'flexible enough to allow teachers to acquire familiarity with those aspects of assessment that are relevant to their needs' and as not having considered individuals' different levels of knowledge of assessment issues (p. 129). Reiterating that any program for teachers on assessment should be related to the curriculum, geared to the current level of teachers' knowledge, and fitted to their needs, he proposed five components, two of which were considered core units for any stakeholder:

• Knowledge about the social context of assessment (including social, educational, and political aspects of assessment and such issues as accountability, standards, and ethics).
• Defining and describing proficiency (dealing with knowledge of the theoretical background of language tests and the related concepts of reliability and validity).
• Constructing and evaluating language tests (including the skills of test development and analysis).
• Assessment in the language curriculum (exploring the notion of criterion-referencing in learning and testing, and examining performance-based assessment options such as observation, portfolios, conferencing, project work, and journal keeping).
• Putting assessment into practice (knowledge of strategies for devising action plans so that the theoretical issues raised in the whole program can be further explored and documented).

The next model is the skills, knowledge, principles model proposed by Davies (2008). At the end of his evaluation of assessment coursebooks, he put forth three competencies of teachers' LAL:

Skills provide the training in necessary and appropriate methodology, including item writing, statistics, test analysis and increasingly software programs for test delivery, analysis and reportage. Knowledge offers relevant background in measurement and language description as well as in context setting. Principles concern the proper use of language tests, their fairness and impact, including questions of ethics and professionalism. (p. 335)

This is rather similar to Inbar-Lourie's (2008a) three competency-related questions of how (to assess), what (to assess), and why (to assess) to be addressed by language teachers, which correspond to Davies' skills, knowledge, and principles respectively.

Having analysed data collected from 278 international language teachers, Fulcher (2012) presented a tripartite model of needs for an assessment training program. In his hierarchical model, he adopted the skill, knowledge, and ability competencies of LAL proposed by Davies as the fundamentals for language teachers, and called this tier 'practice'. Putting the concept of LAL into a broader perspective, he expanded the model with two more tiers, 'principles' and 'context'. 'Principles' includes the issues that guide teachers towards the best possible practice, i.e., getting acquainted with the principles, concepts, and processes of assessment, and with ethics and codes of practice. 'Context' looks at a wider landscape of LAL by placing practice and principles within a historical, political, social, and philosophical framework, enabling stakeholders to figure out the origin and justification of these practices and principles, and the impact adopting any of them may have on society, organizations, and individuals. The innovation of the model seems to be twofold. First, its hierarchical nature presents a sequence for course providers and trainers, which can be very helpful in preparing coursebooks or planning sessions. Second, and as a consequence of this hierarchy of materials and issues, not all levels are essential for all stakeholders; depending on the level at which they work, a given tier can be mandatory or optional. Another point worth mentioning is the necessity for teachers to put the theories into practice.

More recent attempts to introduce a componential framework of language assessment literacy are Pill and Harding (2013) and Taylor (2013), which take a different view of the concept by accounting for stakeholders in LAL. Apart from Brindley (2001) and Fulcher (2012), who in one way or another consider a variety of stakeholders with respect to levels of literacy development, the studies by Pill and Harding and by Taylor highlight the agents that can benefit differently, and to various degrees, from LAL competencies. Pill and Harding adapted the idea from Bybee (1997); it was later expanded by Kaiser and Willander (2005) in the fields of scientific and mathematical literacy in education. Unlike the previous models, which are mostly modular, Pill and Harding proposed a continuum of stages for LAL ranging from illiteracy, which is complete ignorance of assessment concepts and methods, to multidimensional literacy, which includes knowledge about the philosophical, historical, and social background of assessment. Kaiser and Willander (2005) summarized these five stages as follows:

Table 4.1 LAL stages

Illiteracy: Ignorance of language assessment concepts and methods
Nominal literacy: Understanding that a specific term relates to assessment, but may indicate a misconception
Functional literacy: Sound understanding of basic terms and concepts
Procedural and conceptual literacy: Understanding central concepts of the field, and using knowledge in practice
Multidimensional literacy: Knowledge extending beyond ordinary concepts, including philosophical, historical and social dimensions of assessment


We think the major demerit here is the lack of a clear definition of the level each stakeholder is expected to attain, as Harding and Kremmel (2016) also note. Taylor's (2013) spider-web model offers a solution. Her LAL profile model comprises eight competencies: knowledge of theory, technical skills, principles and concepts, language pedagogy, sociocultural values, local practices, personal beliefs/attitudes, and scores and decision making. All of these are available in the previously mentioned models with one exception: this is the first model to pay attention to the personal beliefs/attitudes of the stakeholders in language assessment. In tandem with Scarino (2013) and Giraldo (2018), we believe that some particularities of language assessment practice come from the bottom, from teacher-assessors' interpretations, judgements, and decisions in assessment, which result from their developing capabilities. Taylor's model maps these eight components onto a five-stage continuum adapted from Pill and Harding (2013) (0 = illiteracy to 4 = multidimensional literacy), and proposes which components, and how much of each, should be achieved by each stakeholder: test writers, classroom teachers, university administrators, and professional language testers.

Figure 4.1 Differential AL/LAL profiles for four constituencies: (a) profile for test writers; (b) profile for classroom teachers; (c) profile for university administrators; (d) profile for professional language testers. Adapted from Taylor (2013), p. 410


According to the model, a test writer, for example, is expected to have more developed literacy in knowledge of theory, technical skills, and principles and concepts, and less in personal beliefs/attitudes, local practices, and language pedagogy, than a classroom teacher. In spite of this differentiation of stakeholders and of the related levels of mastery in each dimension of LAL competency, Taylor (2013) did not define these dimensions; hence, anyone can have his or her own definition of them (Harding & Kremmel, 2016). In a recent study, Yan, Zhang, and Fan (2018) investigated how contextual and experiential factors mediate teachers' LAL development. They proposed a two-layered mediation model for language teachers' LAL competencies that includes contextual factors (such as educational and assessment policies, assessment practice for different stakeholders, and the resources and constraints of the local instructional context) and experiential factors (such as assessment development, i.e., item-writing skills, and the development of assessment intuition, i.e., item analysis and score use). They also found that 'the impact of contextual factors is mediated through experiential factors. That is, while the assessment context creates opportunities and motivation for assessment practice, it is the accumulation of assessment experiences that foster and strengthen assessment knowledge and skills' (p. 166). As some previous studies also suggest (e.g., Giraldo, 2018; Scarino, 2013; Taylor, 2013), the enhancement of language teachers' LAL begins with their own assessment experiences, interpretations, and self-awareness.

Problematizing LAL and future directions

In the previous parts of the chapter, we detailed the ontogenetic and phylogenetic analyses the concept of LAL has undergone, from its origin to its reconceptualization. In what follows, we problematize the concept of LAL by critiquing what has been done so far in the field and by predicting possible future directions for research.

LAL conceptualization

The literature on assessment literacy, both in general and in language education, has put forward a number of definitions for key concepts. In language education, from the earliest model up to the most recent one, few have included macro-level assessment (i.e., consideration of the social, cultural, and political context) when evaluating competencies; the majority of the competency evaluations concern micro-level assessment (i.e., classroom assessment). There is a need to give a fairly balanced weight to mid-level stakeholders: organizations and agencies, institutions, associations, and communities. Along with a more comprehensive theoretical definition, LAL could also benefit from being better defined operationally. Several studies have designed questionnaires and used interviews (e.g., Crusan, Plakans, & Gebril, 2016; Fulcher, 2012; Hasselgreen, Carlsen, & Helness, 2004; Vogt & Tsagari, 2014); however, a detailed description of what an assessment-literate individual needs to know, and a robust instrument theoretically based on one of the competency models, are yet to be developed.

LAL stakeholders

Taylor's (2013) recent model paid attention to a number of stakeholders in language assessment; however, other studies were either limited to teachers or remained silent about stakeholders altogether. Any reconceptualization needs to be comprehensive in scope in order to cover all stakeholders, including teachers, test writers, university administrators, and professional language testers. Other stakeholders can also play a crucial role in the development, analysis, and interpretation of a test. If parents, for example, are assessment-literate and familiar with a few testing principles and practices, teachers and school administrators can count on their support in school and classroom activities.

LAL teacher education programs

Many teacher education centres around the world offer training programs and/or short courses for pre-service and in-service language teachers to establish or enrich the competence needed to start or continue their careers. Yet few modules or unit credits are allotted to assessment competencies, and studies have revealed that teachers have inadequate knowledge of assessment (Fulcher, 2012; Lam, 2014; Mendoza & Arandia, 2009; Yan, 2010). One of the reasons, as Brindley (2001) claimed, is the insufficient attention to assessment-related content in teacher education courses. Even where there is a course, it is devoted to presenting formal testing practices rather than formative or classroom-based assessment (Bailey & Brown, 1996; Brown & Bailey, 2008), and it cannot adequately equip learners with the fundamentals of assessment literacy (Lam, 2014). Most of these courses are not compulsory but electives, with no guarantee that pre-service teachers will register for them (Lam, 2014). Mendoza and Arandia (2009) also argued that undergraduate and graduate education programs should teach language assessment practices to pre-service teachers, while in-service teachers can pursue them on their own. These results are endorsed by other scholars (e.g., Hasselgreen, 2008; Vogt & Tsagari, 2014).

On the practitioners' side, there are major courses offered by Cambridge and Trinity College London, as well as tests like the Cambridge Teaching Knowledge Test (TKT) for pre-service and in-service teachers, which include few to no sections on assessment-related issues. In her analysis of the TKT, Stabler-Havener (2018) reported that only thirteen percent of the first two modules and sixteen percent of the third module are dedicated to assessment literacy, and while the young learner (YL) module has twenty-five percent assessment-related content, the Content and Language Integrated Learning (CLIL) module has only five percent. We also examined the syllabus and assessment guidelines of the Cambridge Certificate in Teaching English to Speakers of Other Languages (Adult) (CELTA) (5th ed., 2018) and found no content dealing with assessment issues in its five topic areas: 'Learners and teachers, and the teaching and learning context, Language analysis and awareness, Language skills: reading, listening, speaking and writing, Planning and resources for different teaching contexts, and Developing teaching skills and professionalism'. In the Certificate in Teaching English to Speakers of Other Languages (CertTESOL) course, the CELTA counterpart run by Trinity College London (2016) for pre-service English teachers, there is little to no trace of language assessment content in the five units 'Teaching Skills', 'Language Awareness & Skills', 'Learner Profile', 'Materials Assignment', and 'Unknown Language'. In the Diploma in Teaching English to Speakers of Other Languages (Adults) (DELTA) course at Cambridge University (2019), there are only a few sections related to language assessment: one on key concepts, and three on terminology related to assessment and assessment issues as an English Language Teaching (ELT) specialism. In the Diploma in Teaching English to Speakers of Other Languages (DipTESOL) course at Trinity College London (2017), only one module out of ten relates to language assessment. This underlines the fact that these highly fashionable teacher training courses are insufficiently designed in terms of increasing teachers' theoretical and practical awareness of assessment in general and classroom-based assessment in particular.

LAL resources

To enhance LAL among teachers and other stakeholders, teacher education programs require adequate resources. Davies (2008), in his analysis of the assessment textbooks published over 46 years, revealed that language testing experts have separated themselves from the field of educational testing, which has resulted in 'students [being] over-protected from exposure to empirical encounters with real language learners' (p. 341). In resources meant to equip novice teachers with understanding and practical skills for the classroom environment, there seems to be a need for a trade-off between covering theoretical concepts and covering practical, classroom-based assessment issues. Fulcher (2012) also investigated the needs of language teachers and concluded that they require certain resources to fulfil their role as language testers:

• A text that is not light on theory, but explains concepts clearly, especially where statistics are introduced.
• A practical 'how-to' guidance, although not prescriptive in nature.
• A balance between classroom and large-scale testing, with illustrations and practical examples drawn from a range of sources and countries.
• Activities that can be reasonably undertaken given the constraints and resources that teachers normally face. (p. 124)


Textbooks are not the only resources available to help LAL stakeholders enhance their awareness of, and adherence to, language assessment literacy norms. As Malone (2013) noted, in recent years textbooks can be supplemented in assorted ways, including:

traditional as well as face-to-face workshops, online or downloadable tutorials, materials produced by professional language testing associations, reference frameworks, such as the Common European Framework of Reference in Europe, video projects, pre-conference workshops, and series of narrative accounts about developing assessment literacy. (p. 332)

Though non-textbook resources are currently rather prevalent, they cannot be regarded as being as effective and efficient as written textbooks and practical manuals.

Conclusion

In the early 21st century, teaching, learning, and assessment are no longer three isolated islands in the ocean of education in general, or in that of language education in particular. They are parts of a trilogy, a series of theories and practices in one large compendium, with various characters, roles, and scripts, but carefully stage-managed to deliver one voice and a single message: teaching is for the sake of learning, and assessing is for the sake of learning. Assessment is no longer an ignored field in this compendium. As Coombe, Troudi and Al-Hamly (2012) estimated, assessment makes up 30 to 50 percent of language teachers' daily activities. Hence, assessment literacy deserves more attention. In its short lifespan, LAL has undergone a series of (re)conceptualizations, with a number of models having been proposed, yet it continues to mature. There are several areas for future growth:

1 The concept of 'language assessment literacy' requires the adoption of a more comprehensive perspective, and its operationalization still has room for growth.
2 There is a need for research on stakeholders other than teachers (e.g., parents and policy makers). Learner assessment literacy can also be investigated to explore the impact of a learner's awareness of assessment-related issues. According to Kumaravadivelu (2006), this would help achieve 'liberatory autonomy' for language learners. Learner assessment literacy can go side by side with other competencies essential in the 21st century, such as critical thinking, problem-solving, effective collaboration and communication, and self-directed language learning (Koh & DePass, 2019).
3 Language teacher education programs should be reformed to include more assessment-related content.
4 Resources should cover a balance of the theoretical and practical issues of assessment for stakeholders in the field.


Only when these growth areas have been explored can we think of the concept of language assessment literacy as mature and developed, and as contributing to an increased quality of language education.

References

Bailey, K. M., & Brown, J. D. (1996). Language testing courses: What are they? In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 236–256). Multilingual Matters.
Brindley, G. (2001). Language assessment and professional development. In C. Elder, A. Brown, K. Hill, N. Iwashita, T. Lumley, T. McNamara & K. O'Loughlin (Eds.), Experimenting with uncertainty: Essays in honor of Alan Davies (pp. 126–136). Cambridge University Press.
Brown, J. D., & Bailey, K. M. (2008). Language testing courses: What are they in 2007? Language Testing, 25(3), 349–383.
Bybee, R. W. (1997). Achieving scientific literacy: From purposes to practices. Heinemann.
Cambridge University (2018). Certificate in teaching English to speakers of other languages (CELTA): Syllabus and assessment guidelines. https://www.cambridgeenglish.org/Images/21816-celta-syllbus.pdf
Cambridge University (2019). Diploma in teaching English to speakers of other languages (DELTA): Syllabus specifications. https://www.cambridgeenglish.org/Images/22096-delta-syllabus.pdf
Coombe, C., Troudi, S., & Al-Hamly, M. (2012). Foreign and second language teacher assessment literacy: Issues, challenges, and recommendations. In C. Coombe, P. Davidson, B. O'Sullivan & S. Stoynoff (Eds.), The Cambridge guide to second language assessment (pp. 20–29). Cambridge University Press.
Crusan, D., Plakans, L., & Gebril, A. (2016). Writing assessment literacy: Surveying second language teachers' knowledge, beliefs, and practices. Assessing Writing, 28, 43–56.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3), 327–347.
Falsgraf, C. (2006). Why a national assessment summit? New visions in action. National Assessment Summit, Alexandria, VA. https://files.eric.ed.gov/fulltext/ED527580.pdf
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers. Profile: Issues in Teachers' Professional Development, 20(1), 179–195.
Goody, J., & Watt, I. (1963). The consequences of literacy. Comparative Studies in Society and History, 5(3), 304–345.
Harding, L., & Kremmel, B. (2016). Language assessment literacy and professional development. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment (pp. 413–428). Mouton de Gruyter.
Hasselgreen, A. (2008). Literacy in classroom assessment (CA): What does this involve? Paper presented at the 5th annual conference of the European Association for Language Testing and Assessment, Athens, Greece. http://www.eaulta.eu.org/conferences/2008/docs/sunday/panel/Literacy%20in%20classroom%20assessment.pdf

Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and assessment needs: Report: Part 1 general findings. European Association for Language Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-report-pt1.pdf
Hillerich, R. L. (1976). Toward an assessable definition of literacy. The English Journal, 65(2), 50–55.
Hyland, K., & Hamp-Lyons, L. (2002). EAP: Issues and directions. Journal of English for Academic Purposes, 1(1), 1–12.
Inbar-Lourie, O. (2008a). Constructing an assessment knowledge base: A focus on language assessment courses. Language Testing, 25(3), 385–402.
Inbar-Lourie, O. (2013). Language assessment literacy. In C. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 1–9). Blackwell Publishing Ltd.
Kaiser, G., & Willander, T. (2005). Development of mathematical literacy: Results of an empirical study. Teaching Mathematics and its Applications, 24(2–3), 48–60.
Koh, K., & DePass, C. (2019). Developing teachers' assessment literacy: Multiple perspectives in action. In K. Koh, C. DePass, & S. Steel (Eds.), Developing teachers' assessment literacy: A tapestry of ideas and inquiries (pp. 1–6). Brill Sense.
Kumaravadivelu, B. (2006). Understanding language teaching: From method to postmethod. Lawrence Erlbaum Associates.
Lam, R. (2014). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32(2), 169–197.
Malone, M. (2013). The essentials of assessment literacy: Contrasts between testers and users. Language Testing, 30(3), 329–344.
Mendoza, A. A. L., & Arandia, R. B. (2009). Language testing in Colombia: A call for more teacher education and teacher training in language assessment. Profile, 11(2), 55–70.
O'Loughlin, K. (2013). Developing the assessment literacy of university proficiency test users. Language Testing, 30(3), 363–380.
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence from a parliamentary enquiry. Language Testing, 30(3), 381–402.
Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and in teacher learning. Language Testing, 30(3), 309–327.
Shohamy, E., & Or, I. G. (2013). Introduction to volume 7. In E. Shohamy, I. G. Or, & S. May (Eds.), Encyclopedia of language and education (3rd ed., Vol. 7, pp. ix–xviii). Springer Science and Business Media.
Spolsky, B. (2008). Language testing at 25: Maturity and responsibility? Language Testing, 25(3), 297–305.
Stabler-Havener, M. L. (2018). Defining, conceptualizing, problematizing, and assessing language teacher assessment literacy. Teachers College, Columbia University Working Papers in Applied Linguistics & TESOL, 18(1), 1–22.
Stiggins, R. J. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77(3), 238–245.
Taylor, L. (2013). Communicating the theory, practice and principles of language testing to test stakeholders: Some reflections. Language Testing, 30(3), 403–412.
Trinity College London (2016). Certificate in teaching English to speakers of other languages (CertTESOL): Syllabus. https://www.trinitycollege.com/resource/?id=5407

Trinity College London (2017). Licentiate diploma in teaching English to speakers of other languages (LTCL Diploma TESOL): Validation requirements, syllabus and bibliography for validated and prospective course providers: Syllabus. https://www.trinitycollege.com/resource/?id=1776
UNESCO (2004). The plurality of literacy and its implications for policies and programs: Education sector position paper. UNESCO. http://unesdoc.unesco.org/images/0013/001362/136246e.pdf
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Yan, J. (2010). The place of language testing and assessment in the professional preparation of foreign language testers in China. Language Testing, 27(4), 555–584.
Yan, X., Zhang, C., & Fan, J. J. (2018). Assessment knowledge is important, but …: How contextual and experiential factors mediate assessment practice and training needs of language teachers. System, 74, 158–168.

Part 2

Students’ language assessment literacy

Chapter 5

Enhancing assessment literacy through feedback and feedforward: A reflective practice in an EFL classroom

Junifer A. Abatayo

Introduction

Providing effective feedback and expansive feedforward involves a complex process that requires teachers to provide meaningful advice on students' work to take their learning forward. Feedback, as described by Brown, Bull, and Pendlebury (2013), is the best-tested principle and practice in the classroom. It can be meaningful and relevant when it is directed towards encouraging students to improve their own learning. In addition, feedback becomes effective when it is focused on students and provides them with the opportunity to act in response. This closing of the gap between where the learner is and where the learner is going is what gives feedback its power (Hillocks, 1986; Hattie, 2009). As Brown and Knight (1994) put it, good feedback is achieved when students are suitably guided towards developing deep understanding that affects learning.

Feedback has been introduced and used intensively in English as a Foreign Language (EFL) and English as a Second Language (ESL) classrooms as a method of encouraging students to collaborate with others, contribute to learner autonomy, and develop a sense of ownership (Berg, 1999; Carson & Nelson, 1996; Tsui & Ng, 2000; Paulus, 1999). Some cultures are particularly sensitive to receiving feedback and feedforward; therefore, knowledge of a specific classroom culture can be helpful. Hattie and Timperley (2007) asserted that the learning context is very important in providing feedback. Noels (2001) added that, by acknowledging students' needs and preferences, feedback could help students develop an understanding of its positive effect as a technique that can lessen the negative affective results of their work. Furthermore, Hyland (2003) advised that teachers should implement feedback by explaining to students that they are part of the process and responsible for their own development. Peacock's study (as cited in Al-wossabi, 2019) found that when teachers employ different types and techniques of feedback in relation to students' preferences and styles, it supports and advances their learning. This point is of great importance because feedback not only helps students and teachers achieve effective learning and teaching, it also provides an opportunity to enhance assessment literacy through understanding what students know and what they can do, and through using the results of assessments to improve curricula, programmes, learning, and teaching.

As an EFL language teacher in the Sultanate of Oman, I believe that my experience of providing feedback on students' work in the classroom has enhanced the quality of student learning and encouraged students to reflect and move forward, closing the gap between where they are and where they will go next. I have been teaching in Oman for five years, and I admit that it has been very challenging. My students come from a myriad of cultural and educational backgrounds; therefore, it is important to consider their previous experiences. I need to consider the underlying socio-cultural factors that might affect students' involvement in the learning process, such as their unique use of the English language, authentic speech, and the developmental errors committed while being EFL learners. Considering these factors made my engagement with students, and the education process, more inclusive. This chapter presents my critical review of, and reflection on, how aspects of my teaching, assessment practices, and assessment literacy are enhanced through the integration of feedback in the classroom. It also elucidates how I value my students as learners in an EFL context, the support I provide for their achievement, and the use of technology to support feedback and feedforward in the classroom.

My teaching context

The Sultanate of Oman and its Ministry of Education have high hopes that young Omanis, equipped with a good education, can cope with the demands of the times. To support all Omani students' learning, the Education Council of Oman (2017) put emphasis on its Education Vision 2040. Education authorities mandated that the development of the education system and curricula should provide students with positive reinforcement and support in classrooms across different levels. In terms of the support needed in the classroom, one important move is to monitor students' progress and lead them to the next phase of learning. This is where feedback and feedforward mechanisms play their role in the development of students' learning. However, a lack of communication between teacher and student and among students has posed a threat to achieving effective teaching and learning. Interestingly, Al-Issa (2006), an Omani scholar, stressed that the teacher's role is important in raising students' awareness about taking individual responsibility and in advancing their own learning. When feedback is timely and purposefully given, it helps students advance their learning.

In 2016, I conducted an initial exploratory study that addressed the importance of the integration and implementation of feedback and feedforward in the context of Omani students' learning experiences. Students' writing is one aspect of teaching that has influenced my desire to go the extra mile in assessing their work. The objectives of the investigation were to document students' learning based on feedback provided by the teacher, and to outline practical forms of assessment in order to monitor their progress in the classroom. Black and Wiliam (1998) indicated that progress monitoring can increase students' learning. With constant follow-up, and an open line of communication with me, students can monitor their own learning.

At the Faculty of Language Studies (FLS), there is a need to provide meaningful feedback on students' work in order to give them meaningful learning experiences. Monitoring students' learning can help them prepare for the next classroom activity, thus facilitating their own learning. In his study, Budimlic (2012) affirmed that the functionality and importance of feedback in the classroom raised the standards of student achievement. This also confirmed the initial investigation I had conducted two years earlier: the integration of feedback and feedforward in an EFL context worked well and helped students understand their own progress and learning. According to Sadler (1989), formative assessment and feedback are concerned with how judgements about the quality of students' responses can be used to shape and inform decisions to improve students' competence. Remarkably, Sadler's explanation of formative and summative assessment aligned with the initial findings of my analysis of students' feedback. Sadler further emphasized that, to provide feedback, there must be a judgement or evaluation of a product. In writing classes, teachers evaluate students' writing, and students then receive feedback to help them understand their own strengths and weaknesses in writing. Irons (2007) noted that teachers' awareness of students' learning can also influence teaching; therefore, feedback is important in helping students advance their learning. Feedback is effective when students are made aware of its process, and when they are given the opportunity to set their own learning goals (Black & Wiliam, 1998; Torrance & Pryor, 2002; Budimlic, 2012; McCord, 2012; Evans, Hartshorn & Tuioti, 2010).

My personal experience as a language teacher in EFL writing classes has influenced my teaching and helped me understand the social context of my students. It has also helped me understand the nature of the assessment environment, assessment literacy, instructional processes, and testing conditions. Understanding feedback mechanisms as suggested by research and the literature made me realize that students can do more in the classroom when guided properly and in a timely fashion.

Teaching practice: The case at Sohar University

The Faculty of Language Studies (FLS) at Sohar University has two streams: English Language Studies and Translation. Three courses at FLS involve writing: Academic Writing 1, Academic Writing 2, and Professional Writing. All these courses require students to write academic journals, summaries, paraphrases, application letters, and other forms of academic writing. When I first taught one of these writing classes, I noticed that there was no explicit mention of feedback sessions on students' writing in the course profile and portfolios. Although feedback can be conducted indirectly, it is important that students and teachers share and understand the feedback mechanism in order to improve teaching and advance learning. According to Black and Wiliam (1998), all activities undertaken by teachers and students in assessing themselves and their work can help develop teaching and learning and can increase engagement in the classroom. At the Faculty of Language Studies, it is therefore central to adopt and implement a clear feedback mechanism that can foster good learning and teaching.

I realized that providing feedback and effective feedforward within a learning and teaching context is essential to the development of students' potential. In the case of EFL instruction, and in the writing classroom in particular, my students made both global and developmental errors, and I believe that if these errors are used effectively as a basis for feedback, they can help students improve their work. As an EFL teacher, I always engage my students in the teaching and learning process in order to offer them ownership of their written work. It is true that there are contradictions between teacher and student perceptions in relation to the different forms of feedback used in the classroom. This led me to suggest an assessment protocol posing these questions in context: 'Where am I going?', 'How am I going?' and 'Where to next?' (Hattie & Timperley, 2007). With these questions in mind, it is my desire and teaching goal to guide students and motivate them to explore opportunities for learning. In my EFL classroom, my students' voices are heard, and I always remind them that their mistakes in writing are part of their development and can help them build the awareness to become self-directed learners.

Course design, learning outcomes, assessment, and feedback mechanism

Below is an outline of the WRIT4111 course: its learning outcomes, course content and learning activities, assessment design, and feedback mechanisms (see Table 5.1). The purpose of this description and documentation is to show how the course I taught at the Faculty of Language Studies was developed, what topics were discussed, how its assessment was designed, and what feedback mechanisms and procedures were used to inform students of their own learning. The feedback mechanisms were the most recent additions to the course profile, intended to guide my colleagues and students on how to conduct formative feedback (ethical approval was obtained).

Course teaching and learning activities

To develop and improve this writing course, I met my co-teachers and the course coordinator to discuss relevant additions to the course profile relating to the formative feedback and feedforward phases in order to guide students.

Table 5.1 WRIT4111 course profile (WRIT4111: Professional writing course)

Learning outcomes:
• Explain the rhetorical context of different kinds of professional writing (purpose, audience, medium)
• Integrate the stages of the writing process to develop, organize, and present ideas and information in writing
• Compose concise, clear, well-focused sentences that communicate effectively and professionally
• Demonstrate proofreading skills for basic grammatical and punctuation errors, and revise texts accordingly
• Create and produce a portfolio showcasing different forms of professional writing
• Provide and engage in feedback sessions on the acquisition of writing strategies and the application of different concepts in professional writing

Course teaching and learning activities:
• Lectures: group discussions, pair work, discussion leaders
• Tutorials: group discussions, pair work, learning to write/writing to learn (LWWL)
• Lab and independent study
• Homework: independent learning, additional readings

Assessment design (assessment is both summative and formative):
• Assessment 1: Written quiz covering lectures and discussions conducted in the class
• Assessment 2: Proofreading and editing challenge
• Assessment 3: Portfolio development (collection and compilation of writing exercises, independent learning activities, group work activities, self-assessments, feedback sessions and reflection papers, rubrics)
• Assessment 4 (final examination): Written exam

Feedback mechanisms used:
• Students were asked to share their sample papers/writing
• Peer critique/peer review
• Self-editing checklist
• Feedback sessions
• Reporting
• Learning to write/writing to learn feedback sessions
• 'Can do' statements
• Group discussions
• Pair work



The course is designed according to different levels of students' performance and the nature of the course. The types of performance reflected in the course should demonstrate how students achieve the learning, knowledge, skills, and understanding described by the objectives. The description of performance served as the basis for acceptable evidence of achievement; therefore, the focus of instruction, student learning, and assessment were given utmost consideration. If students are made aware of the course learning outcomes at the beginning of instruction, both teachers and students can work towards their common and shared goals in curricular objectives, teaching, and assessment. Significantly, a clear definition of learning outcomes and an emphasis on assessment design are part of the process of supporting students, teaching, and learning.

With regard to teaching and instruction, I made sure that the writing course was interesting even to struggling EFL students. It was not easy; it was very challenging. Nevertheless, seeing your students learn something new is already an achievement, especially when they know how to direct their own learning. Al-wossabi (2019) shared the same observation in his EFL classroom in Saudi Arabia. He stated that, to some extent, teaching in the EFL writing classroom is relatively challenging, yet it can result in students leading and directing their own learning. Providing different types of feedback during the process of writing enables students to acquire skills that improve their writing. In addition, students are also prepared to be autonomous learners even if developmental errors occur in their writing (Al-wossabi, 2019; Lightbown & Spada, 1999; Lyster, Lightbown, & Spada, 1999).

In my writing class, students' questions and interests are highly valued. I always encourage them to ask questions before I introduce another lesson or activity. I conducted individual and group consultations to maintain communication and bridge the gap between teaching content and task. Creating a classroom environment in which teachers are in dialogue with students, helping them construct knowledge on their own, made me realize that forging good relationships with students can help establish an effective classroom environment in which the teacher's role is rooted in a negotiation that facilitates purposeful learning.

Supporting students’ learning: Feedback design and implementation To support students at the FLS, I proposed the following feedback mechanism to strengthen learning tasks and to remedy learning failures particularly in the writing classroom, because the initial study and informal interviews I conducted with my colleagues and students showed that FLS did not have a uniform feedback mechanism and procedures. I presented the feedback model in one of the professional development workshops at the FLS to gather suggestions from my colleagues. Here are some of their comments and suggestions that helped me frame a feedback structure that can be integrated into writing classrooms:


'We need to ensure that all writing teachers receive training on how to implement the suggested feedback mechanism'.

'Student-level representatives should be invited to increase students' awareness on feedback structure'.

'Teachers must be given the chance to also explore other feedback types that will work in EFL context'.

'The feedback model should also be initially implemented in Level 1 writing course to determine its effectiveness and limitations'.

'As members of the faculty, we must ensure that assessment literacy be the focus of our seminar–workshops so we can initiate a timely feedback mechanism, thus supporting assessment literacy across disciplines'.

The model below (see Figure 5.1) was developed and modified based on Hattie and Timperley's (2007) three major feedback questions. The design of this feedback instrument integrated the results of the focus group discussions I conducted with the colleagues involved in the development of the writing course. Program leaders, course coordinators, and student-level representatives met and discussed the suggestions and feedback from the students. In the feed up stage, learning outcomes, class activities, assessments, and marking criteria were explained explicitly to students. This helped them understand the nature of the course and teachers' expectations of their output. The importance of the learning outcomes was highlighted at the beginning of the course so that students could prepare for tests and other in-class requirements and activities. Rubrics and other scoring guides were also discussed and made clear to the students during the first week of the class. In fact, students were involved in the development of the rubrics used in evaluating their work.

Figure 5.1 Three major feedback questions. Adapted from Hattie and Timperley (2007). The diagram presents three stages: feed up (teachers inform students of the learning outcomes, context, nature of the activity, and assessment procedures; teachers and students discuss the expected output of class activities and how their work is evaluated), feedback (teachers provide feedback to students through discussions and through direct and indirect mechanisms; students are informed of how they can improve their work; teachers provide students with context-related responses; learners' social harmony; peer evaluation), and feedforward.


During the feedback stage, information was given to students regarding their own progress and achievement. Here, students were asked to submit their written work for the class activity each week. This was followed by the revision of their work, integrating the valuable feedback and comments from their teachers and peers. Revision at this stage involved two steps: teacher-focused feedback, and suggestions using corrective feedback strategies. To understand students' learning experiences, and to determine what they can and cannot do in a particular task, students were asked to map their own learning using 'can do' statements. These are statements, rooted in the course learning outcomes, that indicate students' level of learning in the classroom. The results of this feedback mechanism helped teachers improve their teaching strategies and adjust teaching tasks to within students' grasp. To add meaningful context to this mechanism, reflective sessions were also conducted in which students took part in focus group discussions, online discussions, and teacher consultations. Lastly, feedforward is a discussion phase between teacher and students on what they have achieved and how they can improve their own learning in the future. In addition, I discussed with them how they can move on to the next level of knowledge, so as to help them do better in the next phase of learning.

The use of 'can do' statements

To enhance and strengthen the feedback mechanisms in my classroom, I also used and integrated a type of formative assessment that is popular in general foundation programs and in language classrooms: the 'can do' statement. I designed this feedback sheet by rewriting the intended learning outcomes as statements relating to what students can do in a particular course or classroom. The purpose of this feedback mechanism is to gather information about what students can and cannot do in a particular course. With this sample of student feedback, we can form judgements and make decisions on how to improve teaching and learning. Moreover, determining what students can do is the primary concern of a delivery system, affecting curricular sequencing, strategies, and the suitability of teaching materials. Apart from helping teachers improve their teaching, these statements also give students the opportunity to know their strengths and weaknesses, thereby helping them improve their skills and capabilities to achieve outstanding learning. To make feedback productive and beneficial, students must be trained in self-assessment classroom mechanisms so that they can understand the main purposes of their engagement in the classroom and what they need to do to achieve purposeful learning (Black & Wiliam, 2010). Below is a sample 'can do' statement feedback tool that I integrated into the class (see Table 5.2). My students were asked to tick the column corresponding to their own evaluation and understanding of their learning goals and targets. This feedback strategy helped students chart their own progress as part of reflective learning.

Table 5.2 Can do statements feedback sheet

Students tick one of three columns for each statement of performance: 'I can do this well', 'I can do this with support', or 'I cannot do this'.

Statements of performance:
• I can explain the different contexts of different kinds of professional writing, such as purpose, audience, and medium
• I can use and integrate the stages of the writing process to develop, organize, and present ideas and information in writing
• I can compose concise, clear, well-focused sentences that communicate
• I can demonstrate proofreading skills for basic grammatical and punctuation errors
• I can create and produce a portfolio showcasing different forms of professional writing
• I can write correct and effective sentences
• I can develop my portfolio following the guidelines and instructions
• I can identify sentence errors
• I can organize good ideas and information while writing paragraphs
• I can edit my own writing
• I can engage in group discussions and feedback sessions
• I can offer suggestions during feedback sessions

Corrective feedback strategies

The corrective feedback strategies shown below were also adapted for correcting students' writing samples (see Table 5.3). This is one of the many additions to the WRIT4111 course profile meant to strengthen student engagement in the writing process. In my class, I combined some, but not all, of the suggested strategies, because teaching in this context depends on the nature of the activity, the course, and the number of students. Although it is obvious that the number of students can affect the delivery and conduct of feedback, teachers are encouraged to implement strategies that help increase students' awareness of the feedback mechanism itself, the process, and their own learning. Contextual definitions and examples are provided to ensure that the implementation reflects actual practice at FLS.

Table 5.3 Typology of written corrective feedback types

Direct
Contextual definition: Feedback is given by providing students with the correct form.
Contextual examples: Student: My mother goes to the market yesterday. Teacher: goes – went. Student: Amina is ate chicken in the cafeteria. Teacher: Amina is eating chicken biryani in the cafeteria.

Indirect
Contextual definition: The teacher indicates the error and informs students of the areas that need correction.
Contextual examples: The teacher may underline or circle the errors and ask students to correct them.

Metalinguistic
Contextual definition: Metalinguistic clues and codes are used by the teacher to indicate the error.
Contextual examples: The teacher uses codes, letters, and other symbols to inform students of the error (e.g. mohammed – m – C [spelling, capitalization]).

The focus of feedback
Contextual definition: The teacher provides feedback by highlighting some areas (unfocused) or specific areas (focused) that need improvement or correction.
Contextual examples: Both strategies, focused and unfocused, were used; the teacher may provide sample corrected sentences or paragraphs to inform students of the areas of correction.

Electronic feedback
Contextual definition: The teacher provides feedback on the errors through emails, links, and other electronic sources that help students correct and improve their work.
Contextual examples: The teacher shares useful hyperlinks, internet sources, and electronic discussions for students to explore in order to help them correct the errors. In my class, I used WhatsApp chat, Moodle, and online discussions.

Adapted from Ellis (2009)
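As a rough illustration of how a typology like this could be kept machine-readable for record-keeping — say, for logging which strategy accompanied each piece of feedback — the following Python sketch encodes the five strategies above as a lookup table. The definitions are paraphrased from Table 5.3; the logging helper and the sample entry are hypothetical, not a record of actual FLS practice.

```python
# Ellis's (2009) typology from Table 5.3 as a lookup table.
CF_TYPOLOGY = {
    "direct": "Provide the correct form (e.g. goes -> went).",
    "indirect": "Indicate the error (underline/circle) and let the student correct it.",
    "metalinguistic": "Use codes or symbols to signal the error type (e.g. C = capitalization).",
    "focus": "Target selected error types (focused) or respond to all errors (unfocused).",
    "electronic": "Deliver feedback through emails, links, or online discussions.",
}

feedback_log = []

def log_feedback(student, strategy, note):
    """Record one feedback event, rejecting strategies outside the typology."""
    if strategy not in CF_TYPOLOGY:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    feedback_log.append({"student": student, "strategy": strategy, "note": note})

log_feedback("Amina", "metalinguistic", "m - C (spelling, capitalization)")
print(feedback_log[0]["strategy"])  # -> metalinguistic
```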

The corrective feedback strategies that I used in my class proved particularly useful in foreign language pedagogy. I agree with Lantolf and Thorne (2007) when they said that corrective feedback enhanced the noticing of forms in students’ writing and prompted self-regulation. In the same context, other researchers supported the use of corrective feedback because of its usefulness in helping students understand meaning and form (Savignon, 2005; Ellis, 2005; Lyster, 2011). It is also worth mentioning here that I faced a number of challenges despite the effectiveness of the integration of corrective feedback. Students continued to make mistakes regardless of the amount of feedback and the number of consultations conducted. Researchers such as Krashen (1982) and Ferris (1999) suggested that written corrective feedback could be limited to features that are, respectively, simple and treatable. However, Ellis (2009) argued that none of the suggested proposals is easy to implement. I agree on this point, and based on my own experience, the integration of corrective feedback was challenging, though it facilitated EFL students’ improvement. Asari (2019) made the same observation in relation to her class in Japan: students realized the importance of corrective feedback and thus improved their learning.

Use of technology to support feedback in the EFL classroom

Many studies have illuminated the positive effects of technology integration in the classroom. Ogle and Beers (as cited in Borich, 2010) indicate that technology use in the classroom can increase student engagement and motivation. Contemporary students are technologically adept; therefore, they find technology useful for learning a skill or developing a strategy. Technology can also improve their reading and writing skills. In EFL classrooms, teachers support the integration of technology because it provides students with the opportunity to discover a new way of learning that is different from books and other printed materials. In addition, technology expands students’ responses and collaboration. In the case of electronic feedback in WRIT4111, it gave students autonomy in improving their own work. Technology not only provides engagement and supports collaboration; it also expands students’ experiences, content knowledge, imaginations, and critical thinking. Internet technologies provide students with a flexible media environment where they can easily respond. They feel empowered because they can demonstrate their hypermedia skills. When these technologies are integrated into instruction, they can support different aspects of cognition. They can also help create an interactive and positive teaching and learning environment for students and teachers (Swaffar, Romano & Arens, 1998). At Sohar University, the design of online learning activities is built on instructional objectives. Moodle is a good learning platform that helps teachers and students create a positive learning environment that supports basic skills, higher-level skills, inquiry-based activity, online discussions, and online learning communities. Technology-supported feedback, as shown below, offers an opportunity for students to express themselves, especially in peer feedback sessions. This also reduces the threat in EFL writing classrooms caused by face-to-face discussions and students’ sharing of their reflections on their writing.

Process and progress

At the faculty level, some educational technologies were used in implementing feedback in the writing classroom. I made use of Moodle, where my students could share their own evaluation of their writing. Online discussions were also conducted to help them understand the lesson and other activities in class. For example, I asked my students to submit their writing through Moodle; I then provided electronic feedback sessions by sending them links to read, as well as sample forms and types of writing to study, to help them improve their work.


Aside from Moodle, I also explored other practical ways to transform feedback using technology. I created a WhatsApp group in my tutorial class with weekly chat sessions that I called ‘Help & Receive Help’ (HRH). Students in the group could send a message asking for help with improving and correcting spelling mistakes, the use of correct words, and sentence construction. My students were given a theme or topic every week, so that they were guided in their participation. Words and sentence types were sent in the WhatsApp group, and students then offered help by sharing or providing correct spelling, grammar, or sentence construction. There were some challenges with this, but I believe the practicality of this technology was an opportunity that helped students improve their writing. Thanks to technological advancement, the implementation of feedback has been widely supported, especially on internet chat sites where both teachers and students share thoughts and ideas through electronic mail, bulletin board systems, and online discussion boards (Chen, 2016; Braine, 2001; Ware, 2004). Here is some unedited student feedback regarding the use of technology in the classroom (i.e. Moodle, an online discussion board, and WhatsApp):

‘Internet sources and computer use make me feel confident in writing’
‘Writing is difficult but with computers is good’
‘I am afraid of writing but the technology, internet and computer make me not afraid’
‘My mister I like because he used computer and internet in class and lab’
‘My mister and me can work fast in editing because of computers’
‘WhatsApp chat helped me a lot in writing’
‘WhatsApp group chat is a good practice in the WRIT2214’
‘I like to participate in online discussions … my writing I think is improved’

My students’ feedback regarding the effect of technology in the classroom is not conclusive. However, based on my current practice, it shows positive results with students’ engagement and cooperation, and with the development of their skills, especially in the writing classroom. As in Ware’s (2004) networked classroom environment, technology helped raise student awareness and improve writing skills and communication. Warschauer (2002) noted that less proficient students were more comfortable with technology as it helped to create a non-threatening environment. To some extent, the current situation of my Omani students mirrors this. Though some of them could not write simple, correct sentences, they felt more comfortable with networked classes and other forms of technology used in providing writing feedback. As the students’ positive comments show, the use of technology made an impact on their learning and fostered a connection between instructional goals and pedagogical approach.


Conclusion and future directions

Effective feedback has been the focus of researchers and academic institutions for some years now. Studies have shown that feedback and effective assessments have made an impact on students’ learning and on the development of educational programmes. Motivating learners through feedback introduces them to a culture of success where they think and believe that, through active participation, they can enhance their knowledge, skills, and potential. Simple practices of feedback can lead to significant improvement and, most importantly, the enhancement of assessment literacy. When teachers make students aware of what they can do in the classroom, and what they have learned, students can also improve their engagement in the process. Students know the aspects of their performance that need improvement, and they are aware of opportunities to develop the metacognitive awareness to know and understand their own learning. In the case of language learning in an EFL context, students need systematic and timely feedback, as it creates and develops an effective learning environment where they can claim ownership of their growth and learning. Providing students with appropriate assessments can also affect teaching, because good teaching involves more than developing curricular offerings and programmes. Good teaching also values the continuous support of students through monitoring their progress and achievement. The works of Beaumont, O’Doherty, and Shannon (2011), Brown (2019), and Lee (2014) provide excellent arguments about how feedback can be re-conceptualized by focusing on feedback quality, strategies, and the contextual and socio-cultural aspects of students’ work. Their studies will be of great help, as I would like to re-examine feedback practices at the FLS to determine how we can explore new ways to enhance assessment literacy and support students’ learning.

References

Al-Issa, A. S. (2016). Meeting students’ expectations in an Arab ICLHE/EMI context: Implications for ELT education policy and practice. International Journal of Applied Linguistics and English Literature, 6(1), 209–226.
Al-wossabi, S. A. N. (2019). Corrective feedback in the Saudi EFL writing context: A new perspective. Theory and Practice in Language Studies, 9(3), 325–331.
Asari, Y. (2019). EFL teachers’ L1 backgrounds, beliefs, and the characteristics of their corrective feedback. Journal of Asia TEFL, 16(1), 250.
Beaumont, C., O’Doherty, M., & Shannon, L. (2011). Reconceptualising assessment feedback: A key to improving student learning? Studies in Higher Education, 36(6), 671–687.
Berg, E. C. (1999). The effects of trained peer response on ESL students’ revision types and writing quality. Journal of Second Language Writing, 8, 215–241.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74.
Borich, G. D. (2010). Effective teaching methods (8th ed.). Pearson Education.
Braine, G. (2001). A study of English as a foreign language (EFL) writers on a local-area network (LAN) and in traditional classes. Computers and Composition, 18, 275–292.
Brown, G. A., Bull, J., & Pendlebury, M. (2013). Assessing student learning in higher education. Routledge.
Brown, J. D. (2019). Assessment feedback. The Journal of Asia TEFL, 16(1), 334–344.
Brown, S., & Knight, P. (1994). Assessing learners in higher education. Kogan Page.
Budimlic, D. (2012). Written feedback in English: Teachers’ practices and cognition [Unpublished master’s thesis, Norges teknisk-naturvitenskapelige universitet, Fakultet for samfunnsvitenskap og teknologiledelse, Program for lærerutdanning].
Carson, J. G., & Nelson, G. L. (1996). Chinese students’ perception of ESL peer response group interaction. Journal of Second Language Writing, 5(1), 1–19.
Chen, T. (2016). Technology-supported peer feedback in ESL/EFL writing classes: A research synthesis. Computer Assisted Language Learning, 29(2), 365–397.
Education Council of Oman (2017). Philosophy of education in the Sultanate of Oman.
Ellis, R. (2005). Principles of instructed language learning. System, 33(2), 209–224.
Ellis, R. (2009). Corrective feedback and teacher development. L2 Journal, 1(1), 1–18.
Evans, N. W., Hartshorn, J., & Allen Tuioti, E. (2010). Written corrective feedback: Practitioners’ perspectives. International Journal of English Studies, 10(2), 47–77.
Ferris, D. (1999). The case for grammar correction in L2 writing classes: A response to Truscott (1996). Journal of Second Language Writing, 8, 1–10.
Hattie, J. (2009). The black box of tertiary assessment: An impending revolution. In L. H. Meyer, S. Davidson, H. Anderson, R. Fletcher, P. M. Johnston, & M. Rees (Eds.), Tertiary assessment & higher education student outcomes: Policy, practice & research (pp. 259–275). Ako Aotearoa.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
Hillocks Jr, G. (1986). Research on written composition: New directions for teaching. National Council of Teachers of English.
Hyland, F. (2003). Focusing on form: Student engagement with teacher feedback. System, 31, 217–230.
Irons, A. (2007). Enhancing learning through formative assessment and feedback. Routledge.
Krashen, S. (1982). Principles and practice in second language acquisition. Pergamon.
Lantolf, J., & Thorne, S. L. (2007). Sociocultural theory and second language learning. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition (pp. 201–224).
Lee, I. (2014). Revisiting teacher feedback in EFL writing from sociocultural perspectives. TESOL Quarterly, 48(1), 201–213.
Lightbown, P. M., & Spada, N. (1999). Instruction, first language influence, and developmental readiness in second language acquisition. The Modern Language Journal, 83, 1–22.
Lyster, R. (2011). Content-based second language teaching. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. 2, pp. 611–630). Routledge.
Lyster, R., Lightbown, P. M., & Spada, N. (1999). A response to Truscott’s ‘What’s wrong with oral grammar correction’. The Canadian Modern Language Review, 55, 457–467.
McCord, M. B. (2012). Exploring effective feedback techniques in the ESL classroom. Language Arts Journal of Michigan, 27(2), 11.
Noels, K. (2001). Learning Spanish as a second language: Learners’ orientations and perceptions of their teachers’ communication style. Language Learning, 51, 107–144.
Paulus, T. (1999). The effect of peer and teacher feedback on student writing. Journal of Second Language Writing, 8, 265–289.
Peacock, M. (2001). Match or mismatch? Learning styles and teaching styles in EFL. International Journal of Applied Linguistics, 11, 1–20.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Savignon, S. J. (2005). Communicative language teaching: Strategies and goals. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 635–651). Lawrence Erlbaum Associates.
Swaffar, J., Romano, S., & Arens, K. (1998). Language learning online: Theory and practice in the ESL and L2 computer classroom. Labyrinth Publications.
Torrance, H., & Pryor, J. (2002). Investigating formative assessment: Teaching, learning and assessment in the classroom. Open University Press.
Tsui, A. B., & Ng, M. (2000). Do secondary L2 writers benefit from peer comments? Journal of Second Language Writing, 9(2), 147–170.
Ware, P. D. (2004). Confidence and competition online: ESL students’ perspectives on web-based discussions in the classroom. Computers and Composition, 21(4), 451–468.
Warschauer, M. (2002). Networking into academic discourse. Journal of English for Academic Purposes, 1(1), 45–58.

Chapter 6

Using checklists for developing student teachers’ language assessment literacy Olga Ukrayinska

Introduction

In searching for efficient methods of developing student teachers’ language assessment literacy (LAL), we have concluded that checklists can be a good option. Thus, this chapter is dedicated to exploring the potential of checklists to be exploited as a pedagogical tool in pre-service foreign language (FL) teacher preparation. This research has been done as part of a long-term project aimed at designing and further reshaping a course on assessment for master’s degree students studying English and French as their second language. Teaching the aforementioned course is iterative, but the current study presents the results obtained after teaching it in 2018–2019 to experimental groups at three universities in Kharkiv, Ukraine. The objective of the course is to develop pre-service teachers’ competency in selecting, adapting, designing, and administering tasks for assessing bachelor’s degree students’ reading, listening, speaking, and writing skills, and in assessing their oral and written performance respectively. Owing to their advantages, checklists have been a helpful instrument for teaching. The chapter focuses on their functions, types, structure, and implementation, aligned with the specific aim mentioned above. Some sample checklists are presented. This study commenced with a review of checklists available in the area of language assessment. A qualitative study revealed that most checklists appear to be evaluative and designed specifically for experienced test developers, in that they abound with metalanguage and require profound knowledge of procedure. This was followed by the adaptation stage, at which we tried to enhance the clarity of the rubrics in items by simplifying them and fine-tuning them to the specific focus of the study. This necessitated a more profound understanding of the nature of checklists, namely the principles and requirements of their development, structure, and types. For this we addressed not only checklists and theory on their design relevant to language assessment, but also other areas, with the intention of exploring their potential for teaching purposes by applying an interdisciplinary approach.


We also relied on a practical approach to make the checklist instrument more economical for university teachers to use in the classroom, and to personalize it for students in order to help them internalize procedures.

Theoretical background

At present we can find few areas of life where checklists cannot be applied. They are extensively used in households as shopping lists, in aviation for ensuring pre-flight safety, in medicine for diagnostic purposes, in the IT industry for software product quality assurance, etc. The scope of their application is so varied that enumerating all cases could be endless. First introduced by Osborn (1953) as a simple tool in the form of a series of comprehensive questions to encourage creative thinking in approaching complex design tasks, checklists were meant to be used individually or in groups. The questions relating to the point of focus should be answered one at a time to help explore all possible ways to handle a problem. In the Multilingual glossary of language testing terms (ALTE, 1998) we find the following definition: ‘A checklist is a list of questions or points to be answered or covered. Often used in language testing as a tool of observation or analysis’ (p. 137). As can be seen, this definition is fairly laconic and cannot serve as a guideline for designing checklists. In order to gain a better understanding of the phenomenon, let us turn to some other definitions in online dictionaries returned by a Google search.

‘A checklist is a list of all the things that you need to do, information that you want to find out, or things that you need to take somewhere, which you make in order to ensure that you do not forget anything; a list of things, names, etc. to be checked off or referred to for verifying, comparing, ordering, etc.’ (Collins).

‘A checklist is a comprehensive list of important or relevant actions, or steps to be taken in a specific order’ (WebFinanceInc).

‘A checklist is a list of items required, things to be done, or points to be considered, used as a reminder’ (Lexico).

‘A checklist is a type of informational job aid used to reduce failure by compensating for potential limits of human memory and attention. It helps to ensure consistency and completeness in carrying out a task’ (Educalingo).

An overview of these definitions allowed us to formulate generalized points describing checklists, i.e. their general characteristics (in order to avoid confusion, a structural unit of a checklist is hereinafter referred to as an ‘item’):

a) Related to their structure:
• A checklist is a list where items come one below another.
• A checklist may comprise questions, statements, or just enumerate some objects or activities (also in Wilson, 2013).
• Enumerated items are logically connected (also in Scriven, 2000; 2007).
• Items can be strictly (also in AbdelWahab, 2013, p. 56) or randomly ordered (also in Scriven, 2000).
• A checklist consists of a minimum of one column with items to cover requiring action, and it may have another column on the right either for check-off marks or for answers (also in Mukundan & Nimehchisalem, 2012, p. 1128).
• The length of a checklist is not formally restricted and depends on a specific research goal (also in Scriven, 2000; Wilson, 2013).

b) Related to their content:
• Inclusion of an item on a checklist should be well-grounded, and underpinned with some theoretical rationale or practical need (also in AbdelWahab, 2013, p. 58; Mukundan & Nimehchisalem, 2012, p. 1128; Wilson, 2013).
• A checklist can cover main aspects or specific details.
• A checklist can refer to theoretical issues or physical actions.

c) Related to their functionality:
• A checklist can be used for observation of some phenomenon, human behaviour, or the performance of some device or method; for analysis of assumptions or obtained results; for ordering according to importance or functionality; or for reminding and error prevention (also in Scriven, 2000; Wilson, 2013).

d) Related to areas of their application:
• Checklists can be applied in all possible areas.
• Checklists can be used at any stage of a project, from planning and elaborating, to rounding off and verifying (for example, Wilson, 2013).

Turning to context-specific definitions may shed more light on the nature of the phenomenon. Fuentes and Risueno Martinez (2018, p. 27) defined (software) checklists for evaluating language learning websites as follows: … a checklist introduces a progression of inquiries or categories for judgement, so the evaluator should give a response as a reaction to all the information presented through the reviewing procedure. … checklists … request a yes/no sign or a response along a Likert scale. Others … additionally incorporate space for open-ended remarks after particular prompts. (p. 26)


Thus, we can add another point regarding the application of checklists:

• A checklist can be computer-assisted, or be published and filled in with a pen or pencil.

We can also add three more points regarding their structure:

• Items can be answered, marked as checked off, and/or not require any marking, but can be followed by a physical response.
• The responses to the items may be gradable (using a Likert scale).
• A checklist may have a third and even a fourth column or space below the items for comments or remarks concerning some aspect(s) (also in Mukundan & Nimehchisalem, 2012, p. 1129).

Further examination of definitions of ‘checklist’ may be excessive, for they are similar to those we have already considered but are formulated in slightly different language. As has already been stated, checklists are widely employed. Evaluation and observation checklists are frequently used as an assessment tool in language assessment/testing. Multiple studies reflect their usage to some extent, but in the present chapter we will refer only to some of them to illustrate the most typical tendencies. At the development stage, they guide item writers in evaluating the appropriacy of the task and its components before submission (Council of Europe, 2011, p. 64; O’Sullivan, Weir, & Saville, 2002). Weir (2005) speculated on the efficacy of using various checklists for test validation; however, Fulcher (2015) questioned their practicality due to their being complex, and the whole procedure being tedious and meaningless at times. O’Sullivan, Weir, and Saville (2002) also used checklists for the development and validation of speaking tests but remarked that evaluators need training for such usage. Bachman & Palmer (1996) considered numerous applications of checklists, from selecting an existing test to the logical evaluation of test usefulness. Thus, this tool can assist in quality assurance at any stage of the development of a single task or of an examination by identifying necessary steps to be taken and focusing on even the smallest details: test construction (timing for all the procedures, weighting of sections/items, presentation of the test to candidates, layout of input and items, characteristics of tasks [e.g. for listening: construct elements, types, number, sequence, quality of items, rubric characteristics, audio and video text characteristics (type, topic, duration, degree of authenticity, number of speakers, quality of the recording), the number of times candidates hear the recording, the way candidates should answer], expected responses), administration (venue, conditions, organizations responsible for exam provision, people involved in administration of the exam), marking of the examination, grading, and the analysis and reporting of the examination results. Most often referred to are the checklists developed by Bachman & Palmer (1996) and ALTE (2001), as being fundamental and comprehensive.


There is a common practice to use checklists for peer-, self-, and rater assessment of speaking or writing. As Green (2013) remarks, they help focus on important aspects of oral performance whilst being straightforward to use, although he claims that using checklists deprives the process of its real-life communicative value by concentrating only on the presence/absence of certain elements. He also finds it possible to develop rating scales out of checklists. Khalifa & Salamoura (2011) and O’Sullivan, Weir, and Saville (2002) pointed out the potential of checklists to facilitate a comparison between different speaking test formats. Cambridge self-evaluation checklists of different proficiency levels are designed to help learners proofread and edit their pieces of writing (Cambridge Assessment). Wu (2014) offers cognitive processing checklists to test takers to self-report on their reading. Checklists can also be used for providing immediate diagnostic feedback to testees (Green, 2013). In FL teaching a checklist is an instrument that helps FL practitioners evaluate language teaching materials. Quantitative and qualitative teacher-made software checklists are used to evaluate English language learning websites as additional tools (Fuentes & Risueno Martinez, 2018, p. 27), and checklists are used to evaluate strengths and weaknesses in an English language textbook (AbdelWahab, 2013). For assessing pupils’ reading and writing in a native language, Fiderer (1999) designed a series of checklists. Though an assessment tool, these checklists could be perfect guidelines for teaching purposes. In classroom instruction checklists are considered pedagogical tools with vast functionality: observation of students in the learning process, evaluation of instruction, an assessment tool, and memory aids/mnemonic prompts (Dudden Rowlands, 2007; Strickland & Strickland, 2000). Observation checklists are lists of things to look at when observing either individuals or a group of learners doing some activity at or after class (Strickland & Strickland, 2000). They are employed in a search for remedies for teaching some aspects or teaching specific groups of learners (Alberta Education, 2008; Dudden Rowlands, 2007). They help to quickly gather information about how well learners perform, about their strengths or weaknesses, and about their learning styles. Normally they are written in a yes/no format and contain specific criteria. They can include spaces below or in the far-right column for brief comments, which provide additional information not captured in checklists (Alberta Education, 2008). Teachers rely on evaluation checklists in collecting data about the efficacy of their teaching methods, accuracy, appropriacy, and completeness of tasks done by their students, assessing outcomes. They help teachers clarify what is indicative of a successful performance (Fuentes & Risueno Martinez, 2018, p. 27). Checklists meant as an achievement assessment tool should cover a specific area taught in accordance with the curriculum.


Some researchers argue that when learners participate in designing checklists to be further used by them, it can increase their understanding of new complex materials (Dudden Rowlands, 2007; Green, 2013). Dudden Rowlands (2007, pp. 62, 66) points out that, though used as additional tools, checklists intensify students’ metacognitive processes by suggesting sequenced operations to them and by providing metacognitive cues, which contribute to increasing their confidence and independence. Checklists make instruction interactive by making students active agents in their learning: they get to take the initiative in controlling the process of doing a task. Moreover, with such a tool they can evaluate their peers and thus better understand the requirements (Dudden Rowlands, 2007). To sum up, in language teaching, checklists can be used for carrying out assessments, developing students’ learning strategies, and evaluating the efficiency of the methods and techniques applied by the teacher. There is no unified classification of checklists. Various researchers have identified different numbers of types and have given them different names, though there is some overlap. We can explain this, on the one hand, by the vast area of their application and, on the other hand, by the very specific requirements of targeted contexts. We have already mentioned the evaluation and observation checklists used in language assessment and language teaching. Evaluation checklists are a tool used for the assessment of a product against some predefined criteria (AbdelWahab, 2013; Dudden Rowlands, 2007; Wilson, 2013). AbdelWahab (2013, p. 56) further subdivided them into qualitative checklists for profound studies, and more reliable and convenient quantitative ones. For more precise evaluation, Scriven (2000) offers criteria of merit checklists (comlists) with items of differing significance, which allows for scoring. Observation checklists or behaviour sampling checklists (Wilson, 2013) contain a list of behaviours/actions/activities that an investigator is to check off during observation at a specified interval. Scriven (2000) assigned the common name ‘diagnostic’ to evaluation and observation checklists in accordance with the kind of conclusion made on their basis – evaluative or descriptive. Checklists which are used only as reminders, and for which the order of actions does not matter much or at all, are called laundry lists (Scriven, 2000). Checklists which serve as reminders of the order and range of steps to take are called operational (Dudden Rowlands, 2007), procedure (Wilson, 2013), or sequential (Scriven, 2000). With these checklists it is the order of steps that matters. They can serve to remind students of necessary steps, and over time students get used to taking them, even with a different and more complex task (Dudden Rowlands, 2007). Iterative and one-shot checklists are variations of this type; they still prescribe some order of steps, but these steps might be taken multiple times until an item is completed, or referred to only once (Scriven, 2000; 2007). Checklists which aim to remind the user of the range of elements to include in a task/product are called requirement (Dudden Rowlands, 2007) or feature checklists (Wilson, 2013). They tend to have a more complex structure in order to include numerous requirements, which are to be reflected in the final product.


Entry/exit checklists (Wilson, 2013) relate more to the stage of their application; thus, they are used either at the beginning or at the end of the research or development process to evaluate a product’s degree of readiness for submission. Research checklists (Wilson, 2013) have a very specific area of application: they are meant for researchers to use for the evaluation and/or review of their own or others’ research. Gawande (2009) differentiated two types of checklists that depend on how experienced the respondent is: a) DO-CONFIRM checklists, which are relied on by experienced people after doing something to make sure all the necessary steps have been taken; and b) READ-DO checklists, which are used by the less experienced or inexperienced, who do what an item says before proceeding to the next item. All the types mentioned above are meant to be mnemonic devices (for example, Dudden Rowlands, 2007; Scriven, 2000; Wilson, 2013), and this is not their only common feature. We found these classifications rather confusing due to their ambivalent nature: in fact, all the procedures described above require both observation and evaluation based on some standards/criteria that are arranged in a specific order, though following that order is not crucial. As we have seen, checklists have a number of advantages, but they also have a number of limitations. Summing up the theoretical findings on checklists, and based on our own experience, we recognize the following advantages. Checklists:

• Are fairly easy to develop and use (also in AbdelWahab, 2013; Scriven, 2000).
• Can be detailed, though short and meaningful.
• Allow for exercising a high degree of control by the teacher.
• Help the teacher to evaluate not only a particular student, but also other students, as well as the results of students’ interaction.
• Can be aligned with particular tasks (also in Dudden Rowlands, 2007; Wilson, 2013).
• Allow students to self-monitor their own progress (also in Dudden Rowlands, 2007).

We regard the following as disadvantages of checklists:

• Designing checklists can be time-consuming if done for the first time and not based on a justified theoretical framework.
• Using printed versions of checklists requires extra material resources.
• There is a necessity for double-checking, as respondents can tick the boxes without reading the items (also in Scriven, 2000).


In any case, checklists should be used only as an additional tool for teaching and assessing, supplementing other time-proven and well-reputed tools. They need to be validated to be efficient (for example, AbdelWahab, 2013; Gawande, 2009; Scriven, 2000). Gawande (2009) insists on keeping them short in order to fit the limits of working memory, with simply worded items and without unfamiliar language. Judging by these general characteristics, checklists can undoubtedly contribute to achieving our overall goal of developing student teachers’ language assessment literacy. However, their efficient implementation in the language classroom required a more profound determination of the characteristics specific to the context of classroom learning. Meanwhile, this issue seems neglected by researchers, despite the evident value of the tool under consideration. There are very few well-grounded theoretical works, and limited practical evidence, dedicated to the use of checklists by university teachers and students. For that, we revisited studies from varied disciplines. On the basis of the generalized characteristics of checklists presented above, we compiled the ones specific to our teaching context (see Characteristics of checklists in Appendix A). These characteristics may serve as requirements for designing checklists in order to develop LAL pre-service.

Research problem

One of the objectives of the study was to determine the degree of importance of each aspect of classroom assessment in order to develop a checklist to cover it. Experimental teaching was used in the checklists’ development process, which tested the practical significance of each checklist in order to decide which points should be included in the final checklists and to validate them in further experiments. First, we are going to illustrate the preparatory steps that should be taken when adjusting to a particular local teaching context. In fact, to start with, we need to describe the context. As mentioned above, the objective of the course is to develop master’s degree students’ language assessment literacy (competency). We consider LAL to be part of their assessment literacy, an integrated competency which is the cumulative result of learning different disciplines. By LAL we mean master’s degree students’ ability to perform assessment and evaluation of bachelor’s degree students’ communicative competency, which includes: planning classroom assessments; preparing assessment materials (selecting or adapting tasks from respected sources meeting the standards, designing tasks or complexes of tasks); administering assessments; rating oral and written performance; marking and scoring papers; reporting the results to students and other stakeholders; and providing feedback on the obtained results. The course is not meant to instruct students in test design, calibrating items, validating tasks, or anything involving repetitive piloting or the mathematical statistics typical of large-scale, high-stakes tests. Moreover, students are supposed to learn about traditional and alternative methods of classroom assessment.


The course comprises interactive lectures, seminars, and individual assignments. At the beginning of the course we carried out a baseline assessment, the results of which helped us outline the inclusiveness of the content. As the students had studied the discipline of FL teaching at secondary schools, and had done pedagogical practice at schools in their 4th year, they were familiar with the basics of assessment; however, only one seminar had been allocated to this topic. That meant that the learners required more training in the preparation of assessment materials, and practice in carrying out assessments. Lectures and tutorials lacked practical guidance and appeared to be too abstract and theoretical for the students to digest at their still young age. Although the level of their mental processes is already high, and their perception, reasoning, and generalization skills are developed, they showed low motivation in acquiring the targeted competency as they had difficulty seeing its practical value; they mainly relied on Teacher’s books. The students had not experienced the need to develop assessments for any group of learners, and thus they missed out on the experience of the problem solving related to it. Consequently, we decided to help them with information processing while providing them with valuable practical experience. The main idea was to involve them in the close-to-real-life activities which a university teacher normally performs when carrying out assessments. To implement the concept, we had the students interact with the 1st–4th-year students during practical training at the universities (called ‘pedagogical practice’) and during their learning process, which required them to allocate time for interviewing younger students, piloting tasks and scales, and practicing marking, rating, and giving feedback. Since we had limited time and material resources, there were not many piloting opportunities. The students piloted their quizzes, tasks, and scales only once and with only three representatives, which had a negative impact on the quality of their final products but was undoubtedly better than nothing. It was necessary to have all the products evaluated at various stages of their development; thus the workload was immense, and it was necessary to optimize the learning process and make it cost-effective, but not at the expense of quality. It is common practice to build language teachers’ LAL with the application of brainstorming, discussions, tests (gap fill, matching, MCQ, True/False, ordering, table completion), quizzes, and creative tasks (for example, in Tsagari et al., 2018), but our experience proved that this is not enough to develop students’ LAL. To sum up, we were seeking a method that would help us to meet our goal and fit the context. Having limited time, scarce material resources, and a big workload, along with the programme requirements and the students’ limited teaching experience, we came up with the idea of the extensive use of checklists, due to the large scope of their advantages and the need to standardize the process of learning. However, ready-made checklists did not meet our needs, as they included items concerned with skills the students were not supposed to develop, and did not include items concerned with elements that were key for our students. In some cases, we failed to find any checklists at all, such as for text adaptation or the self-evaluation of a formal letter (see Self-evaluation: Formal letter checklist in Appendix B for the specifics of the concrete task).

Rationale

Working with ready-made checklists, we finally decided on assigning students to respond only to those items that applied to our specific situation, and to add issues that we considered important but that had not been addressed in existing checklists for item writers, test designers, interlocutors, raters, and markers. For example, the questions related to audio text recordings or the equipment used to play them were eliminated from the Listening task checklist, as the students were not engaged in audio and video recording production. They responded to items addressing the use of the internet, but not all of those related to equipment, as they did not have a choice of equipment apart from their mobile phones. The question ‘Is the language of the rubric/item/option accurate?’ needed specifying, as the students did not understand what they were expected to concentrate on, and just ticked the box when evaluating their own and the other students’ tasks. We found that the grammar mistakes the students most typically made were failing to make the subject and predicate agree, missing out the indefinite article when needed, and not sticking to the correct word order in questions. This is why we introduced three subcategories for this item. The language of the item is grammatically accurate when:

a) The verb is used in the third person singular (with -s) in the Present Simple.
b) There is an auxiliary verb after the question word.
c) There is a/an before singular countable nouns.
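For illustration only, subcategories like these could even be approximated with simple string heuristics when screening many student-written items at once. The following Python sketch — not part of the course materials — implements check (b) with a crude regular expression; it produces false negatives (e.g. ‘How adequate are the rubrics?’) and a real check would require proper parsing.

```python
import re

# A crude screen for subcategory (b): in a well-formed question, an auxiliary
# or modal verb should directly follow the question word. This regex is only
# a first-pass filter, not a grammar checker.
AUX = r"(is|are|was|were|do|does|did|can|could|will|would|has|have|should|may)"
PATTERN = re.compile(rf"^(what|how|why|where|when|who|which)\s+{AUX}\b", re.IGNORECASE)

def auxiliary_follows_question_word(item: str) -> bool:
    return bool(PATTERN.match(item.strip()))

print(auxiliary_follows_question_word("What is the purpose of the task?"))  # True
print(auxiliary_follows_question_word("What the purpose of the task?"))     # False
```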

Checklists should cover all the necessary steps that have to be carried out when selecting, adapting, and designing a task, but at the same time should remain practical and realistic for our students. Bearing this in mind, we thoroughly considered which items to delete in order to minimize the amount of reading by students (there was a risk they would overlook some points when bored, tired, or confused), and to maximize their success with using checklists. Another option was to break some longer checklists down into separate checklists. For instance, we did not offer the use of the Speaking production assessment checklist at the initial stage of learning; instead, students filled in the Speaking production assessment checklists (for grammar/range of vocabulary/coherence/cohesion/pronunciation/fluency) one at a time while studying the same oral performance sample. Our rationale was that it is still a challenge for students to assess all the necessary aspects of one oral performance simultaneously. Therefore, they were trained to assess one aspect at a time using a corresponding checklist. The list of all the checklists designed in line with the course is presented in Appendix C.


In order to eliminate any factors that might hinder the success of developing LAL in the classroom, we needed to identify possible pitfalls in advance and plan corresponding preventive measures. We piloted initial versions of the checklists, which helped us assess their possible impact on the teaching and learning process. This stage included having the students fill them in while designing needs analysis quizzes, tasks for assessing communicative competency, and rating scales, and while assessing samples of oral and written performances, doing peer- and self-evaluation, and evaluating their English or French teachers’ assessment practices. After that, we evaluated the quality of the students’ products and interviewed them about how helpful or misleading the checklists were. It was important to monitor the application of the checklists in order to check whether everything was being done according to plan. It turned out that, apart from the factors enumerated above, the students found some terminology confusing. Because of that, in making checklists we decided to use only those terms included in the Glossary of language assessment terms provided to the students at the beginning of the course, and included in exercises aimed at drilling the essential terminology. A teacher’s oral instructions do not standardize a procedure to a desirable extent, as the students may miss something important. In a situation such as this, the potential of checklists is vast. A teacher may ensure the quality of products not only by evaluating them themselves, but also by having them evaluated by students, as checklists can enumerate the necessary steps and facilitate the peer-evaluation of tasks. Checklists can be used not only by a teacher for evaluation but also by students as guidelines showing them what to focus on, thus developing their attention. Standardization is ensured by giving the same checklists to all students, which they fill in and exchange with three more groupmates, and finally submit to the teacher, accompanied by the corresponding product. Checklists are completed and submitted electronically, which makes the process cost-effective and quick, though not necessarily convenient to read while processing another text on the screen. We used Telegram, an instant messenger chosen by the students themselves.

Method

Quantitative and qualitative methods were used for collecting and analyzing the data related to the efficacy of using checklists in the learning process. This chapter discusses the results based on the application of the qualitative method. The applied research methods include the systematization of the fundamentals of checklist design and implementation; critical analysis of the checklists and of the products made by the students with their help; and simulation of the quiz/task/scale development, piloting, and application processes at the universities. The practical methods are qualitative analysis of the items in the checklists, namely their inclusiveness and clarity.


The data presented in this study were collected through direct observation and through qualitative analysis of the tasks produced by the students using the tailored checklists. The sample of participants included more than 150 students at three universities in Kharkiv, Ukraine. The participants were those students who performed the roles of item writers, raters, and markers, and those who contributed as assessees. The sample using the elaborated checklists consisted of, on average, 21-year-old students with practically no teaching experience.

Results and discussion

In order to design our own materials, we analyzed a number of ready-made checklists designed for observation and evaluation purposes (ALTE, 2001; Bachman & Palmer, 1996; Cambridge Assessment, etc.). The analysis revealed some typical features, which we further exploited in our checklists. Apart from the content, we experienced the need to clarify the structural requirements we should rely on. Thus, we found that items can be expressed in a number of ways:

• Sentence fragments – missing subjects with predicates expressed by verbs in either the Present Simple or the Past Simple: Describes classroom objects; retells a film; confused by words with multiple meaning.
• Modals: May display inconsistency in using …; may hesitate often when speaking; I can confirm that an item has one correct answer.
• Noun phrases: Frequent substitutions for words; weighting; number of tasks.
• Gerunds: Distinguishing fact from opinion; scoring; repeating words.
• Sentences: Every item is numbered, and every option is lettered; The item as a whole measures the objective.
• Questions: Yes/No questions (Have you developed the topic and provided details about all aspects of the task? Is the language of the text at an ambiguous level?) and ‘Wh-questions’ (What is the purpose of the task? How adequate are the rubrics?).

We also decided to add the option ‘Imperatives’ (Find and circle…; Check off the presence of the KEY; Ask a question to encourage the student to speak), as our checklists are meant to be guidelines and this format fits this context. In some checklists there is either a space or an extra column for comments, explanations, and/or references. For all task evaluation checklists to be filled in by the student’s groupmates, we provided a space for their comments in case they didn’t check something off or found it inappropriately done. More complex checklists are normally divided into sections to classify phenomena under analysis. We found this practice useful as it was a way to attract students’ attention to particular things and keep them concentrated on these subcategories (e.g., Item: Stem, options).


As we have discussed, there are several ways to make a checklist. We suggest that checklists for developing LAL require ticking, choosing Yes/No, or answering (e.g., for the Assessment by teacher observation checklist). Our decision was not to use Likert scales in checklists for the time being, as this might have reminded students of rating scales, customized versions of which were already in use by students. Checklists done by the 5th-year students were not anonymous, as the teacher knew the author of the pieces for the purposes of evaluation. Checklists offered by the students to younger learners were completed anonymously so as not to put undue stress on the respondents. The data collected were analyzed by students and were not subjected to evaluation by the teacher. AbdelWahab (2013, p. 59) claimed that many scholars offer checklists with generalizable criteria, but that to achieve better results checklists must be tailored to the needs and wants of students in a particular context. Therefore, in order to increase the efficiency of the checklist application, we personalized the checklists (i.e. tailored them to respond to particular tasks), which contributed to a better understanding of the processes by the students. There were checklists to review different task formats and to evaluate oral/written performance elicited by concrete tasks. Another challenge we faced was the classification of checklists. There was a great number of checklists, and it was critical to identify the purpose for which each of them should be used. Hence, we classified them according to their function, but we had to assign double names: operational guidelines checklists, reminder submission checklists, evaluation selection checklists (for evaluation of selected input), evaluation quality assurance checklists (for task evaluation by a student item-writer), evaluation review checklists (for task/rating scale reviewing), evaluation observation checklists (for rating oral and written performance), and evaluation error prevention checklists (for self-evaluation of written performance). It should be mentioned that the checklists we developed for students studying French differed from those for students studying English, as there were some test formats specifically used in French tests, including those where grammar and vocabulary were the focus in the rating scales. The students’ end-of-course reflections indicated increased autonomy in task selection, adaptation, and design, and in rating scale development and evaluation. They also indicated increased motivation, as students witnessed the practical application of the acquired knowledge and skills. Furthermore, we observed the positive effect of close-to-real-life professional collaboration on knowledge construction processes. The data collected during the experimental teaching testified that checklists can be effectively used for teaching purposes, and for developing LAL in particular, though the process of their design is fairly demanding and time-consuming.


Implications

Having experimented with checklists and obtained more than satisfactory results, we are now in a position to claim that this pedagogical tool is suitable for use in developing LAL pre-service, due to its qualities of being straightforward, gradable, and flexible. We have determined the practical value of checklists, though we have not yet explored their full potential. The general characteristics presented here, and the description of the structure, functions, and ways of using checklists, may serve as a foundation for further research in this area. Despite the limited scope of this study, our findings can provide broader implications for university teachers and teacher trainers. These individuals can further develop and experiment with checklists reflecting their particular instructional context and the intended outcomes. Checklists are a practical tool for teachers to track their students’ competency. As we see it, developing LAL with the help of a checklist is an integral part of the assessment literacy development of pre-service teachers. This is an opportunity for them to develop their professional competency, and to create partnerships with teachers and younger students. Moreover, students can learn to design checklists themselves and will thus acquire this tool as a learning strategy. The results specified in checklists will facilitate their reflection on learning. It is crucial for future teachers to get real, pre-service teaching practice while they still have an opportunity to make mistakes, to learn, and to get remedial help on the spot, which will prevent future problems.

Conclusion

The present chapter was dedicated to the study of the theoretical grounds and practical application of checklists for developing student teachers’ language assessment literacy. Theoretical findings and ready-made checklists were revisited with a view to adapting them to the targeted learning context and our specific goal. We outlined the general characteristics of checklists and fine-tuned them. This chapter reports the findings of the experimental teaching of students doing their master’s degrees. The results obtained make us believe that the use of standardized checklists helps make the teaching, learning, and evaluation process more efficient, accurate, and reliable. Checklists provide a practical guide for task and rating scale design, and they indicate what constitutes each process based on target learners’ characteristics. These standards can be further refined and applied to other particular tasks. The chapter presents a sample of customized checklists to enable university teachers and other experts to evaluate them. Future studies will focus on the reliability and validity of the elaborated checklists and will present statistical evidence of their efficiency.


Appendix A: Characteristics of checklists for developing student teachers’ language assessment literacy

Structure

Checklists have a minimum of two columns and a maximum of six columns. Some have spaces for comments. The number of columns depends on the number of people using the checklist. Thus, checklists with two columns are to be used by individual students only or by the teacher. Checklists with three columns are to be completed by the student and by the teacher evaluating their product or the appropriacy/accuracy of the observation, with the teacher’s comments below the table. Checklists with four columns are to be filled in by the student and three other groupmates evaluating the product of the student, with the student’s comments included below the table. Checklists with five columns are to be used by the student and three groupmates, and checklists with six columns are to be completed by the student, three groupmates, and the teacher. The patterns vary depending on the function of the checklist. If meant to guide, they are completed only by the student themselves. If used for evaluation and quality assurance, they are filled in by other people and are followed by their commentaries.

• Items come each one below another.
• Different markings for checking off can be used (ticks, pluses, minuses, points). An item can be marked if the element is present in the product or the necessary action is done, if the element described in the item is not present but should be added, or if the element is present but not appropriate and should be changed. In checklists filled in by other people, a checkbox can be marked only if the element is present and appropriate. Hence, the patterns are Present/Absent, Present and Appropriate/Present and Inappropriate, Complete/Incomplete.
• Checklists may comprise nouns, phrases, Yes/No questions, and statements.
• Enumerated items follow the order of steps to be taken by students and can be grouped in categories.
• Checklists should not be longer than two pages.

A sample layout of a six-column checklist (completed by the student, three groupmates, and the teacher):

Item | Student | Student 1 | Student 2 | Student 3 | Teacher
…    | +       |           | +         |           |
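To make the column logic above concrete, here is a minimal sketch of how such a checklist could be modelled in software. It is not from the chapter; all names are illustrative, and the rater patterns simply follow the description above.

```python
from dataclasses import dataclass, field

# Marking patterns described above: an element may be present, absent,
# or present but inappropriate and in need of change.
MARKS = ("+", "-", "?")

@dataclass
class ChecklistItem:
    text: str                                   # a noun, phrase, Yes/No question, or statement
    marks: dict = field(default_factory=dict)   # rater name -> one of MARKS

@dataclass
class Checklist:
    title: str
    raters: list                                # e.g. ["Student"] or up to five raters
    items: list = field(default_factory=list)
    comments: str = ""                          # optional space for comments below the table

    def columns(self) -> int:
        # One column for the item text plus one per rater,
        # giving the two-to-six-column layouts described above.
        return 1 + len(self.raters)

# A six-column checklist: the student, three groupmates, and the teacher.
cl = Checklist("Formal letter", ["Student", "Student 1", "Student 2", "Student 3", "Teacher"])
cl.items.append(ChecklistItem("Appropriate salutation (Dear Sir/Madam,)"))
cl.items[0].marks["Student"] = "+"
print(cl.columns())  # 6
```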

Content

The range and number of items should be well grounded. Items in checklists should be based on learning objectives and reflect the stages of task development. They should reflect standards specific to the context, the characteristics of the task and its constituents, and the characteristics of oral/written performance. Items should be short, clear, and unambiguous, grammatically accurate, and use a limited range of terms.

Functions

From a student's perspective:
To guide the processes of task or input selection, task or input adaptation, item writing, rating scale design, and the assessment of oral and written performance.
To remind the student of the range of elements to include and the order of steps to take to demonstrate the completeness of the task.
To evaluate the elaborated products (peer and self-evaluation).
To prevent errors in elaborated tasks and written performance, and to encourage reflection on the results (both successes and failures).

From a teacher's perspective:
To guide students through the selection, adaptation, and design processes, and to check compliance with those processes.
To control the accuracy and completeness of steps taken by students.
To standardize students' behaviour.
To remind students of the range and order of things to do.
To identify whether key steps have been taken.
To identify the presence or absence of conceptual skills.
To prevent errors.
To evaluate the results.
To know if students need assistance or further instruction.
To simplify complex tasks.

Types

Operational checklists are used to standardize assessment procedures. Observational checklists are used by students to assess oral and written performance. Evaluative checklists are used to evaluate elaborated products for quality assurance.

Application

Checklists are used after the lectures are given, when students have been familiarized with the terminology and the basics of assessment procedures. They are used throughout the course at the planning, development, and administration stages.

Students can complete a checklist either while designing a product or assessing performance (READ-DO), or after the task has been completed, to confirm the steps taken (DO-CONFIRM). Checklists are designed, filled in, and submitted electronically for the sake of practicality. Items are specified in line with particular groups of learners (i.e., years of study) and particular tasks. Thus, students are involved in their modification, though they are provided with the basic template.

Appendix B: Self-evaluation. Formal letter checklist

Task for the 5th-year students: You have received a call for applications to an interactive webinar on computer-assisted language learning. The number of participants is limited. Write a letter of application.

Parameters (tick if present)

Task achievement / Structure
Formal style:
– No contractions (doesn't, haven't, isn't, etc.)
– No direct questions
– Polite clichés (I would like to …, I would be grateful, I would appreciate, If you could, etc.)
Paragraphing (at least three paragraphs)
Appropriate salutation (Dear Sir/Madam,)
Introduction:
– Self-introduction
– Referencing
– Goal of writing
Main body:
– Stating that participation is needed
– Appropriate reasoning why participation is important
Closing lines:
– Thanking
– 'I look forward to your reply'
– Yours faithfully, / Faithfully Yours,
– Name, surname

Range of vocabulary
Adjectives: crucial / significant / essential / helpful / indispensable, valuable (experience), up-to-date, grateful, demanding, rewarding, ultimate (at least two from the list)
Verbs: contribute to, find smth. to be, to necessitate, to motivate, to involve into, to meet (the needs), to facilitate, to keep up with, to enquire / inquire, to respond, to acknowledge, to demonstrate (at least two from the list)
Nouns: course, advanced technologies, targeted learners, computer literacy, requirements, in regard to / in reference to, significance, provision, relevance, assistance, awareness (at least two from the list)
Adverbs: essentially, obviously, undeniably, undoubtedly

Range and accuracy of grammar
Range of structures: Present Simple, Present Perfect, Passive Voice, indirect questions, conditionals, modals, Participles
Subject-predicate agreement (underline all the subjects with a single line and all the predicates with a double line, make sure you have both of them, circle the third-person-singular subjects, and make sure their predicates end in -(e)s in the Present Simple)
Doubling (find verbs with the endings -ed and -ing, and make sure you doubled the final consonant where needed)
Indefinite article (find countable nouns in the singular, underline them with a dashed line, and make sure each has the indefinite article in front of it if there is no other determiner)

Coherence / Cohesion
Linking words: moreover, furthermore, in addition, eventually, on the contrary, meanwhile, similarly, likewise, therefore, consequently, the reason why, in particular, due to / owing to / on account of, provided (at least three from the list)
Logical development of ideas
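Several of these parameters lend themselves to mechanical checking. The sketch below is purely illustrative and is not part of the chapter's checklists; the letter fragment and the three checks are invented to show how a few formal-style parameters could be verified automatically.

```python
import re

# Matches common contractions such as doesn't, haven't, isn't, I'm, we're.
CONTRACTION = re.compile(r"\b\w+'(?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)

def check_formal_letter(letter: str) -> dict:
    """Check a few of the formal-letter parameters mechanically."""
    paragraphs = [p for p in letter.split("\n\n") if p.strip()]
    return {
        "no contractions": CONTRACTION.search(letter) is None,
        "at least three paragraphs": len(paragraphs) >= 3,
        "appropriate salutation": letter.lstrip().startswith("Dear Sir/Madam,"),
    }

# An invented letter fragment used only to exercise the checks.
sample_letter = (
    "Dear Sir/Madam,\n\n"
    "I am writing to apply for a place in the webinar. I would be grateful "
    "if you could confirm my participation.\n\n"
    "I look forward to your reply.\nYours faithfully,\nA. Student"
)
print(check_formal_letter(sample_letter))
# {'no contractions': True, 'at least three paragraphs': True, 'appropriate salutation': True}
```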

Appendix C: The list of checklists elaborated for developing student teachers' language assessment literacy

Planning assessments stage

Needs Analysis Quiz Checklist
(for students) to design a quiz for collecting data on the needs and interests of students of the 1st–4th years; (for the teacher) to evaluate the students' needs analysis checklists

Defining the Construct Checklists: for Assessing Reading / Listening / Speaking / Writing
to define the construct for assessing targeted skills in the stated context in line with the standards (CEFR, curriculum, syllabus, specific teaching practice) and the learners' needs and interests, and to reflect them in the content and test formats

Task Analysis Checklists: for Assessing Reading / Listening / Speaking / Writing
(for students) to guide the analysis of ready-made tasks taken from various textbooks, past papers of proficiency tests, and tests made by Ukrainian teachers and students, for: a) a better understanding of task structure; b) learning to evaluate tasks in order to detect drawbacks or understand whether they meet the assessment needs in the given context; (for the teacher) to evaluate the tasks designed by the students

Developing assessments stage

Task Selection Checklists: for Assessing Reading / Listening / Speaking / Writing
(for students) to guide the selection of ready-made tasks meant to be used to assess the level of achievements of the students of the 1st–4th years; (for the teacher) to evaluate the appropriacy of tasks selected by the students

Task Adaptation Checklists: for Assessing Reading / Listening / Speaking / Writing
(for students) to guide the analysis of ready-made tasks meant to be used to assess the level of achievements of the students of the 1st–4th years and to introduce changes to meet the standards and individual characteristics of the targeted group of learners; (for the teacher) to evaluate the appropriacy and accuracy of the students' task adaptation

Text Selection Checklist
(for students) to guide the selection of an authentic text to be further exploited for designing a task for testing reading skills in line with the standards (curriculum, syllabus, specific teaching practice) and the learners' needs and interests; (for the teacher) to evaluate the appropriacy of the text selected by the students

Input Selection (picture / photo, diagram) Checklist
(for students) to confirm that a photo / diagram can be exploited for designing a task for testing speaking or writing due to its content, clarity, and quality; (for the teacher) to evaluate the appropriacy of the input selected by the students

Audiotext Selection Checklist
(for students) to confirm that an audiotext is appropriate to be exploited for designing a task for testing listening skills in line with the standards (curriculum, syllabus, specific teaching practice) and the learners' needs and interests, due to its content and quality; (for the teacher) to evaluate the appropriacy of the audiotext selected by the students

Video Selection Checklist
(for students) to confirm that a video is appropriate to be exploited for designing a task for testing listening skills in line with the standards (curriculum, syllabus, specific teaching practice) and the learners' needs and interests, due to its content and quality; (for the teacher) to evaluate the appropriacy of the video selected by the students

Text Adaptation Checklist
(for students) to guide the adaptation of an authentic text so that it meets the standards and individual characteristics of the targeted group of learners and can be further exploited for designing a task for testing reading skills; (for the teacher) to evaluate the appropriacy and accuracy of the text adapted by the students

Task Type Selection Checklist
(for students) to guide the selection of an appropriate task type to assess the level of achievements of the students of the 1st–4th years in line with the standards and the characteristics of the pre-selected input; (for the teacher) to evaluate the appropriacy of the task type selected by the students

Task Submission Checklist
(for students) to make sure that all the necessary information concerning the task and its source is provided before exchanging it with three other students and then submitting it to the teacher; (for the teacher) to make sure that all the necessary information concerning the task and its source is provided

Speaking / Writing Rating Scale Development Checklists
(for students) to guide the development of rating scales for assessing particular speaking / writing tasks; (for the teacher) to evaluate the scales designed by the students

Speaking / Writing Rating Scales Development for Peer-Assessment Checklists
(for students) to guide the development of rating scales for assessing particular speaking / writing tasks done by a) students of the 5th year, to learn about the idea of peer assessment; b) students of the 1st–4th years; (for the teacher) to evaluate the scales designed by the students

Speaking / Writing Rating Scales Development for Self-Assessment Checklists
(for students) to guide the development of rating scales for assessing particular speaking / writing tasks done by a) students of the 5th year, to learn about the idea of self-assessment; b) students of the 1st–4th years; (for the teacher) to evaluate the scales designed by the students

Administering assessments stage

Spoken Production Assessment Checklists (separate checklists for each criterion: Task Achievement / Content; Range of Grammar; Range of Vocabulary; Coherence / Cohesion; Fluency; Pronunciation / Intonation; and one checklist with all the criteria included)
(for students) to help carry out assessment of the speaking of students of the 1st–4th years using their own tasks (scales are designed specifically for a particular task); (for the teacher) to evaluate the accuracy of the assessment done by the students

Spoken Interaction Assessment Checklists (separate checklists for each criterion: Turn-taking (Initiativeness / Responsiveness); Interactiveness; and one checklist with all the criteria included)
(for students) to help carry out assessment of the speaking of students of the 1st–4th years using their own tasks (scales are designed specifically for a particular task); (for the teacher) to evaluate the accuracy of the assessment done by the students

Written Production Assessment Checklists (separate checklists for each criterion: Task Achievement / Content; Range of Grammar; Range of Vocabulary / Spelling; Coherence / Cohesion; and one checklist with all the criteria included)
(for students) to help carry out assessment of the writing of students of the 1st–4th years using their own tasks (scales are designed specifically for a particular task); (for the teacher) to evaluate the accuracy of the assessment done by the students

Written Interaction Assessment Checklists (separate checklists for each criterion, the same as for Production but with specified descriptors, and one checklist with all the criteria included)
(for students) to help carry out assessment of the writing of students of the 1st–4th years using their own tasks (scales are designed specifically for a particular task); (for the teacher) to evaluate the accuracy of the assessment done by the students

Spoken Production Self-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out self-assessment of their speaking in order to better understand the idea of self-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Spoken Interaction Self-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out self-assessment of their dialogue speaking in order to better understand the idea of self-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Written Production Self-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out self-assessment of their writing in order to better understand the idea of self-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Written Interaction Self-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out self-assessment of their interactive writing in order to better understand the idea of self-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Spoken Production Peer-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out peer-assessment of their groupmates' speaking in order to better understand the idea of peer-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Spoken Interaction Peer-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out peer-assessment of their groupmates' dialogue speaking in order to better understand the idea of peer-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Written Production Peer-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out peer-assessment of their groupmates' writing in order to better understand the idea of peer-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Written Interaction Peer-Assessment Checklists (separate checklists for each criterion and one checklist with all the criteria included)
(for students) to help carry out peer-assessment of their groupmates' interactive writing in order to better understand the idea of peer-assessment; (for the teacher) to evaluate the accuracy of the assessment done by the students

Miscellaneous

Giving Feedback Checklist
(for students) to guide students in giving feedback to learners on the results of speaking / writing tests; (for the teacher) to evaluate the accuracy of the feedback given by the students

Assessment by Teacher Observation Checklist
(for students) to guide students' observation of their English / French teacher's assessments; (for the teacher) to evaluate the accuracy of the observation done by the students

Materials for Submission Checklist
(for students) to make sure that all the materials are ready to be submitted to the teacher; (for the teacher) to make sure that all the materials are submitted by the students



Chapter 7

An investigation into the correlation between IELTS test preparation courses and writing scores: Students' reflective journals

Fatema Al Awadi

Introduction

The use of the English language in the Arab world and the United Arab Emirates (UAE) has grown. This growth has had an impact on academic achievement and studies. Hence, one of the requirements for enrolment at a university in the UAE is to have English language skills that meet specific standards. Students at all levels must develop their English as they progress through their school years until they reach the university level. Due to the rapid changes in education, testing language skills has become important in measuring students' language ability. Several tests have been used as tools for exiting high school and proceeding to higher education, or for entering a desired major at university. Research has been conducted on the effectiveness of these tests as tools of language measurement, not only within the UAE but also worldwide (Freimuth, 2014; Gitsaki et al., 2014; Raven, 2011). The International English Language Testing System (IELTS) test measures students' skills in reading, writing, listening, and speaking. This test may impact students' language usage and their performance of university or school tasks (Ata, 2015). The test has been used as an entrance tool for higher education institutions in the UAE, as well as for exiting foundation studies into specific majors of interest (Freimuth, 2014). Despite all the data on the efficiency of IELTS tests, this efficiency still differs from one culture to another and from one student to another, because various conditions affect the implementation of the test, the test takers, and the band classification result. The amount of test preparation by students is another factor that affects test results (Gitsaki et al., 2014). Students at a particular level or from similar cultural backgrounds might make common errors when taking the test, and these can be revealed either during preparation courses or by the band result (Ata, 2015; Hughes, 2003).

The IELTS test score has influenced the study plans of those joining colleges and universities within EFL communities. As research has stated, there is a correlation between test scores, students' coursework, and how students employ English in educational tasks (Gitsaki et al., 2014). In addition, IELTS test preparation courses help students develop writing skills as well as an understanding of different discourses. The students learn how to write coherent and cohesive texts, which enhances their IELTS writing and also their writing in other genres (Qin & Uccelli, 2016).

Review of the literature

The foundation of this study consists of three different theories that contribute to its theoretical scheme. The interrelation of constructivist theory, systemic-functional theory, and the theories of reflective practice and writing in education will be discussed in detail individually (see Figure 7.1). One reason for choosing these theories is that they are considered major components in this field of study, and they will support and explain the findings related to the correlation between IELTS scores, preparation courses, and the writing of reflections. Furthermore, EFL learners construct language through interaction, which gives social-constructivist theory greater importance than the other theories. Both systemic-functional theory and the theories of reflective practice in education are key components of reflective writing, of the teaching of written genres in preparation courses, and of IELTS testing, which are the focus of this research.

Figure 7.1 A chart showing the correlation between the theories in the theoretical framework: the social-constructivist theory of language, the theories of reflective practice in education, and the systemic-functional approach, all feeding into the investigation of the correlation between IELTS test preparation courses, IELTS writing scores, and students' reflective journals


Constructivist theory is based on work by Piaget and Vygotsky, and it posits that the 'disequilibration process' between previous knowledge and new concepts can lead to a cognitive change in language (Slavin, 2014). In this theory, learners are viewed as builders constructing their knowledge through interacting with the surrounding environment, people, and objects, resulting in the construction of meanings, understandings, and interpretations (Freeman & Freeman, 2001). According to researchers, Vygotsky's work in the 1930s revealed that learners can perform better with the help of a more knowledgeable person to reach the Zone of Proximal Development (ZPD), which pointed out the significance of social interaction in language learning (Berk, 2009; Brewster et al., 2002; Slavin, 2014). Constructivists take their main foundations from Vygotsky's theory, particularly from four key principles: the ZPD, social interaction, mediated learning, and cognitive practice (Slavin, 2014).

The systemic-functional (SFL) approach to genre writing was first developed by Halliday. It emphasizes the use of language within a genre to build meanings in humans' minds (McCarthy & Carter, 1994). This approach influenced the understanding of how genre is implemented in teaching language and text analysis (Bawarshi & Reiff, 2010). It reflects the link between the language and the social context of the genre, and it also focuses on how language is employed purposefully in chronological order (Hyland, 2007). Halliday's study presented a classification of seven functions used by learners and stressed the fact that learners use language purposefully within the surrounding environmental conditions. Later, students might employ that language within specific communicative contexts (Emmitt et al., 2003). Emmitt et al. (2003) point out that 'Halliday collapsed his original seven functions to three major ones that became the basis for functional systemic linguistics; ideational, for the communication of ideas; interpersonal, for the expression of feelings; and textual, for the relationships within a text' (p. 32). Although the systemic-functional approach focuses on language's textual use, it also emphasizes how meaning is communicated through the text. Similarly, it pays attention to how that meaning is constructed within a genre, with attention to how the learner is using the language (Derewianka, 2000).

Reflective journals in education

Reflections are often used by teachers to evaluate teaching abilities, strategies, and skills, and they enable educators to build on and critique their own abilities (Kyriacou, 2007). Reflective journaling is based on reasoning drawn from the teacher's own practice and covers not only teaching but also the learning process and the professional development of educators (Johnson, 1999). Reflective written journals offer a chance for teachers to identify solutions to recurring problems and to recognize incidents. Dewey and Schon (cited in Zeichner & Liston, 1996) were theorists who viewed and reframed reflective teaching and teachers. Dewey (cited in Zeichner & Liston, 1996) viewed the catalyst of reflection to be the encounter with a difficult event, following which teachers would provide an analysis of their experience by revisiting it. Research findings show that Schon divided reflective teaching into two different frames: 'reflection on action' and 'reflection in action' (Johnson, 1999; Zeichner & Liston, 1996). 'Reflection on action' can occur before an action or afterward, or it can happen unexpectedly, requiring the teacher to adapt the instructional method. Therefore, solutions can happen spontaneously, and the teacher does not have to plan in advance for them (Johnson, 1999). This dimension will be considered in the analysis of reflections and in identifying students' understanding of reflective writing.

Not only does reflective journal writing impact teachers' professional development, it also affects EFL teachers' language development. Being a native or non-native speaker requires reflection on language use, grammar, and functions throughout the lesson (Brewster et al., 2002). In fact, when teachers are not reflective, they struggle to think critically about their teaching, which might leave them unable to find alternatives (Kyriacou, 2007; Zeichner & Liston, 1996). Researchers have found that writing reflective journals can clarify the link between theory and practice (Kyriacou, 2007; Quirke & Zagallo, 2009; Spencer, 2009). In addition, reflective journal writing should include teachers' goals and their perspectives on their teaching skills and instruction, which promotes an in-depth analysis and reflects back teachers' essential focus (Kyriacou, 2007; Quirke & Zagallo, 2009). Similarly, reflective writers often strive for accuracy in narrating events, as they are given the freedom to determine which information to reflect on. Hence, teachers are provided with opportunities to learn and improve their experience along with their language (Burton, 2009). Thus, this research intends to use discourse analysis to look at the correlation between the writing of reflections and factors of language development in teachers.

Learners at higher education levels are required to take notes from lectures, write academic essays, and manage referencing skills in English, all of which leads to the concept of English for academic purposes (EAP) (Harmer, 2001). EAP focuses on cognitive skills and on developing the language concepts needed to succeed at university level. This can be achieved not only through learning essential skills such as reading, writing, speaking, and listening, but also by using critical thinking skills (Wilson, 2016). With English being the language of instruction for EFL students, it is meant to provide comprehensible input on the content being taught, learned, or practiced (Buri, 2012). This concept of comprehensible input, originally formulated by Krashen (cited in Buri, 2012; Lightbown & Spada, 2013), shapes students' understanding of the concepts acquired through the use of the second language. Thus, in EAP, writing and reading skills are learned within a context of interest and employed further in a context of social experience, rather than simply being learned as isolated skills. Wingate & Tribble (2012) assert that a learner will develop critical language awareness through practicing within a context related to their cultural background and learning experience.


One of the key elements in EAP learning is understanding 'how' to communicate ideas clearly to teachers and other students. As a result, students improve their reasoning and their ability to discuss events and experiences (Asoodar et al., 2016; Wilson, 2016). In addition, students in higher education are required to draw on resources, assumptions, and notes from the field of learning to develop their arguments when using English in academic contexts (Krekeler, 2013).

IELTS in EFL context

The International English Language Testing System (IELTS) has become a crucial topic of discussion among educators. Many researchers argue that there are differences between students' knowledge and their performance in the IELTS test (Ata, 2015; Panahi & Mohammaditabar, 2015). There are other factors besides knowledge and language performance which impact IELTS tests in EFL contexts, such as gender, age, language motivation, and interest (El Massah & Fadly, 2017). Within the UAE context, IELTS has become an important test for students joining higher education institutions, as it determines their level of study based on test performance (Freimuth, 2014). Due to changes in the economy, and with English being a central language of communication, teachers have to keep up and work on improving English language skills. This is the major focus of higher education institutions in the UAE in terms of equipping EFL Emirati students with the English skills to meet the standards of the workforce and community (Gitsaki et al., 2014; Raven, 2011). This focus determines how EAP is taught to college students and how they prepare for the IELTS test (Gitsaki et al., 2014). Thus, a great deal of emphasis is given to academic writing in IELTS preparation courses (Moore & Morton, 2005). In spite of the differences between preparation components and academic writing courses at college level, they all help to build EFL students' language skills (Gitsaki et al., 2014; Moore & Morton, 2005). Educators teaching test skills and writing must pay attention to the strategies that help students to write cohesive and coherent texts (Hughes, 2003). Many researchers believe that the more preparation is done, the more EFL learners acquire and master the different skills, and the better they will perform in the IELTS test (Moore & Morton, 2005; Panahi & Mohammaditabar, 2015). They also believe that test markers and the designed descriptors can affect the reliability of tests, something that preparation courses try to compensate for as much as possible (Gitsaki et al., 2014; Panahi & Mohammaditabar, 2015).

Problem and rationale of the study

Teaching and learning standards have been raised due to the rapid development of the UAE education system and the growth of English language study in different educational fields (Gitsaki et al., 2014). Therefore, students must be able to meet those standards to progress through their studies. English as a means of communication within higher education institutions has become more important; therefore, students must demonstrate and increase their abilities in English skills (Freimuth, 2014). Higher education institutions determine students' enrolment and placement using IELTS results. Recently, the relationship between students' performance in higher education and their IELTS band has been a topic of debate and concern among educators in the EFL context (Gitsaki et al., 2014; Panahi & Mohammaditabar, 2015). College language standards are designed to meet the requirements of the job market and to ensure graduating students are able to use English effectively (Moore & Morton, 2005). As part of students' progression, special programs are designed to improve students' skills in English, as well as to raise their bands in all IELTS skills. This helps students to exit programs with the highest qualifications (El Massah & Fadly, 2017). Research has considered the importance of IELTS preparation courses and the correlation between the amount of work done in advance and the IELTS band students reach (Johnston et al., 2014). In my experience in the field of education, there is a demand for IELTS preparation and high IELTS scores, as programs such as education require future English teachers to demonstrate language knowledge. Students in such programs must demonstrate the development of language skills through coursework and assignments (Gitsaki et al., 2014; Johnston et al., 2014). Teachers tend to use reflective journals as assigned tasks, and these are graded in terms of language ability. The relationship between IELTS preparation, scores, and the writing of reflections is a topic researchers are still debating.

This study attempts to investigate the correlation between the IELTS writing band and students' performance in reflections assigned at higher education institutions. It also investigates how preparation courses for Emirati students have affected the quality of EFL learners' written reflections. The study aims to identify the correlation between the IELTS band and the development of Emirati college students' reflections. It focuses on the analysis of students' journals reflecting on practicum experiences. The main purpose of this study is, through discourse analysis, to investigate the correlation between IELTS scores and students' production of reflective journals within the Education department of one higher education institution in the UAE. The study also examines the impact of IELTS course preparation on students' achievements and on their understanding of the reflective genre. It will consider other factors revealed in the course of the research, and will draw on previous studies of IELTS exams, as well as of teaching English skills to students at higher education levels. The study's hypothesis is that students with prior preparation for IELTS will perform better in writing reflective journals. A further hypothesis is that differences in students' IELTS bands will correlate with their level of achievement in writing reflections. The research questions help to clarify the possible correlation between IELTS writing bands, preparation courses, and students' reflections. The questions are as follows:


1 How can IELTS preparation courses impact students' writing of reflective journals?
2 What is the correlation between the IELTS writing band and students' writing of reflective journals?

Method

Participants and context

The research was conducted in the Education department of one of the UAE federal higher education institutions. The process of selecting participants for the study and identifying samples was undertaken in the Education department and was based on students' different levels. The researcher also ensured that participants had two IELTS scores: one from when they joined the program and one from when they were about to exit it. The researcher also notes that she had taught the participants. The participants were three female students who were majoring as Primary Generalists teaching English, science, and math. These students were selected from different program levels: student A has graduated, student B is in her graduation year, and student C is currently in the middle of her studies. They all answered the emailed interview questions in detail, and they agreed to share their reflective journals and IELTS scores. All samples were accessible to the researcher, who works as an Education faculty member at the institution. The teaching and learning experiences varied between the students according to their level and teaching practicum experiences. The demographic data show the relationship between study levels in the B.Ed. program, IELTS writing scores, and reflective writing (see Table 7.1). Eight samples were collected from each student, except for student C, from whom seven samples were collected. The number of participants' samples is possibly a limitation of this study.

Table 7.1 The participants' demographic data

Student | Major | Number of Reflection Samples | Graduation | IELTS Writing Score Before | IELTS Writing Score After | IELTS Overall Score Before | IELTS Overall Score After
A | B.Ed. Primary Generalist | 8 | Yes | 5.5 | 6.5 | 5.5 | 6.5
B | B.Ed. Primary Generalist | 8 | No (final level) | 5.0 | 5.5 | 5.5 | 6.0
C | B.Ed. Primary Generalist | 7 | No | 5.5 | 6.0 | 5.5 | 6.0

Research ethics required getting permission from the Dean of Academic Operations as well as from the research participants. This enabled the researcher to gain access to the resources needed for the study (Mills, 2014). The permission letter given to the Dean of Academic Operations, which sought approval and access to the needed data collection tools, explained the research purpose and the results expected from the study. This information was also communicated verbally to the Dean. Before the interviews with the participants were conducted, they were given a verbal explanation and a brief written explanation via email. Participants' responses to the questions in this email were considered to be their agreement to participate in the research. Similarly, participants' permission to use their journals, IELTS preparation course materials, and IELTS test scores was obtained verbally after explaining how the materials would be used. As requested by the participants, their identities were kept anonymous in interviews and throughout the study. The letters A, B, and C are used to refer to the participants: 'A' refers to the student who has already graduated, 'B' to the student who was at her last education level, and 'C' to the student who was at the middle level. The investigation ensured confidentiality in order to avoid any stress or embarrassment for participants as a result of their participation and the sharing of the study's findings (Creswell, 2012; Mills, 2014).

Findings

Genre and reflection topic vocabulary

The analysis of students' reflections revealed that they improved their understanding of the genre as they progressed through their study levels, which was evident in the 'topic vocabulary' used to indicate the context of the reflections and the narrative genre. Since all the reflections were about the same general topic, the practicum, it was noted that the three participants employed topic words to shape the register of the genre. For example, words such as 'school', 'students', 'classroom', 'lesson plan', 'teacher', and 'lesson' were frequently used in almost all reflections. These topic words indicated the context the reflections discussed, and it was noted that students A and C managed to use the topic words in a more linked structure. However, student B developed her use of topic vocabulary as she progressed through the study levels, coming to use linked, focused topic words related to the main idea of the context discussed. For example, in her level 7 reflection, which was about the verb 'to have' and a family tree lesson, student B mentioned terms such as 'lesson plan', 'planned', 'stage', and 'teaching', which all represented the topic vocabulary of the text and were linked together to narrate the concept. I assume that this development was related to the increase in language level and performance in writing.
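The frequency observations in this section rest on counting topic words across reflections. As a minimal illustration (not the author's actual procedure; the word list and the sample sentence are assumed for the example), such counts could be produced as follows:

```python
import re
from collections import Counter

# Single-word topic vocabulary drawn from the reflections discussed above.
TOPIC_WORDS = {"school", "students", "classroom", "lesson", "teacher", "worksheet"}

def topic_word_counts(text: str) -> Counter:
    """Count occurrences of topic vocabulary in one reflection."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t in TOPIC_WORDS)

# An invented sample sentence, used only to exercise the function.
sample = ("In the third week I taught the lesson to my students. "
          "The students enjoyed the worksheet and the teacher helped me.")
print(topic_word_counts(sample))
# Counter({'students': 2, 'lesson': 1, 'worksheet': 1, 'teacher': 1})
```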


As the students progressed through the levels of their education, it seems they further understood the narrative history genre, which was reflected in the increased number of topic words used to narrate events. The students used several words to narrate their experiences to the reader, to represent the sequence and time of events, and to reflect upon their own teaching performance. For instance, student A, in her level 1 reflection, used few topic words to indicate time, such as 'in the third week'. Meanwhile, she increased her use of sequential narrating words when writing her level 7 reflection, such as 'then', 'firstly', 'after twenty minutes', and 'in addition'. Students A, B, and C also demonstrated an understanding of the purpose of writing reflective journals, which included their opinions on their teaching, and which identified issues along with providing recommendations (see Table 7.2). I assume that the students developed an understanding of adding sequential topic terms to the narrative as they learned further aspects of the genre. In addition, in most of the reflections, students used an informal tone with some formality on occasion. They employed topic vocabulary and expressions that affected the tone of the text. Some of these informal expressions were 'I was delighted', 'I felt happy', 'playing hangman game', and 'relation with the student'. On the other hand, the students used words that increased the formality of the text, such as 'rule violations', 'instructor', 'my mentor', 'special need student', 'school facilities', and 'meetings'. There was also frequent use of first person pronouns, which raised the level of informality, as the writers were narrating the history of their practicum experiences.

Grammatical & lexical cohesion

Grammatical cohesion

Table 7.2 Topic vocabulary used by the participants in all reflections

Sequential topic vocabulary to narrate events: My last experience …; Firstly, …; Secondly, …; Thirdly, …; Every week …; At the beginning, …

Topic vocabulary to represent the context and purpose of reflections: Students; Lesson plan; Objectives; Classroom; Worksheet; Activity; Planning; In my opinion …; I think …; I felt …; It would be better if …

With reference to the literature about the importance of grammatical cohesion in reflections in particular, and in the narrative history genre in general, students A, B, and C developed the use of suitable tenses to enhance the cohesion of the text. They based the texts on the use of past tenses, either the past simple as a part of the narrative genre, or the past continuous to talk, for example, about routines that occurred in schools in the discussed situation. Despite the use of past tenses, students sometimes included present or future tenses to mention a desire or a future decision about an upcoming lesson or plan (see Table 7.3 and Figure 7.6).

Another key feature noticed in the reflective journals was the occurrence of conjunctive cohesion, which includes conjunctions and adverbials. Each student in the focus group used different conjunctions and developed a further understanding of their use as they went through the levels (see Appendices). Student B used 'moreover' in reflection 4 to add further information regarding an event that happened at the end of the practicum week (see Figure 7.2). Meanwhile, student C included several conjunctions in her first reflection for different purposes (see Figure 7.2). She used 'secondly' to add a further incident at school, which was followed by 'suddenly' to indicate a sudden change in the event; she also used 'after that' to follow the sequence of what she narrated. Similarly, she added a contrast to the event using 'however' to indicate that the event changed into its opposite. In contrast, student A tended to use coordinating linking words like 'and' and 'but' more frequently in her reflections, and she used other conjunctions less frequently than the other students. In spite of this, her reflection pieces still had a basic type of cohesion, since the conjunctions used were either subordinating or coordinating. This development in the use of conjunctions showed that the students gained more language input, which led to a further understanding of the use and functions of conjunctives within the genre.

Figure 7.2 Excerpts representing the use of conjunctions by students B and C

The existence of different types of references in the focus group's reflections showed the achievement of further cohesion. First, the analysis of the texts revealed that the students used personal pronouns to refer to other participants in the reflections and to avoid repetition. This was clear through the use of anaphoric referencing using the third person pronouns 'he', 'she', 'they', or 'it' (see Table 7.3). To elaborate, student A in her level 4 reflection used the third person pronoun 'she' to refer to her MST (Mentor School Teacher) in the previous sentence. Student C used the pronoun 'it' to refer to the activity the student wanted to choose. Similarly, student B used 'it' to refer to the noun 'language' in her level 6 reflection. Second, since students A, B, and C were narrating their own experience, they used the first person pronoun 'I' to refer to themselves as the main character in their own reflections (see Table 7.3). Third, the students used possessive pronouns for the purpose of possessive reference, such as the third person possessive 'their'. For example, student A used 'their' in reflection 7 to refer to the class that she would be teaching. Also, student B used 'their' to refer to the students mentioned in reflection 4, and it was used in student C's reflection 2 for the same purpose (see Table 7.3).

Table 7.3 Pronouns used in some reflections as anaphoric reference

Student A
Reflection 4 — Subject pronouns: I, She, They, It, You; Object pronouns: Her, Them; Possessives: My, Their; Reflexive pronouns: –
Reflection 6 — Subject pronouns: I, She, They, It; Object pronouns: Them; Possessives: Their, His, My; Reflexive pronouns: –
Reflection 7 — Subject pronouns: We, I, They, He, It; Object pronouns: Them; Possessives: Their, Its, My; Reflexive pronouns: Themselves, Myself

Student B
Reflection 4 — Subject pronouns: I, He, We, They, It; Object pronouns: Us, Me, Them; Possessives: His, My, Their; Reflexive pronouns: –
Reflection 5 — Subject pronouns: I, You, He, They, It; Object pronouns: Him, Me, Them; Possessives: My; Reflexive pronouns: –
Reflection 7 — Subject pronouns: I, They, It; Object pronouns: Them, Me; Possessives: Their, My; Reflexive pronouns: –

Student C
Reflection 2 — Subject pronouns: I, She, They, It; Object pronouns: Me; Possessives: Their, My; Reflexive pronouns: –
Reflection 3 — Subject pronouns: I, It, They, We, She; Object pronouns: Her, Me, Them; Possessives: My, Their; Reflexive pronouns: Themselves
Reflection 5 — Subject pronouns: I, She, They, It; Object pronouns: Her, Them; Possessives: Their, My; Reflexive pronouns: –
Reflection 6 — Subject pronouns: I; Object pronouns: Them, Me; Possessives: Their, My; Reflexive pronouns: –


Furthermore, the focus group used the object pronoun 'her' to refer to a singular character in the story, as in student C's reflection 3, where it was used as a reference to a girl in the class. The plural form 'them' was used more often to refer to characters in the narrative history genre, and thus referred to the 'students' in almost all the reflections. Additionally, student A used the reflexive pronoun 'themselves' to refer to the 'literacy class' in reflection 7, as well as using 'myself' to refer to herself in the same reflection. While student C used 'themselves' to refer to 'one by one student' and 'their friends', student B did not make use of reflexive pronouns in her texts. The same applies to the use of the possessive ('s, s') to relate things to nouns in the writing, which was mostly found in student A's reflections. In addition, the students mostly used the determiners 'this', 'that', 'those', and 'these' for referencing purposes, and these in most occasions preceded the noun (see Figure 7.3). These determiners were mostly used by all students to refer to nouns such as 'class', 'lesson', 'strategy', 'student', and 'teaching practice'. Only student C used 'those', when she quoted a sentence from a book where it was used to refer to 'teachers who did not have high quality relationships'. 'That' was also used in student C's reflection 1 to point to 'warning for the second time', by student B in reflection 5 to refer to the sentence 'eyes on me', and by student A to point to 'a brilliant lesson plan' and 'preparing a great lesson' in reflection 3. This shows that the students understood that the reflective genre is related to their personal experience in the practicum, and they employed the determiners to refer to the participants in the text.

Figure 7.3 Excerpts representing the use of determiners in reflections

Another feature of grammatical cohesion was the use of 'article reference'. The students used the definite article 'the' at several places in the text to refer back to something that had been introduced previously. Figure 7.4 shows that the students used 'the' mostly when talking about 'class', 'time', 'students', 'classroom', 'teacher', etc., which had all been introduced previously in their reflections and were mentioned with the definite article 'the' since the reader was familiar with the concept. This demonstrates the students' ability to relate ideas and shows links that developed as they progressed.

Figure 7.4 Excerpts showing the use of 'the' for referencing purposes

Lexical cohesion

Regarding the lexical cohesion aspects found in the reflections, different devices were noticed. The first key device was the use of 'word repetition' as a way of tracking the reader's attention along the reading, which was also mentioned as part of key topic vocabulary use. Repeating vocabulary as in Table 7.2 helped to tie together the ideas in the reflections, such as the repetition of phrases and vocabulary like 'students' learning' and 'teacher' (see Figure 7.5). Student A repeated 'students' learning', which was the focus of the teacher during that lesson. Similarly, student B, in her eighth reflection, discussed the importance of the 'questioning technique' in the classroom. Thus, she repeated words such as 'questions', 'questioning techniques', 'answers', and 'guessing' to keep the attention on the same topic being discussed. A similar example can be found in what student C discussed in her first reflection – that she would observe her Mentor Teacher and consider her tasks and responsibilities. Therefore, 'role and responsibilities' was repeated twice to emphasize the concept. Also, 'teachers' and 'students' were repeated frequently in the reflection to lead the narrative (see Figure 7.6). Student A repeated words such as 'misbehaviour', 'violently', 'lesson plan', and 'attention' as well, to emphasize the importance of a lesson plan in controlling behaviour. This shows that student A had a prior ability to tie ideas together and link points through 'word repetition', while the other two improved this as they progressed.

Figure 7.5 Excerpt from student A's reflection 6 showing the repetition of lexical terms

Figure 7.6 Excerpts representing word repetition used by the participants

Linked to this point, the students used another feature of lexical cohesion: the synonym. It is true that the amount of repetition exceeded the number of synonyms in the texts, but some synonyms still existed in some reflections of each student. Student C used 'students' rather than saying 'boys and girls' (see Table 7.6). She also used synonyms of 'teacher' in the same reflection, such as 'tutor' and 'instructor'. Another use of synonyms was in student A's reflection 1, when she wrote 'happy – delighted' in the same text; she also used synonyms in her level 3 reflection, such as 'misbehaviour problems – misbehaviour actions', 'confident – self-assured', and 'kids – students – children', which showed variation in using different terms with the same meaning (see Table 7.4). Student B used synonyms in reflection 5 such as 'involve – engage' and 'kids – students' (see Table 7.5). However, she made an error at the end of reflection 5 when using the synonyms 'in my opinion' and 'I think' as one whole phrase at the beginning of the sentence 'so in my opinion I think my …'. Considering these data and the IELTS preparation course materials, the use of synonyms shows the students' attempts to use a range of vocabulary, which could be due to increased language knowledge.

The students appeared to use antonyms to show contradictory ideas under similar topics (see Table 7.4, Table 7.5, and Table 7.6). For instance, student A, in reflection 3, mentioned the following antonyms: 'kids – adults' and 'negatively – effectively', to indicate classroom management strategies and their effectiveness. As shown in Table 7.4, student A used 'higher ability students – lower ability students' to show the differences between the levels, 'together – one student' to demonstrate the differences in interaction patterns in class, and 'damaged – fixed' and 'the circuit was broken – the circuit must be connected' to show how she explained and helped the students in the exploration to discover differences. There was more effective use of antonyms in student A's reflection 8, which added a different type of link between the ideas discussed when she taught a science lesson. Conversely, both students B and C used fewer antonyms than student A. Student B used antonyms in the first and last reflections, such as 'confident – low self-esteem', 'correct – wrong', and 'critical answer – guessing', and student C used three antonyms in reflection 5, the highest number among her reflections, but in the last two reflections she did not use any.

The analysis also shows the use of collocations as another device to increase text cohesion and make the written piece more predictable. All the students tended to use similar collocations related to the topic vocabulary, as demonstrated in Table 7.4, Table 7.5, and Table 7.6: 'language skills', 'teaching strategies', 'paying attention', 'class time', 'checking understanding', and 'teacher assistant'. It appeared that the use of collocations in the reflections increased depending on the topic discussed and as the students progressed. To elaborate, the increase occurred when the students reflected on their own teaching experiences rather than when they were observed for a task to be completed as part of the practicum requirement. For example, student A in reflections 4, 1, 7, and 8 used more collocations as she talked about her own experience of teaching and working in schools, while in reflections 2 and 3 she narrated observation experiences and was instructed to write about topics such as 'observing the school mentor' and 'the importance of

Table 7.4 Examples of synonyms, antonyms and collocations in student A’s reflections (focus group and reflections 1–8)

Synonyms: Student – kids; Delighted – happy; Planning – preparation; Managing – organizing; Kids – students – children; Misbehaviour problems – misbehaviour actions; Confident – self-assured; Language skills – reading, writing, and listening; Observe – see; Children – students; Care – raising; Encourage – reinforcement; Helps – supports; Literacy – read and write; Helping – scaffold; Helping – assisting; A gap in the wires – circuit was broken; Showed – presented; Simple – easy; Students – children; Experiencing – exploring

Antonyms: Starts – ends; Put – remove; Kids – adults; Control the classroom – good management skills; Effectively – negatively; Themselves – myself; Boys – girls; Higher ability – lower ability; Together – one student; Wasn’t damaged – was damaged; Circuit was broken – circuit must be connected; Damaged – normal one; Damaged – fixed

Collocations: Grab attention; At the beginning; Building knowledge; Faced problems; Discussed together; Language skills; Enough time; Calm down; Went back; It would be better if …; Classroom environment; Have fun; Teaching practice; Different things; Break time; Good idea; Paying attention; Teaching strategies; Interested in; Lesson plan; Strong personality; Eye contact; Lesson preparation; Built a good relationship; Played a game; Special need students; Inappropriate behaviour; Every week; In my opinion

Table 7.5 Examples of synonyms, antonyms, and collocations in student B’s reflections (focus group and reflections 1–8)

Synonyms: Practical – suitable; Presented – talked about; Students – learners; Scaffolding – support; Promote – encourage; Speak up – talk – share ideas; Moreover – in addition; Involve – engage; In control – manage; In my opinion – I think; Kids – students; I believe – in my opinion; I think – I believe; Control – manage; Motivated – active and attractive; Misbehave – managing the students; Engage – active and motivated; Each student – everyone

Antonyms: Confidence – low self-esteem; Guessing answers – thinking about it; Guessing – critical answers; Whole class – individual; Polite – don’t be angry; Correct – wrong

Collocations: Light energy; At the end of; In the next day; Family tree; Lesson plan; Feel worried; Teaching goal; Already know; Critical thinking; Background knowledge; Questioning technique; Lesson planning; Thinking about it; Answer the question; In the right way; Move around; Teaching practice; In my opinion; Eyes on me

Table 7.6 Examples of synonyms, antonyms and collocations in student C’s reflections (focus group and reflections 1–7)

Synonyms: Learner – students; To learn student – teaches; A good relation – high-quality relationship; Problems – rule violations; Class – lesson; Feedback – recommendations; Students – boys and girls; Appropriate – suitable; Went well – went good; Three levels – low, middle and high; Learners – students; Work more – work hard; Vocabulary – new words; Work effectively – wholly involved; Remember words – consolidate meanings of vocabulary words; Roles – responsibilities; Managing – control; Grade 12 art section – students; Grade 12 science section – students; Teacher – tutor – instructor; Job – work; Delighted – happy

Antonyms: Hard – easy; Past – present; Laugh – upset; Boys – girls; Challenging – appropriate; Low – middle – high levels; Confused – get the idea; Naughty students – respectful

Collocations: Classroom management; Morning assembly; First of all; Naughty students; Class time; Second time; Bad mood; Good mood; Teaching practice; Have fun; Working together; In my point of view; Overall aim; Classroom rules; Teacher assistant; Moving around; In the future; Lesson plan; Do my best; New words; Correct answers; To conclude

Student B also showed a similar pattern of using fewer collocations in reflections 1, 2, 3, and 4, as these were related to observation tasks and short lessons taught as a novice teacher. Conversely, in reflections 7 and 8 student B used collocations such as ‘lesson plan’, ‘teaching goal’, ‘family tree’, ‘questioning technique’ and ‘background knowledge’. However, student C was a different case, which could be related to her writing ability. Even though reflection 1 was written based on an observation task, she narrated the events more than the other students; she used ‘classroom management’, ‘morning assembly’, ‘bad mood’, ‘good mood’, and ‘class time’.

The use of word sets in the reflections was the most notable feature of lexical cohesion in the students’ texts. The students used whole–part word sets, in which a ‘whole’ term indicated the lesson or the practicum and other terms denoted its parts. Student C mentioned the ‘very hungry caterpillar lesson’ as the whole term, with the related part terms ‘aim’, ‘different activities’, ‘matching worksheet’ and ‘feedback’. Student A mentioned the ‘science lesson plan’ as the whole term, with the sub-terms ‘engagement stage’, ‘building knowledge stage’, ‘transition stage’, ‘worksheet’, ‘played a game’, and ‘exploring’. Similarly, in reflection 8, student B employed ‘questioning techniques’ as the whole term, while ‘guessing’, ‘critical answers’, ‘thinking’, ‘expect answers’, and ‘students’ background knowledge’ were the parts. This shows that the students developed an understanding of the reflective genre’s different purposes and of how lexical cohesion can support the reader’s understanding. That improvement can perhaps be related to the input gained in education classes or IELTS preparation courses.

Discussion

To answer the stated research questions, the qualitative data will now be discussed. For this purpose, students’ reflections from different education levels, IELTS reports, preparation course materials, and student interviews are examined in relation to the research questions and the hypothesis. The hypothesis, namely that students with prior preparation for the IELTS test would perform better in writing reflections and that differences in their IELTS bands would correlate with achievement in writing reflections, was considered when grading the students’ work.

The data revealed several results related to the correlation between IELTS bands and students’ writing of reflections. The first result is that the IELTS band cannot accurately reflect every student’s actual language ability (Ata, 2015; Freimuth, 2014). For example, some students were recognized to be at a higher level and already performed strongly in writing reflections, as in the case of student A. The reason is that student A developed an understanding of writing the genre of reflections, categorized as a narrative history genre (Derewianka, 2000; McGuire et al., 2009). In addition, the students developed a sense of writing the narrative history genre, which may be due to the development of writing skills and the increase in language level as they progressed (Harmer, 2001). The reason for this is the efficiency in using the topic vocabulary and genre register, which added clarity to the content being discussed (Quirke et al., 2009). This cannot guarantee the students’ successful understanding of the genre, but rather it can represent an understanding of writing reflections due to course requirements, or being trained to write such texts (Lightbown & Spada, 2013; Wingate & Tribble, 2012). Therefore, students with high language ability can be expected to have a wide vocabulary, which can be employed to support the text register and enhance the clarity of the genre (Bawarshi & Reiff, 2010). This also leads to representing the social context and the purpose of writing reflective journals through the presentation of terms that identify the narration, as well as providing critical opinions to mirror the teaching experiences and recommend further improvements in learning and teaching (Bawarshi & Reiff, 2010; Lukin et al., 2011).

Moreover, based on the interview answers and the previous analysis, students A and B already had very good language skills, which did not relate to the IELTS test band. This was related to having strong foundational knowledge of English from instruction in schools, or when entering university. Some students with good language skills considered the requirement to take the IELTS test to be part of completing their studies (El Massah & Fadly, 2017; Freimuth, 2014) rather than a way to improve their English, which shows a weakness of the IELTS test as a testing tool. In contrast, student B’s language ability was lower than her received IELTS band when she joined the education program. Hence, that lower language ability showed in her writing of reflective journals through less coherence and less grammatical and lexical cohesion. Perhaps the reason for having an IELTS band higher than the student’s actual language skills is that the student was trained to answer the questions and practiced in how to deal with IELTS test types (Hughes, 2003; Mahlberg, 2006; Panahi & Mohammaditabar, 2015). However, writing reflective journals showed that free writing strategies can reveal students’ actual abilities in understanding and presenting aspects of writing. Writing reflective journals can also affect written genre comprehension and the ability to present the purpose of the text; these abilities can also be the result of having a wide range of grammatical resources and vocabulary (Cameron, 2001; Leech et al., 2001).

The second result identified is that IELTS bands can still indicate a change in some students’ language performance, even if that change is a slight departure from actual English language ability (Gitsaki et al., 2014; Raven, 2011). The IELTS band can represent either the student’s recent performance or a development in English language skills. The findings showed that the IELTS band matched student C’s writing performance (band 5.5) and student B’s language development; however, it did not match student A’s performance (Brown, 2007; Spencer, 2009). Perhaps this is because when student C first joined the university’s education programme, her language level fell within the range of an IELTS band, which was indicated by her understanding not only of the IELTS test’s writing genres, but also of reflections (Asoodar et al., 2016; Wilson, 2016). In student B’s case, due to the development of language that occurred at the end of her education years, the IELTS band described her language learning level, which matches the analysis of the reflections in the previous section. Therefore, a further result can be drawn: the change in student B’s level was due to taking preparation courses to improve her language, as she stated in the interviews (Hughes, 2003; Panahi & Mohammaditabar, 2015). This led the student to raise her understanding of writing reflections, learning how to produce a cohesive, coherent text through using a text register, developing her use of punctuation, and providing a line of connected thoughts using grammatical cohesion (Bawarshi & Reiff, 2010; Harmer, 2004; Lukin et al., 2011). In reflective writing, the student was able to present opinions on her own teaching, think critically about issues and identify solutions, as well as judge the classroom experience (McGuire et al., 2009). Student C expressed a similar opinion about taking the IELTS test and preparation courses, and the evaluation of her reflections similarly matched her second score.

Limitations

The study has limitations due to the size of the focus group, as only three students were selected for the research. Therefore, the results cannot represent all English language learners taking the IELTS test, nor all cases of students writing reflections in higher education. Also, the research involved only female students, and therefore cannot be generalized to male learners. Another limitation is that, owing to the time at which the research was conducted, it was difficult to collect the documents from the students, as some had already graduated and others had lost some of them. For example, student A did not have her first IELTS band, and student C had not yet reached level eight and therefore had not completed the eighth reflection. Finally, the number of students taking the preparation courses was low, which is not enough to measure the courses’ effectiveness or to take account of other students’ attitudes toward them.

Recommendations

Two main types of recommendations arise from this study: one relates to the improvement of writing reflective journals, and the other to further research connected to this study. Based on the findings, it is recommended that language learning not be separated from college assignments, especially when an assignment requires the use of language to write a report, reflection, or essay. Another aspect is that EFL learners require further consolidation of language even at college level, and more emphasis on language assessment needs to be provided through assessments other than the IELTS test. In addition, providing additional sessions to improve language skills is highly recommended to further strengthen students’ performance.

Regarding further research in a similar field, the study can be extended to compare IELTS performance with the new testing system implemented in UAE higher education, the EmSAT (Emirates Standardized Test). Another extension of the study could consider the correlation between the EmSAT test and English as a foreign language learners’ performance. The use of discourse analysis is recommended for this research, along with a wider range of participants. This would help to identify improvement in the performance of writing reflections, and to consider the impact of college courses on students’ writing development. The study can also be extended to measure the development of language skills other than writing for novice teachers until their graduation.

Conclusion

This study explored the correlation of IELTS bands and preparation courses with students’ writing of reflective journals in one UAE higher education institution. The research found that the IELTS band cannot be treated as a final judgment, as different factors affect it. Students’ level in English, previous language knowledge, study input, and preparation courses can all affect the IELTS test as a tool for language testing (Ata, 2015; Quirke & Zagallo, 2009). Yet the study revealed that preparation courses had an impact on reflective writing, as they added further input and consolidation of the language for two participants. The students’ language and writing improved as they progressed through the levels. They built further confidence in writing reflective journals, which was observed through their reflection samples (Bamberg, 2011; Hughes, 2003). Their abilities to critically identify issues, anticipate solutions, critically judge the experience, and provide opinions and recommendations for further enhancement were effectively constructed (Hyland, 2007). That change was notable at the end of the study levels and after the second IELTS test, and could be due to their education studies or preparation courses.

The students’ development in English writing and other skills was identified through their responses to the interview questions. The IELTS reports and materials from the preparation courses provided a clearer picture of the amount of change in their language. The students who took the preparation courses were given opportunities to practice and improve their writing and language skills while preparing for the IELTS test. Lessons in the preparation courses were designed to cover broader skills of writing and text organization than those required for the IELTS writing tests alone (Panahi & Mohammaditabar, 2015; Slavin, 2014; Wilson, 2016). However, the student who did not take the course also developed her language abilities, which can be related to the input provided by her college study. In spite of this, the IELTS score was considered highly important simply because it is a requirement; as the students stated in the interviews, there were other factors leading to the change in their language and in their writing of reflections. The amount of knowledge gained through education courses and the assessment of reflections also helped to shape the students’ understanding of reflective journals. The students learned to narrow their focus to narrating the practicum experience, and they identified strengths and weaknesses (Spencer, 2009). The findings also show that the course criteria for language assessment may not match those for IELTS, which further supports the view that IELTS requirements may differ from the requirements of academic writing at university. To conclude, students’ second language learning performance and ability cannot be judged by a language test such as the IELTS alone. Students face challenges in learning language and understanding different genres, which they can overcome through additional practice and input from instructors.

References

Asoodar, M., Marandi, S., Vaezi, S., & Desmet, P. (2016). Podcasting in a virtual English for academic purposes course: Learner motivation. Interactive Learning Environments, 24(4), 875–896.
Ata, A. W. (2015). Knowledge, education, and attitudes of international students to IELTS: A case of Australia. Journal of International Students, 5(4), 488–500.
Bamberg, M. (2011). Narrative discourse. In J. C. Meister, T. Kindt, & W. Schernus (Eds.), Narratology beyond literary criticism: Mediality, disciplinarity (pp. 213–237). Walter de Gruyter.
Bawarshi, A., & Reiff, M. (2010). Genre: An introduction to history, theory, research and pedagogy. Parlor Press.
Berk, L. (2009). Child development. Pearson.
Brewster, J., Ellis, G., & Girard, D. (2002). The primary English teacher’s guide. Penguin English.
Brown, P. (2007). Reflective teaching, reflective learning. MA thesis. University of Birmingham.
Buri, C. (2012). Determinants in the choice of comprehensible input in science classes. Journal of International Education Research, 8(1), 1–18.
Burton, J. (2009). Reflective writing: Getting to the heart of teaching and learning. In J. Burton, P. Quirke, C. Reichmann, & J. Peyton (Eds.), Reflective writing: A way to lifelong teacher learning (pp. 1–11). TESL-EJ Publications.
Cameron, L. (2001). Teaching languages to young learners. Cambridge University Press.
Creswell, J. (2012). Research design: Qualitative, quantitative, and mixed methods approaches. Sage Publications.
Derewianka, B. (2000). Exploring how texts work. Primary English Teaching Association.
El Massah, S., & Fadly, D. (2017). Predictors of academic performance for finance students: Women at higher education in the UAE. The International Journal of Educational Management, 31(7), 854–864.
Emmitt, M., Pollock, J., & Komesaroff, R. (2003). Language and learning: An introduction for teaching. Oxford University Press.
Freeman, D., & Freeman, Y. (2001). Between worlds: Access to second language acquisition. Heinemann.
Freimuth, H. (2014). Cultural bias in university entrance examinations in the UAE. The Emirates Occasional Papers, 85, 1–81.
Gitsaki, C., Robby, M. A., & Bourini, A. (2014). Preparing Emirati students to meet the English language requirements for higher education: A pilot study. Education, Business and Society: Contemporary Middle Eastern Issues, 7(3), 167–184.
Harmer, J. (2001). The practice of English language teaching. Longman.
Harmer, J. (2004). How to teach writing. Pearson.
Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction. Journal of Second Language Writing, 16, 147–164.
Johnson, K. (1999). Understanding language teaching: Reasoning in action. Heinle ELT.
Johnston, N., Partridge, H., & Hughes, H. (2014). Understanding the information literacy experiences of EFL (English as a foreign language) students. Reference Services Review, 43(4), 552–568.
Krekeler, C. (2013). Languages for specific academic purposes or languages for general academic purposes? A critical reappraisal of a key issue for language provision in higher education. Language Learning in Higher Education, 3(1), 43–60.
Kyriacou, C. (2007). Essential teaching skills. Nelson Thornes.
Leech, G., Cruickshank, B., & Ivanic, R. (2001). An A–Z of English grammar & usage. Pearson.
Lightbown, P., & Spada, N. (2013). How languages are learned. Oxford University Press.
Lukin, A., Moore, A., Herke, M., Wegener, R., & Wu, C. (2011). Halliday’s model of register revisited and explored. Linguistics and the Human Sciences, 4(2), 187–213.
Mahlberg, M. (2006). Lexical cohesion: Corpus linguistic theory and its application in English language teaching. International Journal of Corpus Linguistics, 11(3), 363–383.
McCarthy, M., & Carter, R. (1994). Language as discourse: Perspectives for language teaching. Longman Group.
McGuire, L., Lay, K., & Peters, J. (2009). Pedagogy of reflective writing in professional education. Journal of the Scholarship of Teaching and Learning, 9(1), 93–107.
Mills, J. (2014). Action research: A guide for the teacher researcher. Pearson.
Moore, T., & Morton, J. (2005). Dimensions of difference: A comparison of university writing and IELTS writing. Journal of English for Academic Purposes, 4(1), 43–66.
Panahi, R., & Mohammaditabar, M. (2015). The strengths and weaknesses of Iranian IELTS candidates in academic writing task 2. Theory and Practice in Language Studies, 5(5), 957–967.
Qin, W., & Uccelli, P. (2016). Same language, different functions: A cross-genre analysis of Chinese EFL learners’ writing performance. Journal of Second Language Writing, 33, 2–17.
Quirke, P., & Zagallo, E. (2009). Moving towards truly reflective writing. In J. Burton, P. Quirke, C. Reichmann, & J. Peyton (Eds.), Reflective writing: A way to lifelong teacher learning (pp. 12–30). TESL-EJ Publications.
Raven, J. (2011). Emiratizing the education sector in the UAE: Contextualization and challenges. Education, Business and Society: Contemporary Middle Eastern Issues, 4(2), 134–141.
Slavin, R. (2014). Educational psychology: Theory and practice. Pearson.
Spencer, S. (2009). The language teacher as language learner. In J. Burton, P. Quirke, C. Reichmann, & J. Peyton (Eds.), Reflective writing: A way to lifelong teacher learning (pp. 31–48). TESL-EJ Publications.
Wilson, K. (2016). Critical reading, critical thinking: Delicate scaffolding in English for academic purposes. Thinking Skills and Creativity, 22, 256–265.
Wingate, U., & Tribble, C. (2012). The best of both worlds? Towards an English for academic purposes/academic literacies writing pedagogy. Studies in Higher Education, 37(4), 481–495.
Zeichner, K., & Liston, D. (1996). Reflective teaching: An introduction. Lawrence Erlbaum Associates.

Part 3

Teachers’ language assessment literacy

Chapter 8

Language assessment literacy of novice EFL teachers: Perceptions, experiences, and training

Aylin Sevimel-Sahin

Introduction

Language assessment is an important element of language education because it feeds language teaching and can be used for different purposes. On the one hand, assessment gives feedback about the effectiveness of instruction, the progress of language learners in the target language, whether the expected learning outcomes are achieved, what students do or do not learn, and whether the syllabus, the teaching method, and the materials are useful for ongoing language learning (Bachmann, 2005; Brennan, 2015; Hidri, 2018). On the other hand, assessment helps teachers to make results-dependent decisions in order to improve their instruction, to foster a better language learning context, and to evaluate themselves with respect to their teaching (Rea-Dickins, 2004). Moreover, language assessment can motivate both teachers and students: teachers can find out how effective their teaching is, and students can detect their strengths and weaknesses regarding their language development (Heaton, 2011). As a consequence, both can be motivated to enhance their teaching or learning further.

Language teachers need to be competent in the target language they teach, to know what and how to teach, and to know how to assess. Among these three qualifications, language assessment can be considered an essential part of professional competence because it guides teachers in planning their teaching to support learning and in regulating their pedagogical decisions. Accordingly, it can be deduced that language teachers have two roles: teacher and assessor (Scarino, 2013; Wach, 2012). In order to carry out efficient and suitable assessment practices, language teachers need to be assessment literate. Language assessment literacy (LAL) means possessing the required knowledge of assessment and the ability to perform assessment practices. If teachers are assessment literate, they can respond to the needs of their educational context more effectively. Therefore, LAL is a fundamental aspect of their teaching competence. However, assessing effectively and taking advantage of assessment results appropriately are not as easy as they seem. Since language teachers are not born with the required competence related to assessment (Jin, 2010), they need to be trained and equipped with the necessary knowledge and skills in this respect to reinforce language learning and teaching. Then they become able to conduct effective and proper language assessment procedures, which indicates they are assessment literate.

Review of literature

Recently, there has been much more emphasis on language assessment due to changing notions and approaches in language education. Earlier, the concepts of teaching and assessment were considered separate, and assessment was performed independently of teaching, especially in the form of testing (Viengsang, 2016). But as a result of sociocultural theories of learning, which have also influenced the domain of language assessment, the notion of testing has been transformed into the concept of assessment (Berger, 2012; Hidri, 2016; Inbar-Lourie, 2017; O’Loughlin, 2006), which incorporates not only measuring the language proficiency of learners, as in testing, but also monitoring and improving their progress in the target language (Csépes, 2014). Hence, the importance of language assessment in motivating and reinforcing language learning by tracking the development of learners has been recognized. That is, tests and assessment tools can be used to improve learning, not just to measure or test language knowledge (Heaton, 2011). So assessment for learning has become more important than assessment of learning.

In line with these considerations, researchers have focused more on how teachers can be effective assessors in their teaching contexts in order to support language learning, what kind of characteristics they need for this, and how they can better administer assessment. For all these issues, researchers have recently explored what it means to be language assessment literate, and whether teachers have acquired the necessities of LAL and have been able to practice them in their classes successfully.

Conceptual framework

The concept of LAL originated from the term assessment literacy (AL), introduced by Stiggins (1995) in general education. Yet the concept of AL remained rather general. Thus, researchers became concerned with the specific competencies of teaching subject areas that may require different kinds of assessment. For foreign language teaching and learning environments, they tried to describe what LAL refers to and what it constitutes. Also, since teachers are the main stakeholders in foreign language education (Giraldo, 2018), definitions of LAL are mostly based on the assessment literacy of classroom teachers, which is also called ‘language teacher assessment literacy’. LAL refers to the competence of knowing what, when, and how to use assessment to gather information about language learners’ development, as well as using the results to enhance the quality of instruction (Jeong, 2013). Fulcher (2012) presented a more comprehensive definition of LAL:

The knowledge, skills and abilities required to design, develop, maintain or evaluate, large-scale standardized and/or classroom-based tests, familiarity with test processes, and awareness of principles and concepts that guide and underpin practice, including ethics and codes of practice. The ability to place knowledge, skills, processes, principles, and concepts within wider historical, social, political, and philosophical frameworks in order to understand why practices have arisen as they have, and to evaluate the role and impact of testing on society, institutions, and individuals. (p. 125)

Considering this definition, LAL not only means the knowledge and skills of assessment but also deals with reasoning, impact, frameworks, and contextual and ethical issues related to language assessment. Similarly, Giraldo Aristizabal (2018) indicated language teachers need to be competent in both classroom assessment and large-scale testing to help their students develop in the target language. Hence, Fulcher’s definition of LAL can be considered quite extensive in its meaning. It should be noted, however, that the term foreign language assessment literacy (FLAL) is preferred in the current study in order to emphasize the foreign language learning context; that is, the English as a foreign language (EFL) context.

As for the constituents of LAL, researchers have reported their own frameworks corresponding to their own LAL perspectives. For instance, Davies (2008) asserted LAL is made up of ‘knowledge’ (defining language measurement and framework), ‘skills’ (designing, administering, and analyzing) and ‘principles’ (using properly and ethically). In the same vein, Inbar-Lourie (2008) described the components of LAL as ‘what’ is to be measured as the language construct, ‘how’ assessment is carried out, and ‘why’ a certain assessment practice is conducted. Fulcher (2012) also suggested a three-layered model of LAL consisting of ‘practices’ (constructing, applying, evaluating), ‘principles’ (fundamentals, ethics, concerns) and ‘contexts’ (origins, frameworks, impact). In addition, Hill (2017) focused on the elements of classroom teachers’ language assessment literacy and argued that LAL consists of ‘practice’ (knowledge and skills), ‘concepts’ (own understanding and beliefs), and ‘context’ (impact of the teaching environment). Likewise, Inbar-Lourie (2013) and Stabler-Havener (2018) emphasized the importance of teaching contexts with respect to LAL components and pointed out that knowledge and skills of assessment should be employed in ways suited to a teacher’s own context. Furthermore, Giraldo Aristizabal (2018) and Hidri (2016) highlighted that beliefs, conceptions, and previous experiences of language assessment affect the way LAL is constructed. In conclusion, the constituents of LAL proposed in the literature have much in common, such as knowledge, skills, practices, concepts, contexts, and principles.

When it comes to the characteristics of language assessment literate teachers, several researchers found similar dispositions (Gotch & French, 2014; Huang & He, 2016; Khajideh & Amir, 2015; Rogier, 2014; Ukrayinska, 2018). For instance, language assessment literate teachers are able to distinguish between sound and unsound assessment. They are also able to determine the aims of assessment and use assessment ethically. Besides, they know how to select, design, administer, and analyze assessment, and consequently how to interpret and integrate the results of assessment for student learning. Moreover, they are able to relate assessment to their instruction, approaches, and techniques. After all, language teachers need such qualifications to be assessment literate because LAL is useful for planning, regulating, and reinforcing teaching, and for supporting and motivating student learning (Howerton, 2016; Newfields, 2006; White, 2009). In conclusion, the concept of LAL is essential for every language teacher to be beneficial to language learning, and the knowledge, skills, and characteristics of LAL are mostly achieved with the help of teacher education programs (DeLuca & Klinger, 2010; Lam, 2014). So training programs are expected to educate their teacher candidates to be assessment literate for their future careers.

Relevant research studies

Over the last decade, there have been significant attempts to describe and characterize the concept of LAL due to the greater emphasis on the role of teachers as assessors (Csépes, 2014). However, research into this concept is acknowledged to be new in foreign language education settings (Fulcher, 2012). The research studies have mostly focused on the effects of demographic variables, beliefs, and practices of EFL teachers on their language assessment literacy, as well as the impact of workshops designed as in-service training for LAL.

To start with, the studies on in-service EFL teachers’ perceptions of language assessment revealed that most teachers tended to consider assessment or evaluation equivalent to testing or measurement (Berry et al., 2017; Duboc, 2009). That is, in-service EFL teachers thought that assessment is only used for the purpose of measuring the language knowledge of learners, and thus they did not know how to take advantage of testing for other purposes, such as improving their teaching and learners’ language development (Klinger, 2016; Tsagari, 2013). Some studies have also shown most EFL teachers were not familiar with the concept of being language assessment literate (Berry et al., 2017; Semiz & Odabas, 2016). In addition, research into the extent to which in-service EFL teachers feel ready to assess found that most teachers did not feel ready to undertake effective assessment procedures, indicating lower levels of teacher LAL (Buyukkarci, 2016; Fard & Tabatabei, 2018; Tsagari & Vogt, 2017; Vogt & Tsagari, 2014). Similarly, some EFL teachers were found to be familiar with basic assessment knowledge but nothing more (Semiz & Odabas, 2016; Turk, 2018). Besides, Hudaya (2017) reported that most of the participating EFL teachers felt prepared to assess but noted their difficulty in giving feedback. Likewise, Ukrayinska (2018) demonstrated that most of the EFL teachers faced some assessment challenges in their classrooms even after their pre-service training for LAL.

With respect to whether teaching experience has an effect on LAL, some studies reached the conclusion that more experienced teachers had less difficulty with their assessing skills, while less experienced ones encountered significantly more challenges in their assessment activities (Hakim, 2015; Oz, 2014; Turk, 2018; Yan et al., 2018; Yastibas & Takkac, 2018). This is because more experienced teachers were more aware of assessment purposes and tools, and that experience then affected their preferences and uses of assessment. However, some studies established that teaching experience did not differentiate EFL teachers in terms of their language assessment procedures: most teachers had a great deal of knowledge about how to assess language learners, but most of them used similar assessment techniques, such as traditional testing tools in the form of summative assessment (Buyukkarci, 2016; Jannati, 2015; Onalan & Karagul, 2018; Oz & Atay, 2017; Sahinkarakas, 2012).

Moreover, several research studies on the relationship between assessment beliefs and practices confirmed a mismatch between what EFL teachers believed and what they practiced in their classrooms regarding assessment. For example, most EFL teachers believed in, and held positive attitudes towards, formative assessment; nonetheless, they did not prefer it. Instead, they preferred summative assessment techniques that measure only the product of language learning (Buyukkarci, 2014; Karagul et al., 2017; Klinger, 2016; Munoz et al., 2012; Oz & Atay, 2017; Tsagari & Vogt, 2017). Also, most EFL teachers were found to test grammar and vocabulary knowledge as a sign of language development, without much assessment of other language skills (Duboc, 2009; Mede & Atay, 2017; Semiz & Odabas, 2016; Tsagari, 2013; Tsagari & Vogt, 2017; Wach, 2012). In the same vein, most teachers utilized traditional testing tools such as written pen-and-paper tests, multiple-choice, fill-in-the-blanks, and similar techniques to find out how much students had learned and what sort of knowledge had not been acquired in terms of grammar and vocabulary (Buyukkarci, 2014; Duboc, 2009; Oz, 2014; Semiz & Odabas, 2016; Tsagari, 2013; Tsagari & Vogt, 2017). Yet some studies concluded that most EFL teachers wanted to implement formative assessment in their classes, but government education policy did not allow such assessment because it adopted a high-stakes examination system, which caused teachers to employ exam-oriented approaches (Saka, 2016; Yan et al., 2018).

Regarding the relationship between the knowledge and the practice of language assessment, some studies revealed most EFL teachers had adequate knowledge; nevertheless, they were not able to put their knowledge into practice (Giraldo Aristizabal, 2018; Hakim, 2015; Jannati, 2015; Tsagari & Vogt, 2017). Thereupon, some studies investigated the effect of LAL training on teachers’ knowledge and practices of assessment. On the one hand, some studies showed EFL teachers had not had any LAL training, so they did not know how to assess (Berry et al., 2017; Vogt & Tsagari, 2014).

On the other hand, LAL training given through teacher education programs was found to be fruitful. For instance, Hilden and Frojdendahl (2018) demonstrated EFL teachers gained certain assessment abilities and developed more learner-centred conceptions of assessment by means of their teacher education programs. In spite of the benefits of such training, especially with respect to the knowledge base, other studies focused on the points it missed, because EFL teachers encountered difficulties in assessment despite their training; as a result, these studies found teacher education programs could not develop the LAL of their future teachers as expected (Djoub, 2017; Fulcher, 2012; Gebril, 2017; Klinger, 2016; Lam, 2014). For example, Turk (2018) indicated pre-service training related to language assessment knowledge was good, but teachers were not able to carry out effective assessment procedures. Also, Hatipoglu (2015) and Lam (2014) reported such training was mostly based on testing-oriented issues while highlighting the importance of testing in language learning and teaching. Considering the gaps in such training, most studies determined that LAL training could not provide EFL teachers with practical assessment skills (Hatipoglu, 2010; Lam, 2014; Sariyildiz, 2018; Sheehan & Munro, 2017; Yan et al., 2018), and hence EFL teachers could not perform appropriate assessment strategies in their classes. In addition, Mede and Atay (2017) noted that EFL teachers had difficulties in designing assessment and giving proper feedback due to the lack of training on those points. Semiz and Odabas (2016) also argued training should include the testing of language skills apart from grammar and vocabulary. Likewise, Mede and Atay (2017) and Turk (2018) suggested EFL teachers need training especially in formative assessment, because the training they had at university did not improve their assessment for learning strategies. Moreover, Tsagari and Vogt (2017) and Vogt and Tsagari (2014) pointed out that, as EFL teachers complained, LAL training did not address local conditions with respect to language assessment, and therefore teachers had difficulties in responding to the assessment needs of their local contexts.

It can be seen that there have been notable attempts in the literature to investigate the concept of LAL. Yet few studies have investigated how LAL is viewed through the eyes of EFL teachers, especially beginning teachers, what LAL training contains, and whether it is beneficial to teachers. This gap is also present in Turkey. Therefore, there is a need to understand how LAL is perceived and experienced by in-service EFL teachers, especially novice ones, because they have just started in the teaching profession and their language assessment knowledge and skills can be regarded as up to date and fresh. The current study was designed to investigate what Turkish EFL teachers know about the concepts of ELTE (English Language Testing and Evaluation) and FLAL (Foreign Language Assessment Literacy), whether their training is sufficient for their current assessment practices, and how they put what they learned into practice. In this way, the present study is intended to provide some insight into the perceptions and experiences of novice EFL teachers with respect to their LAL, to reflect on the assessment training of novice EFL teachers, and to contribute to the field of LAL research, given the limited number of similar studies, especially with novice teachers. For these purposes, the following research questions were addressed:

1 What do novice EFL teachers think about the concepts of ELTE and FLAL?
2 How do novice EFL teachers evaluate their LAL training taken at university?
3 How does novice EFL teachers’ LAL training affect their practices of language assessment in their classes?

Methodology

The present study was designed as a phenomenological study, one of the qualitative research designs. The main goal of phenomenology is to obtain a deeper understanding of the meanings of lived experiences by describing people’s multiple perceptions of them and finding the common shared experiences of a concept or a phenomenon (Creswell, 2007; Fraenkel et al., 2011; Patton, 2002). Since the purpose of this study is to reveal the perceptions and experiences of novice EFL teachers regarding the LAL concept, phenomenology was the preferred approach to investigate, describe, and interpret the LAL commonalities among novice EFL teachers in order to reach a better understanding of LAL.

Research context and participants

In Turkey, teacher education programs at university provide four years of training for those who want to become English language teachers. During this training, undergraduate students take several courses on English language teaching (ELT) and, in their last two semesters, attend a teaching practicum. After the training is completed successfully, most of the graduates are appointed by the Ministry of National Education (MoNE) to state schools across Turkey, teaching different age groups and language levels, each year. In terms of LAL training, students take an ELTE course in their last semester before graduation. This course covers topics such as approaches to and principles of language testing, different kinds of testing tools, and how to design and evaluate tests and tasks for each language skill and area, all in relation to classroom assessment suitable for different language levels and age groups. Heaton’s (2011) Writing English language tests is employed as the coursebook, and the course lecturers try to supplement it with other resources as well. It should be noted that teacher candidates are not evaluated on their testing and assessment skills in the practicum; only their teaching skills are evaluated. This ELTE course is assumed to provide course-based LAL training for future EFL teachers in Turkey.


As for the participants, 22 novice EFL teachers working at state schools and teaching different language levels across Turkey took part in the study. They had graduated from the same university and the same program (ELT) within the previous five years. Since this study was conducted in the Spring Semester of 2018, the graduates of 2013, 2014, 2015, 2016, and 2017 participated in the research. All of them had taken the ELTE course with the same course lecturers, the same syllabus, and the coursebook mentioned above. They were novice in-service EFL teachers, as their teaching experience ranged from six months to a maximum of five years. The sampling was therefore purposeful, and all the participants took part in the study voluntarily (see Table 8.1).

Data collection procedure and analysis

The present study was carried out in the Spring Semester of 2018, and the following steps were taken to collect and analyze the data. First, open-ended questions were constructed by the researcher and evaluated by three field experts for reliability and validity. For this phenomenological study, open-ended questions were preferred as the data collection tool instead of interviews in order to reach the sample easily and broadly across Turkey. There were eight questions with sub-questions about novice EFL teachers’ understanding of ELTE and FLAL, the ELTE course content as related to their LAL training, their current assessment practices in their schools, and background information to identify the sample characteristics. There was also a consent form to confirm their voluntary participation. All the open-ended questions and the consent form were inserted into Google Forms. Second, an invitation email was sent to alumni who had graduated from the same university within the previous five years and who were working as EFL teachers. Thirty-seven novice EFL teachers responded to the questions in Google Forms, but only 22 of the responses were valid due to some missing responses.

Table 8.1 The profile of novice EFL teachers (numbers)

Graduation year  Teaching experience  School grade (all state)
                                      primary  elementary  high school  total
2017             6 months             0        2           1            3
2016             1.5 years            1        3           3            7
2015             2.5 years            2        1           2            5
2014             3.5 years            1        1           2            4
2013             4.5 years            1        0           2            3
total                                 5        7           10           22


Third, interpretive phenomenological analysis (IPA) was conducted to analyze the data. In IPA, researchers try to derive clusters of meanings to identify common thoughts and experiences about a phenomenon by analyzing extracts from each participant; these clusters are then categorized thematically (Creswell, 2007; Fraenkel et al., 2011; Smith et al., 2009). In other words, IPA draws on the methods of thematic and content analysis, performed inductively (Smith et al., 2009). Therefore, all the responses were first subjected to initial coding by the researcher to get a general view of the data and to assign a few themes. Then the data were analyzed in detail with the help of the NVivo 11 Pro program: the data were coded thematically, the number of data chunks was calculated, and the codes and themes were organized. Afterwards, to establish interrater reliability, two experts in the field of English language assessment, each of whom had taught a university ELTE course for fifteen years, examined the consistency between the codes and the themes. In line with their feedback, some modifications were made, and the final version of the data analysis was generated (see Figure 8.1).


Figure 8.1 The view of the themes and subthemes in NVivo 11 Pro Program


Intercoder agreement was also calculated using Miles and Huberman’s (1994) suggested procedure, and agreement between the raters was found to be high (97%). Finally, in accordance with the determined themes, the findings were interpreted and discussed narratively, with examples given from participants’ quotes.
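Miles and Huberman’s (1994) procedure rests on a simple proportion. As an illustrative sketch with hypothetical counts (the underlying numbers of coding decisions are not reported here), an agreement of 97% corresponds to:

\[
\text{reliability} = \frac{\text{number of agreements}}{\text{number of agreements} + \text{number of disagreements}}, \quad \text{for example } \frac{97}{97 + 3} = 0.97.
\]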

Findings

The novice EFL teachers conveyed their thoughts and experiences about English language assessment in terms of both their ELTE training at university and their ELTE applications in their current classrooms. The analysis of their responses yielded two themes: ‘FLAL-ELTE Concept’ and ‘Experiences about ELTE (past & present)’ (see Figure 8.2). The sample commented more on experiences about ELTE (past & present) (f=252) than on the FLAL-ELTE concept (f=115).

Figure 8.2 The whole findings of novice EFL teachers

The first theme concerns how the FLAL-ELTE concept is perceived by novice EFL teachers. For ‘FLAL-ELTE Concept’, novice EFL teachers first defined what ELTE means. They mostly concentrated on the issue of language proficiency (language skills and areas) and the aims of language learning. Most of them indicated language assessment means measuring language proficiency and determining deficiencies in the target language. Still, only a few reported that ELTE refers to understanding whether course aims are achieved, planning and evaluating language learning progress, and reporting the findings. For example, one noted that ‘assessment is to measure the proficiency level of language learners as well as evaluating their successes and progress in that foreign language’ (Low.T.10).

Apart from this, novice EFL teachers focused on the importance of testing and evaluation in English language classes. Most of them underscored that ELTE is important because of its significant role in testing language proficiency, determining and evaluating language progress, showing whether aims are achieved, and getting feedback about deficient points in the language. For instance, one of the novice EFL teachers pointed out that:

In the teaching/learning context, feedback should be given in order to evaluate and improve language proficiency of learners. So, feedback comes from the assessment results of learners, and is used again to develop their language proficiency. In this way, the shortfalls of the education may be fulfilled. (High.T.6)

As for their perceptions of their own levels of FLAL – that is, how qualified they felt in English language assessment – half of the participants perceived themselves to be highly competent (n=11), whereas only two teachers felt inadequate; the others felt they could deal with assessment at a moderate level (n=7). Nonetheless, only one novice EFL teacher, who felt highly competent in terms of FLAL, mentioned that teachers should have the knowledge and skills of ELTE itself (i.e. knowing how to construct exams, how to interpret results, and how to report them). In terms of FLAL components, this teacher underlined that:

Language assessment means to have knowledge about the concepts and methods related to the testing of language skills, to examine test preparation processes related to language proficiency and development, to prepare a language test for a specific group of students, and to plan, implement and evaluate the results via item analysis. And also, it means to report assessment results in written format and present them verbally. (High.T.17)


practice what they learned during the course until they were appointed. The novice EFL teachers believed training about language assessment should have included more practical exercises and implementations such as simulations to perform tests. Besides, few of them underlined that the class hours were inadequate, and the last semester was too late to take the course. For instance, some commented that: The content of the course [ELTE course] was satisfying in terms of theory but the practical side was not enough. (Moderate.T.20) The [ELTE] course was very theoretical. More practice would have been better. For example, for each language skill, students should prepare an assessment tool every week, and the criteria for the evaluation should be interpreted. (High.T.1) Besides, some novice EFL teachers argued that they did not have any opportunity to test language learners in the practicum; they were only evaluated in terms of their teaching skill and not testing skills. Thus, according to them, teacher candidates should be made to prepare and administer some tests in their teaching practice for the sake of gaining experience in assessment in addition to teaching. Therefore, one of them suggested that: ‘Before teacher trainees start [in the] teaching profession, they should be provided with experience [practice] in terms of assessment process by making them to prepare at least a part of a language exam during micro-macro training [practicum courses]’ (Moderate.T.5). Moreover, some novice EFL teachers complained they did not learn anything about the evaluation aspects of testing such as scoring and feedback. For this reason, new topics such as analyzing test results should be added. One participant teacher recommended that: ‘[To the ELTE course content], exercises of preparing various test items, techniques of evaluation and analysis, and statistical calculations [can be added as new topics]’ (Moderate.T.2). Furthermore, some participant teachers discussed the great degree to which the ELTE course content concentrated on testing grammar and vocabulary and how some attention was given to receptive skills testing. They also emphasized how they learned to design multiple choice tests, cloze tests, and the like. Nevertheless, the testing of productive skills (speaking and writing) was not much emphasized in the course or in other types of assessment, such as alternative assessment methods. For instance, one of them indicated that: I think I acquired the essential knowledge about assessing grammar, listening, and reading. But I think the [ELTE] course was not successful enough in terms of the topics about assessing writing and speaking. […]

148 Aylin Sevimel-Sahin

Especially, I cannot assess students’ speaking and writing skills. In my opinion, the knowledge about assessing speaking and writing skills that I gained at university is inadequate. It is because there was not much practice about that topic. (Low.T.10) In order to make the ELTE better, some novice EFL teachers suggested in-service training as a way to cover some of the missing parts of the course, which would help to eliminate their inabilities related to language assessment. For instance, they might learn from other teachers’ experiences by sharing their testing techniques and discussing their assessment ideas. Also, they assumed in-service training might be helpful to remind them of previous ELTE knowledge and practices they learned as they complained they forgot certain things about assessment throughout years. For that, one of them stated that: I think in-service training [about ELTE] is necessary because in-service teachers prepare language exams without paying attention to some assessment criteria after a while. Therefore, constantly, in-service seminars should be provided suitable to the changing needs of the system and what is learned at university should be kept fresh all the time. (Moderate.T.12) With respect to in-service training, some also reported that not all teachers had language assessment training at their universities and they criticized that even those who took ELTE training, might still be incompetent in language assessment because not all universities took care of the course at the expected level and to the expected extent. For instance, one of them argued that: In-service training [about ELTE] should be provided. I think each university does not put emphasis on this topic in the same way. I have come to this point on the basis of the materials that are prepared by my colleagues. (High.T.19) The second subcategory is about their ‘current experiences in their career life’; that is, how they used and practiced what they learned at the end of their LAL training. For this topic, most of the novice EFL teachers gave specific examples how they used testing knowledge to assess their students’ language proficiency. Regardless of their perceived LAL levels, they mostly discussed the test development criteria and techniques to measure students’ proficiency rather than how to monitor students’ progress in the target language during the teaching process. One of them stated that:
It did work [ELTE course was helpful at work]. For example, I pay attention to the difficulty level of test items while preparing exams. I avoid giving clues about correct answer in true-false test items. I try to be careful about the clarity of questions. I prepared each test item which measures only one skill. I pay attention to the length of the gaps in the fill-in-the-blanks test items. (High.T.8)

Likewise, one of the novice EFL teachers, who felt highly competent, focused on their ability to prepare language tests according to students’ levels and course aims, in contrast to their colleagues, who used available exams regardless of such points. That participant reported that:

Considering my first experience [about language assessment], I can say that I have not had any difficulty in testing language [at my school]. I am able to construct my own assessment criteria. Although the other teachers in the school [colleagues] prefer ready-made assessment tools, I created my own tests. Since my tests are more appropriate to my students as well as the learning outcomes, I have achieved more success. (High.T.4)

On the other hand, although they stated they could apply their testing knowledge and did well in designing their own language tests, they mentioned they still had some challenges. For example, they stated their students had lower levels of English language proficiency even though their age and education levels were high, and thus they could not apply high-level tests to their students. In other words, they could not respond to the needs of their local contexts (the external factor). This was because there was nothing about Turkish contextual issues related to language assessment in their LAL training; they just studied imaginary situations, and hence the course was considered somewhat idealized by the participants. One of the novice EFL teachers drew attention to the Turkish examination system, which is based on high-stakes testing in the form of multiple-choice questions, and how such exams have priority in Turkey. Therefore, they had to conduct written exams like that in their classrooms. In addition, most of the novice EFL teachers focused on the lack of practice as the internal factor, and complained about their inability to perform effective testing practices in their current classes because they were unable to adapt their theoretical knowledge to their teaching contexts. This was because, as mentioned before, they did not have any chance to experience language testing during their LAL training at university. For that, one of them argued: ‘I have difficulties in practice and implementation. I have theoretical knowledge [about language assessment], but I have difficulty in combining such knowledge with the teaching style and testing system at state schools [in Turkey]’ (Moderate.T.20).
To sum up, it can be concluded that most of the novice EFL teachers had apparently gained good knowledge of ELTE thanks to the course and learned how to design language tests, but they had some difficulties, especially in practice, due to the lack of experience and practice during the course. While they defined the concept of ELTE from a testing perspective and believed in the importance of assessment, they mostly emphasized the notion of summative assessment, both in their definitions and in their current uses of assessment in teaching contexts.

Discussion

The present study has focused on the perceptions and experiences of novice EFL teachers as well as their LAL training. The findings yielded two themes, ‘FLAL-ELTE Concept’ and ‘Experiences about ELTE (past & present)’, that illustrate the responses to the research questions thematically.

Regarding the first research question (what the sample thought about ELTE and FLAL), the responses gathered under the theme ‘FLAL-ELTE Concept’ indicated their perceptions of these concepts. Novice EFL teachers perceived ELTE as measuring language proficiency and determining the success or deficiencies of students in the target language. Therefore, they mostly concentrated on the testing purposes of achievement and diagnosis. However, even though they appreciated the importance of ELTE in teaching for several of its benefits, only a few of them were aware of other assessment purposes, such as to improve learning/teaching. So, it can be inferred that their perceptions were based on testing rather than assessment. This result is similar to studies which showed EFL teachers thought of assessment as only testing (Berry et al., 2017; Duboc, 2009; Klinger, 2016; Tsagari, 2013). As Giraldo Aristizabal (2018) has maintained, beliefs and previous experiences affect the perceptions of teachers, and the testing perception in this study can be attributed to the content of their ELTE course, which focused only on testing issues; their perceptions were therefore shaped according to the training they underwent. This was also reflected in their practice in that they used summative assessment to determine the product of learning. As Herrera and Macias (2015) have put forward, testing means summative assessment while ignoring other purposes, as in this study.

As for the LAL concept itself, a few of the novice EFL teachers’ responses highlighted dimensions such as ‘knowledge’ and ‘skills’ (Davies, 2008; Hill, 2017), ‘what’ and ‘how’ (Inbar-Lourie, 2008), and ‘practices’ (Fulcher, 2012), along with some ‘contextual issues’ with respect to ‘local conditions’ (Hill, 2017; Stabler-Havener, 2018). However, when compared to the proposed components of LAL in the literature, the dimension of ‘principles’ (use of tests ethically and appropriately) (Davies, 2008; Fulcher, 2012), the element of ‘why’ (reasoning behind assessment) (Inbar-Lourie, 2008), and certain contextual issues, such as the historical and philosophical frameworks that form the origins of assessment (Fulcher, 2012), were not much discussed. Therefore, it can be concluded that the novice EFL teachers were not very familiar with what being assessment literate means, which is similar to the conclusions of Berry et al.’s (2017) and Semiz and Odabas’ (2016) studies. This finding can be related to their training content, which focused only on knowledge and skills; thus, they were not introduced to the LAL concept before graduation.

When it comes to perceived levels of LAL, most of the participant novice EFL teachers felt they were good at assessment, which indicates they had higher perceived levels of LAL. This finding differs from studies which found lower levels of LAL combined with teachers not feeling prepared to test (Buyukkarci, 2016; Fard & Tabatabei, 2018; Tsagari & Vogt, 2017; Vogt & Tsagari, 2014). This difference may result from their belief that their knowledge of language assessment was good; hence, they felt confident in their assessment knowledge even though they were confronted with difficulties in practicing their testing skills.

Considering the second research question (how the sample evaluated their LAL training as the course-based training (ELTE course) at university), the responses for the subtheme ‘Previous Experiences about ELTE’ demonstrated their opinions and experiences of, and their suggestions for, the course. Most of them stated they learned a lot about language testing by the end of the course, and all the knowledge they gained was useful to their testing knowledge in their teaching. Therefore, they found the course sufficient. It seems that at the end of such LAL training, novice EFL teachers became aware of language testing issues and familiar with certain knowledge about and abilities in language testing. This finding is similar to studies in the literature showing that the ELTE course provides teachers with basic training, which familiarizes them with language testing issues and procedures (Hilden & Frojdendahl, 2018; Semiz & Odabas, 2016; Turk, 2018). In contrast to the study of Gebril (2017), which revealed the training was ineffective in developing the LAL of its attenders, this study found positive impacts.

Nevertheless, when the novice EFL teachers evaluated the ELTE course, they stressed testing issues rather than assessment, such as learning how to design multiple choice tests to measure grammar and vocabulary knowledge. They did not report anything about formative assessment, assessment for learning, or alternative assessment techniques; they just reported the ways summative assessment tests what is learned in the end. Also, they noted that little emphasis was given to testing productive language skills during the course. Thus, these findings can be associated with the content of the course itself and the book used as the main resource; both were based on language testing topics, and there were not any recent topics in the syllabus. This was also illustrated by research studies in the literature that noted that the content of training was exam-oriented, as in the current study (Hatipoglu, 2015; Lam, 2014). Therefore, as Mede and Atay (2017) and Turk (2018) have suggested, training should include the topic of formative assessment in order to equip teachers with using assessment for learning.
Apart from these opinions, novice EFL teachers criticized the course for being very theoretical, and they noted it did not improve their language testing skills; they just became familiar with how to design tests and what kinds of test items can be used for testing each language skill. Further, they could not gain experience in their teaching practice because only their teaching skills were taken into account in the practicum. Therefore, despite being good in terms of theory, the course was deprived of practice. Similarly, this finding is well attested in the literature: LAL training has lacked attention to developing practical skills in language assessment (Hatipoglu, 2015; Lam, 2014; Sariyildiz, 2018; Sheehan & Munro, 2017; Yan et al., 2018). Most of the participant novice EFL teachers argued there was nothing about analyzing, interpreting, and evaluating test scores in the course; this topic was ignored in the training phase. As Hudaya (2017) and Mede and Atay (2017) have demonstrated, training should educate teachers in how to interpret scores and, accordingly, how to give feedback to improve language learning. They also stressed that class hours were inadequate since there was no time for practice, and that the last semester was too late to take the course. All in all, it seems the course-based LAL training was beneficial, and the content was satisfying in terms of theory. However, though the novice EFL teachers received training, they felt they still needed more training to be good assessors in their teaching contexts and to improve their practical skills, as underlined by some studies in the literature (Djoub, 2017; Fulcher, 2012; Klinger, 2016). Thus, as some novice EFL teachers highlighted, in-service training could provide the missing course content.

As for the third research question (how the sample’s training affected their assessment practices in teaching), the responses for the subtheme ‘Current Experiences about ELTE’ illustrated whether the novice EFL teachers were able to use their acquired knowledge and skills in their career life. Most of the novice EFL teachers, in line with the course content, focused on testing issues: They exemplified how they paid attention to testing criteria, how they designed language tests, and how they measured what was learned by using testing techniques such as multiple choice, true/false statements, and fill-in-the-blanks. Because the ELTE course content was testing-oriented, it can be inferred that the novice EFL teachers used mostly language tests and traditional testing techniques to measure the sum of learning, and thus they were able to apply what they learned in the course to their teaching lives. In contrast to Ukrayinska’s (2018) study, which showed that teachers still had some challenges in designing language tests even after training, the participants of the present study were able to properly design their own language tests. However, similar to studies in the literature showing that teachers employed mostly traditional testing tools, rather than portfolios or other types of assessment, to measure knowledge of language rather than language skills (Buyukkarci, 2014; Duboc, 2009; Mede & Atay, 2017; Oz, 2014; Semiz & Odabas, 2016; Tsagari, 2013; Wach, 2012), the novice EFL teachers in this study also used such testing procedures. This finding can be attributed to the fact that Turkey is an exam-oriented country where the sample was most familiar with multiple-choice exams and, as in other studies in the literature (Saka, 2016; Yan et al., 2018), was expected to use them. Thus, they preferred such testing tools in their classes due to the testing policy of Turkey, as well as what they learned in the training.

In addition, though the novice EFL teachers reported they had good knowledge of designing language tests, they also stated they had some difficulty in applying their tests due to the local needs of their teaching contexts. Thus, as some studies have indicated, LAL training should include more discussion of local conditions in relation to language assessment to assist future teachers (Tsagari & Vogt, 2017; Vogt & Tsagari, 2014). Moreover, most of them highlighted that they lacked practice, owing to the fact that they could not gain experience during the course period and had only just started to practice their language testing skills. They thus had some related challenges; they could not put their knowledge into practice as they expected. Several studies in the literature concluded that most in-service teachers had knowledge of assessment procedures but could not practice them effectively; so too did the present study (Giraldo Aristizabal, 2018; Hakim, 2015; Jannati, 2015; Mede & Atay, 2017; Oz & Atay, 2017; Yan et al., 2018). The need for further training has been stressed in much of the literature (Hatipoglu, 2010; Sariyildiz, 2018; Sheehan & Munro, 2017; Tsagari & Vogt, 2017; Wach, 2012); novice EFL teachers therefore need much more LAL training on certain topics to become assessment literate. After all, though the present study revealed that the ELTE course provided basic LAL training, and thus trained teachers in language testing knowledge and techniques, novice EFL teachers still need much more training to be good assessors in their teaching contexts and to gain the characteristics of assessment literate teachers who can respond to the needs of their students.

Conclusion

The current study aimed to investigate the perceptions and experiences of novice EFL teachers regarding language assessment, as well as the effect of their teacher education training on their development of LAL. Overall, as LAL training, the ELTE course was found to be beneficial and helpful because knowledge of language testing and skills in designing language tests were acquired in the course and subsequently applied in professional life. Yet the perceptions, experiences, and practices of novice EFL teachers regarding language assessment were limited to testing rather than assessment itself. Further, the course contributed to theoretical knowledge of language testing rather than practical knowledge. Therefore, there are still some gaps to be filled in order to make novice EFL teachers better in terms of LAL, as demonstrated by some studies in the literature (Hatipoglu, 2010; Sariyildiz, 2018; Sheehan & Munro, 2017; Tsagari & Vogt, 2017). For this reason, some pedagogical implications can be shared.
Since the perceptions of novice EFL teachers regarding language assessment were mostly shaped in their training years, and then in their experiences in their teaching lives, course-based LAL training should be redesigned. EFL teachers need good models of language assessment training early in learning – that is, in their teacher education years – in order for it to be useful in their future teaching career and to shape their LAL perceptions (Herrera & Macias, 2015; Volante & Fazio, 2007). For instance, the course should explicitly study the terms testing, assessment, and evaluation, as well as forms and purposes of testing and assessment other than summative ones. The course should also discuss and provide examples of traditional testing tools. In this way, novice EFL teachers may become more familiar with these terms, potentially increasing the chance that other forms of assessment to support learning will be used. In addition, further dimensions of LAL should be integrated into training, such as reasoning, ethical use, origins, principles, washback effect, and contextual issues, which were not much mentioned by the sample. To do this, information about such issues should be given and some exercises should be performed to better understand them. Much more attention and time should be devoted to assessing language skills in addition to grammar and vocabulary. Moreover, new topics can be added to the course to improve teachers in terms of assessment, such as formative assessment, alternative assessment (i.e. portfolio, computerized testing, self-assessment, etc.), the Turkish examination system and policy, ethical concerns about assessment, analyzing and interpreting results and giving feedback, evaluation tasks, and the like.

Although the novice EFL teachers mentioned they learned how to design language tests, they could not practice that skill, even in their practicums. Therefore, both in the ELTE course and in the practicum, the practical side of assessment should be developed; in other words, both testing and teaching should be involved in the practicum, as Viengsang (2016) has indicated. Practicing assessment before starting to teach may thus make EFL teachers more confident and skilful in language assessment. In brief, it is hoped that such recommendations will improve training for EFL teachers and enhance their assessment literacy. Teachers need to know what to test for which purpose, to be able to select, construct, and administer assessments, and to reflect on assessment results for better assessment practices (Hidri, 2018), and thereby for effective teaching opportunities and improved language learning. All of this leads to higher levels of LAL.

For further research, more empirical studies can be conducted to improve and observe the practice of language assessment skills. Comparisons between pre-service and in-service teachers in terms of LAL can be made to obtain a better understanding, and the effect of teacher educators in the training process may be investigated. Other types of instruments, such as observations, and other types of research designs might be useful to further enlighten understanding of the concept of LAL.
Notes
1 This paper is based on the doctoral dissertation titled ‘Exploring foreign language assessment literacy of pre-service English language teachers’.
2 Corresponding Author: Dr. Aylin Sevimel-Sahin, ELT Department, Anadolu University, Eskisehir, Turkey, [email protected]

References
Bachman, L. F. (2005). Statistical analyses for language assessment. Cambridge University Press.
Berger, A. (2012). Creating language-assessment literacy: A model for teacher education. In J. Hüttner, B. Mehlmauer-Larcher, S. Reichl, & B. Schiftner (Eds.), Theory and practice in EFL teacher education: Bridging the gap (pp. 57–82). Short Run Press Ltd.
Berry, V., Sheehan, S., & Munro, S. (2017, May 3–5). Exploring teachers’ language assessment literacy: A social constructivist approach to understanding effective practices [Paper presentation]. ALTE 6th International Conference Learning and Assessment: Making the Connections, Bologna, Italy. http://eprints.hud.ac.uk/id/eprint/33342/
Brennan, M. (2015, May 21–23). Building assessment literacy with teachers and students: New challenges? [Paper presentation]. ACER EPCC Conference, Sydney, Australia. https://www.acer.org/files/eppc15-Brennan-Building-Assessment-Literacy-with-teachers-and-students2.pptx
Buyukkarci, K. (2014). Assessment beliefs and practices of language teachers in primary education. International Journal of Instruction, 7(1), 107–120.
Buyukkarci, K. (2016). Identifying the areas for English language teacher development: A study of assessment literacy. Pegem Egitim ve Ogretim Dergisi [Pegem Journal of Education and Instruction], 6(3), 333–346.
Creswell, J. W. (2007). Qualitative inquiry and research design: Choosing among five approaches (2nd edition). SAGE Publications.
Csépes, I. (2014). Language assessment literacy in English teacher training programmes in Hungary. In J. Hovarth, & P. Medgyes (Eds.), Studies in honour of Marianne Nikolov (pp. 399–411). Lingua Franca Csoport.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3), 327–347.
DeLuca, C., & Klinger, D. A. (2010). Assessment literacy development: Identifying gaps in teacher candidates’ learning. Assessment in Education: Principles, Policy & Practice, 17(4), 419–438.
Djoub, Z. (2017). Assessment literacy: Beyond teacher practice. In R. Al-Mahrooqi, C. Coombe, F. Al-Maamari, & V. Thakur (Eds.), Revisiting EFL assessment (pp. 9–27). Springer International Publishing.
Duboc, A. P. M. (2009). Language assessment and the new literacy studies. Lenguaje, 37(1), 159–178.
Fard, Z. R., & Tabatabei, O. (2018). Investigating assessment literacy of EFL teachers in Iran. Journal of Applied Linguistics and Language Research, 5(3), 91–100.
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2011). How to design and evaluate research in education (8th edition). McGraw-Hill.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–132.

Gebril, A. (2017). Language teachers’ conceptions of assessment: An Egyptian perspective. Teacher Development, 21(1), 81–100.
Giraldo Aristizabal, F. G. (2018). A diagnostic study on teachers’ beliefs and practices in foreign language assessment. Íkala: Revista de Lenguaje y Cultura, 23(1), 25–44.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers. Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Gotch, C. M., & French, B. F. (2014). A systematic review of assessment literacy measures. Educational Measurement: Issues and Practice, 33(2), 14–18.
Hakim, B. (2015). English language teachers’ ideology of ELT assessment literacy. International Journal of Education & Literacy Studies, 3(4), 42–48.
Hatipoglu, C. (2010). Summative evaluation of an English language testing and evaluation course for future English language teachers in Turkey. English Language Teacher Education and Development (ELTED), 13, 40–51.
Hatipoglu, C. (2015). English language testing and evaluation (ELTE) training in Turkey: Expectations and needs of pre-service English language teachers. ELT Research Journal, 4(2), 111–128.
Heaton, J. B. (2011). Writing English language tests (new edition). Longman Group.
Herrera, L., & Macias, D. (2015). A call for language assessment literacy in the education and development of teachers of English as a foreign language. Colombian Applied Linguistics Journal, 17(2), 302–312.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Hidri, S. (2018). Introduction: State of the art of assessing second language abilities. In S. Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp. 1–19). Springer International Publishing.
Hilden, R., & Frojdendahl, B. (2018). The dawn of assessment literacy – exploring the conceptions of Finnish student teachers in foreign languages. Apples – Journal of Applied Language Studies, 12(1), 1–24.
Hill, K. (2017). Understanding classroom-based assessment practices: A precondition for teacher assessment literacy. Papers in Language Testing and Assessment, 6(1), 1–17.
Howerton, A. M. (2016). Elephant on a stepladder: An exploration of pre-service English teacher assessment literacy (Publication No. 10240273) [Doctoral dissertation, Northern Illinois University]. ProQuest Dissertations and Theses Global.
Huang, J., & He, Z. (2016). Exploring assessment literacy. Higher Education of Social Science, 11(2), 18–27.
Hudaya, D. W. (2017). Teachers’ assessment literacy in applying principles of language assessment. Proceedings of Education and Language International Conference, 1(1), 247–260.
Inbar-Lourie, O. (2008). Constructing a language assessment knowledge base: A focus on language assessment courses. Language Testing, 25(3), 385–402.
Inbar-Lourie, O. (2013, November 1–2). Language assessment literacy: What are the ingredients? [Paper presentation]. The 4th EALTA-CBLA SIG Symposium, Nicosia, Cyprus. http://www.ealta.eu.org/events/CBLAcyprus2013/Lectures_workshops/O_Inbar-Lourie%20-%20plenary%20-1.pdf
Inbar-Lourie, O. (2017). Language assessment literacy. In E. Shohamy, L. Or, & S. May (Eds.), Language testing and assessment (pp. 257–270). Springer International Publishing.
Jannati, S. (2015). ELT teachers’ language assessment literacy: Perceptions and practices. The International Journal of Research in Teacher Education, 6(2), 26–37.

Jeong, H. (2013). Defining assessment literacy: Is it different for language testers and non-language testers? Language Testing, 30(3), 345–362.
Jin, Y. (2010). The place of language testing and assessment in the professional preparation of foreign language teachers in China. Language Testing, 27(4), 555–584.
Karagul, B. I., Yuksel, D., & Altay, M. (2017). Assessment and grading practices of EFL teachers in Turkey. International Journal of Language Academy, 5(5), 168–174.
Khadijeh, B., & Amir, R. (2015). Importance of teachers’ assessment literacy. International Journal of English Language Education, 3(1), 139–146.
Klinger, C. J. T. (2016). EFL professors’ beliefs of assessment practices in an EFL pre-service teacher training undergraduate program in Colombia (Publication No. 10239483) [Doctoral dissertation, Southern Illinois University Carbondale]. ProQuest Dissertations and Theses Global.
Lam, R. (2014). Language assessment training in Hong Kong: Implications for language assessment literacy. Language Testing, 32(2), 169–197.
Mede, E., & Atay, D. (2017). English language teachers’ assessment literacy: The Turkish context. Dil Dergisi-Ankara Universitesi TOMER [Language Journal-Ankara University TOMER], 168(1), 43–60.
Miles, M. B., & Huberman, A. M. (1994). An expanded sourcebook: Qualitative data analysis (2nd edition). SAGE Publications.
Munoz, A. P., Palacio, M., & Escobar, L. (2012). Teachers’ beliefs about assessment in an EFL context in Colombia. Profile, 14(1), 143–158.
Newfields, T. (2006). Teacher development and assessment literacy. Authentic Communication: Proceedings of the 5th Annual JALT Pan-SIG Conference, 48–73. http://hosted.jalt.org/pansig/2006/PDF/Newfields.pdf
O’Loughlin, K. (2006). Learning about second language assessment: Insights from a postgraduate student online subject forum. University of Sydney Papers in TESOL, 1, 71–85.
Onalan, O., & Karagul, A. E. (2018). A study on Turkish EFL teachers’ beliefs about assessment and its different uses in teaching English. Journal of Language and Linguistic Studies, 14(3), 190–201.
Oz, H. (2014). Turkish teachers’ practices of assessment for learning in the English as a foreign language classroom. Journal of Language Teaching and Research, 5(4), 775–785.
Oz, S., & Atay, D. (2017). Turkish EFL instructors’ in-class language assessment literacy: Perceptions and practices. ELT Research Journal, 6(1), 25–44.
Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd edition). SAGE Publications.
Rea-Dickins, P. (2004). Understanding teachers as agents of assessment. Language Testing, 21(3), 249–258.
Rogier, D. (2014). Assessment literacy: Building a base for better teaching and learning. English Language Teaching Forum, 3, 2–13.
Sahinkarakas, S. (2012). The role of teaching experience on teachers’ perceptions of language assessment. Procedia – Social and Behavioral Sciences, 47, 1787–1792.
Saka, F. O. (2016). What do teachers think about testing procedures at schools? Procedia – Social and Behavioral Sciences, 232, 575–582.
Sariyildiz, G. (2018). A study into language assessment literacy of pre-service English as a foreign language teachers in Turkish context [Unpublished master’s thesis]. Hacettepe University.

Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and in teacher learning. Language Testing, 30(3), 309–327.
Semiz, O., & Odabas, K. (2016). Turkish EFL teachers’ familiarity of and perceived needs for language testing and assessment literacy. Proceedings of the Third International Linguistics and Language Studies Conference, 66–72. https://www.academia.edu/34827097/Turkish_EFL_Teachers_Familiarity_of_and_Perceived_Needs_for_Language_Testing_and_Assesment_Literacy
Sheehan, S., & Munro, S. (2017). Assessment: Attitudes, practices and needs. ELT Research Papers 17.08. British Council. https://www.teachingenglish.org.uk/sites/teacheng/files/pub_G239_ELTRA_Sheenan%20and%20Munro_FINAL_web%20v2.pdf
Smith, J. A., Flowers, P., & Larkin, M. (2009). Interpretive phenomenological analysis: Theory, method and research. SAGE Publications.
Stabler-Havener, M. L. (2018). Defining, conceptualizing, problematizing, and assessing language teacher assessment literacy. Teachers College, Columbia University Working Papers in Applied Linguistics & TESOL, 18(1), 1–22.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77(3), 238–245.
Tsagari, D. (2013). EFL students’ perceptions of assessment in higher education. In D. Tsagari, S. Papadima-Sophocleous, & S. Ioannou-Georgiou (Eds.), International experiences in language testing and evaluation (pp. 117–143). Peter Lang.
Tsagari, D., & Vogt, T. (2017). Assessment literacy of foreign language teachers around Europe: Research, challenges and future prospects. Papers in Language Testing and Assessment, 6(1), 41–63.
Turk, M. (2018). Language assessment training level and perceived training needs of English language instructors: A mixed methods study [Unpublished master’s thesis]. Bahcesehir University.
Ukrayinska, O. (2018). Developing student teachers’ classroom assessment literacy: The Ukrainian context. In S. Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp. 351–371). Springer International Publishing.
Viengsang, R. (2016). Exploring pre-service English teachers’ language assessment literacy. Modern Journal of Language Teaching Methods (MJLTM), 6(5), 432–442.
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Volante, L., & Fazio, X. (2007). Exploring teacher candidates’ assessment literacy: Implications for teacher education reform and professional development. Canadian Journal of Education, 30(3), 749–770.
Wach, A. (2012). Classroom-based language efficiency assessment: A challenge for EFL teachers. Glottodidactica, 39(1), 81–92.
White, E. (2009). Are you assessment literate? Some fundamental questions regarding effective classroom-assessment. OnCUE Journal, 3(1), 3–25.
Yan, X., Zhang, C., & Fan, J. J. (2018). ‘Assessment knowledge is important but …’: How contextual and experiential factors mediate assessment practice and training needs of language teachers. System, 74, 158–168.
Yastibas, A. E., & Takkac, M. (2018). Understanding the development of language assessment literacy. Bingol University Journal of Social Sciences Institute, 8(15), 89–106.

Chapter 9

Teachers’ assessment of academic writing: Implications for language assessment literacy

Zulfiqar Ahmad

Introduction

In most academic settings, the course teachers are responsible for creating, administering, and grading all the course assessment interventions, which include but are not limited to quizzes, in-class assignments, portfolio management, and mid- and final-term examinations. The course teachers are expected to produce and report academically reliable accounts of students’ performance as charted out in curricular, institutional, and national policies. This multifaceted role, coupled with pedagogic assignments, presupposes a high level of assessment literacy (AL), which Stiggins (1995, p.240) understands as ‘knowing the difference between sound and unsound assessment’. Gaps in understanding and executing the principles of sound assessment are liable to produce inaccurate test results that may be vulnerable to faulty interpretations and decisions, and may adversely affect the stakeholders’ perceptions of assessment, more specifically test takers’ perceptions (Rahimi, Esfandiari & Amini, 2016).

In the field of language teaching, the term language assessment literacy (LAL) has been introduced to differentiate this specialized form from its more global variant of AL (Giraldo, 2018). LAL is based on the premise that the raters are knowledgeable about the language they teach and test, as well as adequately trained and skilled in the theoretical and practical underpinnings of language testing (Davies, 2008; Fulcher, 2012; Inbar-Lourie, 2013). Following these assumptions and Malone (2013), the operational construct for this study has been situated in teachers’ ability to create and follow appropriate assessment rubrics as well as grade academic writing, paraphrasing in this case, as closely to the construct of the writing task as possible.

Several studies report teachers’ lack of suitable training and skills in LAL (Lin, 2014; Popham, 2006), but most of these studies are based only on survey reports involving different stakeholders related to LAL. One serious limitation of these types of studies is that they base their findings and conclusions on the perceptual understanding of the participants without actually analyzing the teachers’ real-life assessments of any specific language skills. This research gap in LAL prompted the researcher to use already graded examination scripts as the unit of analysis in order to find out the appropriateness of the exam rubrics, the measurement scale, and the teachers’ use of these rubrics and measurement scale. The researcher anticipated that the relationship of these variables with the test scores would help not only to identify gaps in assessment practices, but also to foreground implications for the LAL training of teachers of academic writing in particular and of English as a Foreign Language in general.

Review of the literature

The review of literature encompasses theoretical perspectives on LAL, especially in the context of academic writing; paraphrasing as an academic literacy skill; issues in the analysis of paraphrasing; and a brief overview of research work on LAL.

Language assessment literacy and assessment of academic writing

One of the primary aims of summative assessment (SA) is to showcase the extent to which pedagogic interventions have been successful in achieving course learning objectives (CLO). Considering academic writing to be the most complex and challenging of the language skills (Nunan cited in Ahmad, 2017b), SA of writing in academic contexts assumes a special significance, for it unfolds not only the writing proficiency student writers have achieved at the end of a course, but also the relevance of the teaching methods, instructional materials, and assessment practices (Thomas, Allman, & Beech, 2004). Teachers’ lack of competence in language assessment (Nunan, 1988) may render performance indicators unreliable and invalid, thereby resulting in a negative washback effect, which may challenge the entire content and delivery design of an academic writing programme. Following López and Bernal’s (2009) emphasis on training language teachers in LAL, it seems crucial that writing teachers be trained in LAL, which, though ‘a large and still developing construct in applied linguistics’ (Giraldo, 2018, p.191), could provide them with the skillset and knowledge base essential for reliable assessment of academic texts.

LAL for teachers of academic writing refers to their ability to analyze and grade writing samples in compliance with assessment rubrics. However, the term carries much more than this oversimplified view of the assessment role. Fulcher (2012, p.125) elaborates on this basic premise of LAL as such:

The knowledge, skills, and abilities required to design, develop, maintain, or evaluate, large scale standardized and/or classroom-based tests, familiarity with test processes, and awareness of principles and concepts that guide and underpin practice, including ethics and codes of practice. The ability to place knowledge, skills, processes, principles, and concepts within wider historical, social, political, and philosophical frameworks in order to understand why practices have arisen as they have, and to evaluate the role and impact of testing on society, institutions, and individuals.
LAL thus refers to the execution of skills and knowledge in a way that is grounded in theory, and which aims to empower the teacher to have a clear cognizance of his or her role as an assessor of academic writing. The role, which may appear supra-academic in its orientation, involves an understanding of the nature, application, and implications of the what, why, when, and how of assessment. What refers to the language trait being assessed, why means the purpose of assessment, when means the learning or course stage at which a particular trait should be tested, and how includes the assessment processes, inclusive of test design, administration, and grading.

Designing an academic writing test is primarily the job of a language assessment specialist or trained writing examiners, as is done in large-scale standardized tests such as the International English Language Testing System (IELTS) and the Test of English as a Foreign Language (TOEFL). But in most academic contexts it is the teachers of academic writing who have the responsibility of managing all the essentials of assessment. This indicates the need for training in test construction, assessment rubrics, and consistent measurement of the writing tasks. Owing to contextual variations and curricular preferences, it is hard to establish a workable construct for writing (Weigle cited in Ahmad, 2019), and even a small deviation from contextual parameters, which are situated in institutional policies and course objectives, can adversely affect the purpose of assessment as well as the performance of the teachers as raters. It is equally important to ascertain the timeframe for assessment, or the learning stage at which a particular assessment intervention is to be used. LAL also prioritizes the rationale for assessment, so that teachers know what they are assessing for and why in this specific way. However, the most significant dimension of LAL seems to be its focus on the ‘how’ of assessment, which entails holistic yet rationalistic implementation of the skills and knowledge received through training and experience. LAL expects the raters to be able to produce a reliable and valid assessment of the writing sample despite individual differences. Stiggins (cited in Herrera & Macias, 2015, p.307) bases his notion of LAL competence on the following benchmarks: (a) identifying a clear rationale for assessment, (b) explicitly stating anticipated outcomes, (c) using appropriate assessment strategies and methods, (d) designing reliable assessment items, rubrics, and sampling, (e) eliminating rater bias, (f) reporting the results honestly, and (g) employing assessment as a pedagogic tool.

Researchers (e.g. Lin & Su, 2015; Sultana, 2019) have identified a lack of appropriate training in LAL among English as a Foreign Language (EFL) teachers. Mai (2019, p.104), for instance, argues that ‘most teachers of English at all levels of language education still face the challenge of identifying “criteria” for writing assessment scales’. Issues like this can have serious implications for the performance of teachers of academic writing who are responsible for designing and grading writing exams. Most teacher training programmes do include a module on language testing and assessment, but these are not comprehensive enough to equip teachers, especially novices, to confidently undertake language assessment. There is a dearth of both pre-service and in-service training in language assessment, with the result that the theoretical base developed in academic degree programmes such as the MA in Teaching English to Speakers of Other Languages (TESOL) or Applied Linguistics is not suitably refined for the practicum in real-life teaching and assessment situations.

Unskilled or inappropriate assessment by teachers can raze the entire assessment edifice to the ground. A teacher who is not knowledgeable enough to identify, for instance, the misuse of cohesive devices or a lack of coherence in the text cannot effectively assess student writing. Teachers have also been found to adopt holistic assessment when analytic or criterion-based assessment was required. This causes visible deviations from the assessment rubrics, with the result that not only are the immediate assessment cases rendered ineffective, but the other measures of course assessment and evaluation become unreliable. In some cases, the test design or the assessment rubrics are flawed. There have been huge gaps between the writing construct and the assessment rubrics, with the consequence that teacher assessment cannot produce an accurate picture of the students’ proficiency in academic writing. The issues with the assessment of academic writing reveal deficiencies in LAL among those involved in assessment. One obvious implication is the need to identify the LAL needs of writing teachers so that they can be trained in the dynamics of assessment. The present study proposes to use samples of students’ paraphrases assessed summatively to find out the LAL needs of the teachers engaged in the assessment of academic writing.

Research studies on LAL

Most studies on LAL issues use surveys to gauge perceptions, focusing mainly on the knowledge base in regard to language assessment (e.g. Bailey & Brown, 1996; Fulcher, 2012). Plake and Impara (1997), for instance, used their 35-item survey with 555 teachers across the US and found alarmingly low levels of proficiency in language assessment. Similar findings were reported by Campbell, Murphy, and Holt (2002) and Mertler (2004), who employed Plake and Impara’s (1997) framework. Jin (cited in Lin & Su, 2015), in a comprehensive analysis of language testing programmes across Chinese universities, found the assessment perspective being relegated in favour of the testing and measurement perspective. Lin and Su (2015) applied Coombe et al.’s (2007) framework to measure the LAL of Chinese EFL teachers. The results revealed statistically non-significant variations in regard to teacher experience and training in assessment. In another study, Kalajahi and Abdullah (2016) surveyed the LAL levels of 65 Malaysian university teachers and concluded that the participants lacked adequate training in assessment literacy. In Turkey, Öz and Atay (2017, p.25) interviewed EFL teachers to collect their perceptions about in-class language assessment and its implications for actual practice. They found teachers generally knowledgeable about the theoretical aspects of assessment but found disparities between ‘assessment literacy and classroom reflection’.
In Tunisia, Hidri (2016) investigated university and secondary school teachers’ conceptions of assessment using a three-factor inventory on LAL and concluded that in this context teachers had conflicting and fuzzy ideas about LAL, and that for the most part assessment meant irrelevance. Similarly, Ölmezer-Öztürk and Aydın (2018) used their own framework, the Language Assessment Knowledge Scale (LAKS), containing 60 items across four constructs, to gauge the assessment skills and knowledge of 542 Turkish university teachers. The participants received less than half of the total score, which seriously called into question their skill and knowledge base in assessment matters. Janatifar and Marandi (2018) used Fulcher’s (2012) LAL survey to identify characteristics of LAL in Iranian EFL settings. The participants mentioned deficiencies in LAL and felt the need for hands-on training in practical language assessment apart from theoretical issues. Mellati and Khademi (2018) used a teachers’ assessment literacy inventory, semi-structured interviews, non-participatory observation, and the Writing Competence Rating Scale to measure the impact of Iranian teachers’ LAL competence in assessing writing programmes. The study found significant associations between teachers’ LAL competence and students’ writing achievement. A study by Sultana (2019, p.1) of the LAL levels of Bangladeshi EFL teachers indicated ‘how the inadequate academic and professional testing background of teachers hindered their performance in conducting assessment-related tasks and contributed to their limitations in the use of assessments to improve teaching’. Ölmezer-Öztürk and Aydın (2019) found that a lack of training in LAL among both pre-service and in-service Turkish EFL teachers was the main reason for low levels of language assessment knowledge, and that the instructors felt insufficiently equipped to assess the individual language skills competently.

Paraphrasing as an academic literacy skill

Paraphrasing is an important academic literacy skill that aims at transforming the source text (ST) into a meaningfully compatible text with due acknowledgement of the original source. A paraphrase can be identified as a substantial paraphrase, a patchwriting paraphrase, a superficial paraphrase, or a completely inaccurate paraphrase (Sun & Yang, 2015). Paraphrases that employ only specific terminology or general words frequently used in the ST are substantial paraphrases (Keck, 2006). Patchwriting, on the other hand, is a paraphrasing strategy of copying directly from the ST and then omitting a few words or phrases, transforming syntactic structures, or providing synonyms for individual content words (Howard, 1995). The extent of direct borrowing determines whether the paraphrasing is superficial or not. Some scholars (Roig, 1999; Shi, 2004) consider the borrowing of five or more consecutive words to constitute a superficial paraphrase. An inaccurate or unacceptable paraphrase can be understood as the unchanged borrowing from the ST of either individual words or syntax (Oshima & Hogue, 1999), or a semantically deviated or ambiguous replica of the ST.
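The five-or-more-consecutive-words criterion is mechanical enough to be automated. As a minimal sketch of how such screening might look (the study itself coded the texts manually; the tokenizer and function names below are illustrative assumptions, not part of the original analysis), the following Python fragment flags maximal runs of five or more consecutive words that a paraphrase shares with its source text:

import re

def tokenize(text):
    # Lower-case word tokens; punctuation is ignored.
    return re.findall(r"[a-z']+", text.lower())

def copied_spans(source, paraphrase, n=5):
    # Return maximal runs of n or more consecutive words that the
    # paraphrase shares with the source text (the Roig/Shi criterion).
    # Membership in the source n-gram set approximates consecutive
    # borrowing from the source.
    src, par = tokenize(source), tokenize(paraphrase)
    src_ngrams = {tuple(src[i:i + n]) for i in range(len(src) - n + 1)}
    spans, i = [], 0
    while i <= len(par) - n:
        if tuple(par[i:i + n]) in src_ngrams:
            j = i + n
            # Extend the run while each successive overlapping n-gram
            # also occurs in the source.
            while j < len(par) and tuple(par[j - n + 1:j + 1]) in src_ngrams:
                j += 1
            spans.append(" ".join(par[i:j]))
            i = j
        else:
            i += 1
    return spans

A text for which copied_spans() returns one or more runs would, under this operationalization, fall under patchwriting or superficial paraphrasing and invite closer inspection; an empty list is consistent with a substantial paraphrase.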
Paraphrasing can be considered the backbone of academic research, as authors frequently paraphrase relevant research to substantiate, integrate, and support their argument. Hard to master (Yamada, 2003), paraphrasing as an academic literacy skill showcases the discourse competence of writers in terms of their ability to use a variety of syntactic structures and morphological patterns, and their lexical range and diversity. More importantly, paraphrasing reveals the extent to which a writer comprehends the text and reports it while maintaining the originality of meaning and content. Following Shi, Fazel, and Kowkabi (2018), expert writers are also expected to incorporate their authorial stance into the paraphrased text alongside the ST. Strict adherence to the prescribed academic conventions is the norm, and any deviation can call into question the academic integrity of the writer or researcher.

Issues with the assessment of paraphrasing

Assessment of paraphrasing is a challenging task, especially in the absence of a ‘consensus on what constitutes a good paraphrase’ (Shi cited in Shi et al., 2018, p.32). Paraphrasing a ST involves producing a thematically and semantically equivalent text; however, variations in the length of the paraphrased text can cause certain issues which may affect test scores, as revealed in the studies by Kennedy and Thorp (2007) and Mayor et al. (2007), where higher scores were statistically positively correlated with longer text and clause length respectively. Though there is no prescribed length for a paraphrase in relation to the ST, a much smaller sample is likely to read as a summary or, in some cases, a précis of the ST. Paraphrases from novice writers with shorter text length may miss important ideas or content from the ST. Then there is the issue of developing an assessment scale for measuring the presence or absence of ideas. The paraphrased text itself can lend itself to varying levels of interpretation and grading in the absence of measurement criteria which segregate different types of paraphrases and remove rater bias to ensure optimum levels of reliability. The quality of a paraphrase is directly linked with the writer’s comprehension of the ST (Sun, 2012), and a patchwriting or superficial paraphrase, despite language errors, can be meaningfully appropriate or otherwise. How to assess and grade understanding of the ST, especially in relation to substantial, patchwriting, and superficial paraphrasing, can become a daunting task for assessors. The assessment criteria must operationalize the notion of plagiarism and clearly determine the extent to which direct borrowing from the ST is permissible. Similarly, the issues of language use, lexical range, and other discourse features should receive due place in the scheme of assessment. All of these, and a few unanticipated issues that might crop up during the process of actual assessment, put the onus of responsibility on the assessors. Without proper training in LAL, chances are that assessors will resort to their subjective preferences, deviate from the rubrics, and thereby measure inconsistently.
Aims and significance

There is no study, especially in the Arab EFL context, which presents an analysis of an academic literacy skill – paraphrasing in this case – to find gaps in assessment practices from an LAL perspective. With this gap in focus, the researcher proposed to use samples of paraphrases by novice EFL undergraduate students to see how they had been assessed in compliance with the assessment criteria, and what gaps had been left in the assessment performance of the raters which could indicate a need for training in LAL. More specifically, the researcher set the following aims to find out the implications of teacher-led assessment of academic writing for LAL:

i the extent to which the assessment criteria and rubrics provide for appropriate analysis of students’ paraphrasing skills
ii the extent to which the test scores correlate with the assessment criteria and rubrics

Method

This section of the chapter details the participants and research context of the study, the characteristics of the writing samples collected for paraphrase analysis, and the analytical procedures adopted for analysis of the data.

The participants and the research context

The study was conducted at Yanbu English Language Institute (YELI), Yanbu Al-Sinaiyah, Saudi Arabia. YELI provides English language training to the Saudi male and female students enrolled in various science and technology, business studies, and humanities programmes at the Preparatory Year, the Associate, and the undergraduate levels. The participants of the present study were male undergraduate students taking a two-semester mandatory Academic Writing course based on Oshima and Hogue’s (2006) book. Before starting this academic writing course, the participants had already completed the Preparatory Year as well as the Associate Level English Language courses in the same institute. The course was taught by qualified language instructors who were recruited from across the globe on the basis of their qualifications, experience, and suitability for the teaching context. Apart from the course delivery, they were also responsible for the course assessment and evaluation. The academic writing programme at YELI trained students in writing academic essays and acquiring academic literacy skills such as paraphrasing and academic writing conventions. Learner achievement was assessed through summative assessment, which included in-class writing assignments as well as mid- and final-term examinations. The students’ performance was evaluated on a grade/point system, and the results were formally communicated and later reflected in their transcripts.
The writing samples

Following Best and Kahn (cited in Ahmad, 2017a), who hold that a sample size of n = 30 or more can yield significant results, the sample paraphrases (n = 55) were randomly collected from YELI. The task – a ST of 207 words (Appendix A) – was an excerpt from Chase (cited in Verma, 2015, p.491). Saudi EFL undergraduate students had produced these paraphrases in response to a final-term examination question which aimed to test their proficiency in academic literacy skills and academic writing through paraphrasing a ST. The examination scripts were assessed by the teachers of academic writing on this course, based on the assessment criteria and rubrics detailed in Table 9.1:

Table 9.1: Assessment Criteria and Rubrics

Criterion 1: Paraphrasing – contents, expression & plagiarism

- Standard Not Met (0–1): The text is plagiarized due to a major violation of paraphrasing rules.
- Progressing (2–3): Student is in minor violation of one of the paraphrasing rules (order, phrasing, ideas), but the text cannot be considered plagiarized. OR The paraphrased text fits in the Proficient category, but is awkward and fairly uncontrolled.
- Proficient (4–5): Student uses effective paraphrasing strategies and does not violate any of the paraphrasing rules (order, phrasing, ideas), but the paraphrased text is not completely smooth and controlled.
- Exemplary (6–7): Student uses effective paraphrasing strategies, does not violate any of the paraphrasing rules (order, phrasing, ideas), and develops a smooth, natural-sounding paraphrase of the original text with grade-level appropriate conventions.

Criterion 2: Grammar & Spelling

- Standard Not Met (0–0.5): Serious and numerous errors in mechanics, usage, grammar, or spelling block the audience's understanding of the process.
- Progressing (1): Errors in mechanics, usage, grammar, or spelling interfere with the audience's understanding of the process.
- Proficient (2): There are some errors in mechanics, usage, grammar, or spelling.
- Exemplary: There are few or no errors in mechanics, usage, grammar, or spelling.
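To make the rubric's score bands concrete, the sketch below encodes the paraphrasing criterion from Table 9.1 as a point-to-descriptor lookup; the dictionary and function names are illustrative assumptions, not part of the original instrument. The lookup also makes visible a gap the Discussion returns to: half-point scores such as 5.5 or 7.5 fall outside every descriptor range.

```python
# Illustrative sketch only: the band boundaries follow Table 9.1, but the
# data structure and function are hypothetical, not the study's instrument.

PARAPHRASING_BANDS = {
    "Standard Not Met": (0, 1),
    "Progressing": (2, 3),
    "Proficient": (4, 5),
    "Exemplary": (6, 7),
}

def band_for(score: float) -> str:
    """Return the descriptor whose point range contains the score, if any."""
    for label, (low, high) in PARAPHRASING_BANDS.items():
        if low <= score <= high:
            return label
    return "no descriptor category"

for s in (4, 5.5, 6, 7.5):
    print(s, "->", band_for(s))
# 4 -> Proficient and 6 -> Exemplary, but 5.5 and 7.5 match no band.
```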


Analytical procedures

The first step after the paraphrase samples had been collected was to type the handwritten student texts into a Word document with all errors intact to maintain originality and transparency. Each text was allotted a code, and word length and exam score were recorded for later analysis. The assessment rubrics had four performance descriptors which were graded on four-point criteria – order, paraphrasing, ideas, and language use (grammar and mechanics). The next step was to devise a measurement scale because the exam scripts had been marked holistically, with a rounded score for overall performance instead of the four-point criteria stated in the rubrics. Because the focus of the study was to investigate how the teachers had assessed the sample paraphrases and not the grammatical issues, the researcher decided not to analyze language problems in view of the absence of specific marks for language use, and to consider the teacher-awarded scores to account for the three measurement criteria, namely order, paraphrasing, and ideas. However, a few interventions had to be introduced to facilitate the analytical process.

The researcher developed a template which was used to segregate and analyze the sample paraphrases by the criteria of order, paraphrasing, and ideas. Since all paraphrases followed the order of the ST, no further analysis was done for this criterion. For paraphrasing, the rubrics were found to be vague and ambiguous as they did not provide a systematic scale or criteria which could be used to analyze teachers' assessment of paraphrasing. Therefore, the researcher had to first establish a text length which could be considered a paraphrase. Two groups of paraphrases were identified: paraphrases with 150 or more words were assumed to be a reliable text-length equivalent of the ST, and paraphrases with 149 or fewer words were assumed to be either summaries of the ST or an unreliable version of the ST. To find out if the sample texts were substantial, superficial, patchwriting, or inaccurate, the written samples were analyzed for these paraphrasing standards based on the difference between the original and the plagiarized parts of the text. For this analysis, plagiarism was operationalized as the incidence and frequency of five or more consecutive words from the ST (Shi, 2012; Sun & Yang, 2015), or repetition of the ST words with minor changes in word order. The last measurement criterion – ideas – was analyzed based on the count of missed ideas per text. The samples were also analyzed for correlation of the exam scores with the performance descriptors, and for citing the source in compliance with academic conventions.

Statistical Package for the Social Sciences (SPSS) was used to obtain descriptive statistics for word length, exam scores, paraphrasing, ideas, and missed ideas. Percentage scores were obtained for these variables, for citations given or not, and for the four performance descriptors. Non-parametric correlation analysis was also done to ascertain the presence of any statistically significant correlations between the different variables of the study.
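As an illustration of the plagiarism operationalization just described – five or more consecutive words copied from the ST (Shi, 2012; Sun & Yang, 2015) – the following sketch flags verbatim five-word runs shared between a source text and a paraphrase. The tokenization and the percentage measure are assumptions made for the example; the study's own coding was done manually.

```python
# A minimal sketch, assuming simple lowercase tokenization; the study's
# actual coding of plagiarized stretches was carried out by the researcher.
import re

def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def plagiarized_share(source: str, paraphrase: str, n: int = 5) -> float:
    """Percentage of paraphrase tokens inside a verbatim n-word run from the source."""
    src, par = tokenize(source), tokenize(paraphrase)
    src_ngrams = {tuple(src[i:i + n]) for i in range(len(src) - n + 1)}
    flagged = [False] * len(par)
    for i in range(len(par) - n + 1):
        if tuple(par[i:i + n]) in src_ngrams:   # five-word run copied verbatim
            for j in range(i, i + n):
                flagged[j] = True
    return 100 * sum(flagged) / len(par) if par else 0.0

st = "Critical care nurses function in a hierarchy of roles."
pp = "Critical care nurses function in a hierarchy of specialized roles."
print(f"{plagiarized_share(st, pp):.1f}% of the paraphrase is copied")
```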


Results

The paraphrases had been examined on the four-point assessment criteria (order, paraphrasing, ideas, and language use) which had been set for the original assessment at the research site. Since all the paraphrases (n = 55) were found to adhere to the order of the ST, no further analysis was conducted for that criterion. As for paraphrasing, all the paraphrased texts were both patchwriting and inaccurate. However, the third descriptor (i.e. 'ideas') was segregated between complete and incomplete paraphrases to allow for further analysis.

A little more than half of the paraphrases failed to achieve the operationalized word length for this study. Percentage scores reveal that 47.27% of the paraphrases were 150 or more words, whereas 52.72% were found to be in the range of 50 to 149 words. The major reason for this seems to be the number of ideas that had been dropped by the students in their attempt to paraphrase the ST. Only 20% of the paraphrases restated all of the ideas from the ST; 21.81% had missed 3 ideas, while 16.36% had 2 and 4 dropped ideas respectively. 33.89% of the paraphrased texts were plagiarized, with the minimum being 0.75% and the maximum 87.37%. The students' test scores ranged from 4 to 8 out of 10. 36.36% of the paraphrases were awarded 6 points, followed by 16.36% awarded 6.5 and 7 points respectively, and 12.72% awarded either 5 or 5.5 points. 60.09% of the paraphrases were found to be 'Proficient' with a score range from 6 to 7. Following the exam rubrics, 20.09% of the paraphrases, with scores of 5.5, 6.5, 7.5, and 8, could not be identified with any of the performance descriptors. 87.27% of the paraphrases did not cite the source of the ST.

SPSS was used to obtain descriptive statistics and correlation analysis for the text length, test scores, paraphrasing, and the missing ideas. The results for Text Length (TL), Test Score (TS), Paraphrasing (PP), and Missed Ideas (MI) were M = 153.91, SD = 34.576; M = 6.08, SD = .744; M = 52.16, SD = 36.156; and M = 3.15, SD = 2.360 respectively. These figures indicate that the paraphrased texts varied considerably in their length in comparison with the ST. Most of the TS range was, however, closer to the mean. On the other hand, PP and MI were unevenly dispersed among the corpus of the paraphrased texts and thus illustrated why most of the paraphrases were not closer to the word length of the ST, i.e. 207 words. Spearman's rho (rs) failed to identify any statistically significant association between the variables, except between TL and PP, rs = .700, p = .01; a statistically significant negative correlation between TL and MI, rs = -.712, p = .01; and between PP and MI, rs = -.435, p = .01.

The results for Text Length Range 1 (TLR1; 50 to 149 words per paraphrase), Test Score Range 1 (TSR1), Paraphrasing Range 1 (PPR1), and Missed Ideas Range 1 (MIR1) were M = 124.73, SD = 22.326; M = 6.15, SD = .822; M = 27.73, SD = 15.517; and M = 4.58, SD = 2.230 respectively. Spearman's rho failed to find any statistical relationship among these variables. On the other hand, the descriptive statistics for the same variables in Text Length Range 2 (TLR2), with 150 or more words, were found to be M = 180.07, SD = 19.007; M = 6.02, SD = .675; M = 74.72, SD = 35.425; and M = 1.86, SD = 1.642 respectively. Spearman's rho was negatively significant between TLR2 and Missed Ideas Range 2 (MIR2), rs = -.602, p = .01. The results indicated that word length and the plagiarized text did not affect the test scores in the two groups; however, texts in the shorter word range had more missing ideas than the texts with 150 or more words. Similarly, the missing ideas did not seem to determine the test scores, as there was only a fraction of a difference between the mean scores of the two groups. The results for the second group also revealed that the higher the number of words, the lower the number of missing ideas.
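For readers who wish to replicate this kind of analysis without SPSS, a hedged sketch is given below: it computes means, standard deviations, and Spearman's rho using Python. The five-row data frame is invented toy data, not the study's 55 coded paraphrases.

```python
# Toy data only: TL = text length, TS = test score, PP = plagiarized %,
# MI = missed ideas. A real analysis would load the 55 coded paraphrases.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "TL": [120, 180, 95, 210, 160],
    "TS": [6.0, 6.5, 5.5, 7.0, 6.0],
    "PP": [20.0, 75.0, 10.0, 80.0, 55.0],
    "MI": [5, 1, 6, 0, 2],
})

print(df.agg(["mean", "std"]))          # descriptive statistics (M, SD)

rho, p = spearmanr(df["TL"], df["MI"])  # non-parametric correlation
print(f"TL vs MI: rs = {rho:.3f}, p = {p:.3f}")
```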

Discussion

This section focuses on the relevance of the assessment criteria, rubrics, test scores, and teachers' performance as assessors in order to draw out the implications for LAL. The results of the paraphrase analysis reveal serious shortcomings both in the assessment criteria and in the assessment process. These findings support Mellati and Khademi's (2018) finding that teachers' assessment literacy affects students' writing performance results. The standards set as performance descriptors are vague and ambiguous to the extent that they do not permit uniform and reliable assessment of the students' paraphrases. There is no descriptor category for the score of 5.5, and it is not clear whether this score should be considered 'Proficient' or 'Exemplary' in terms of performance description. The same is true of scores of 7.5 and above. In addition, the labelling of the descriptors as 'Standard not met', 'Progressing', 'Proficient', and 'Exemplary' may well describe linguistic competence but not paraphrasing as an academic literacy skill; no such categorization of paraphrasing appears in the research studies done on the subject. Following Shi (2012) and Sun and Yang (2015), paraphrasing is either substantial, patchwriting, superficial, or inappropriate. The excerpts from students' paraphrasing in Table 9.2 illustrate the point:

Table 9.2: Examples of Sample Paraphrasing

Source text: Critical care nurses function in a hierarchy of roles.
Paraphrase: Nurses play on important role in the staff structure of clinics.
Classification: substantial

Source text: This person oversees the hour-by-hour functioning of the unit as a whole.
Paraphrase: The role of this person is to monitor the hour-by-hour functioning of the unit as a whole.
Classification: patchwriting

Source text: On each shift a nurse assumes the role of a resource nurse.
Paraphrase: Role of resource nurse can be assumed by nurse in every shift.
Classification: superficial

Source text: Critical care nurses function in a hierarchy of roles.
Paraphrase: The hierarchy of roles in critical care nurses function.
Classification: inaccurate


The paraphrasing rules of 'order', 'phrasing', and 'ideas' do not adequately provide for a reliable assessment of the students' ability to paraphrase. Since the students are formally taught how to paraphrase, they keep the order of the ST intact, as all the participants did in this study. It is the same case with 'ideas', which possibly refers to the content of the ST. A missed idea does not reveal students' paraphrasing ability, though it may reflect the task completion component of the exam. The most relevant descriptor is 'paraphrasing', which should have been further segregated into assessment scales to properly ascertain students' paraphrasing ability. There is no mention of, for example, any pointer that measures students' understanding of the ST. The grade description is vague in the sense that words like 'awkward', 'fairly uncontrolled', 'smooth', etc. are open to rater bias at the cost of reliable assessment. The exam rubrics carry a separate criterion for 'grammar and spelling' and do not provide for other features of the written discourse. It is also not clear how these types of errors 'interfere with the audience's understanding of the process', or indeed what 'the process' is (Appendix A). There was no provision in the measurement scale to distinguish a paraphrase from a summary. Texts of much smaller word length than the ST were assessed on the same criteria as texts of greater word length, without any significant effect on the exam scores.

Following the assessment criteria, the paraphrased texts should have been marked for the individual descriptors, with a clear differentiation of points for the grammar and spelling component. Conversely, all the texts were assessed in clear violation of the assessment criteria, as the teachers resorted to subjective grading by allocating a rounded score to every paraphrase. These findings accord with Popham's (2009) study, which reported teachers using subjective marking while assessing productive skills such as writing. The findings also support Öz and Atay (2017, p. 39), whose participants did not use any 'table of specification' to assess in-class performance and depended solely on their 'instinctive judgments'. This not only challenges the teachers' assessment practices but also makes the test scores unreliable. The main issue with the assessment standards, as revealed by the results, is that the teachers could not differentiate between paraphrasing and summary writing. This overlooking of an important aspect of paraphrasing also affects the correlation of the four assessment descriptors with the exam scores. In most cases, there is no statistical correlation between the degree of completeness of the paraphrases and the exam scores awarded. Also, in some cases a text of shorter length received a higher score than texts of longer word length. Despite the absence of text length as a descriptor in the assessment criteria, it is evident from the analysis that text length affected the grades awarded, and thus that teachers did not follow the rubrics within an analytical or criterion-based framework. Paraphrasing as an academic literacy skill entails that teachers have cognizance of what constitutes a paraphrase in terms of text length. Similarly, the teachers do not seem to appropriately identify issues which, though not explicitly stated in the rubrics, form an essential feature of any writing product, especially paraphrasing. For example, in many instances the paraphrased texts deviated from the ST in terms of meaning, which reflects students' poor understanding and interpretation of the ST. However, these texts with comprehension issues do not differ in their test scores from those which show a better understanding of the ST in the paraphrased version.

The lack of quality and uniformity in assessment procedures as witnessed in the sample texts has serious implications for assessment systems, especially from the LAL perspective. Incompetence in introducing and processing a transparent and reliable assessment design can adversely affect not only students' grades but also pedagogic input, institutional policies, and wider educational objectives. The samples of teacher assessment in the present study cannot be used to provide feedback to the students; however, they can be used as a very effective tool of feedback to ascertain students' academic literacy development and language proficiency, as well as to inform course evaluation processes, which may include appraisal of teaching quality and instructional materials.

The assessment results of the present study offer a few suggestions for the training of academic writing teachers who are also involved in the assessment process in the dynamics of LAL. Following Mellati and Khademi (2018, p. 15), training programmes in LAL should be based on imparting knowledge related to assessment objectives, content and methods, measurement, identification of errors, teacher-led feedback, interpretation and communication of results, learners' involvement in the assessment process, and assessment ethics. However, simply exposing teachers to the theoretical underpinnings of LAL does not equip them with the skills essential for satisfactory assessment. Rigorous training in the practical matters of assessment must go hand in hand with theoretical knowledge. Teachers should be engaged in developing test rubrics and measurement scales, designing test items, grading and reporting the results, and evaluating the course outcomes, all under the supervision of experts in the field of language assessment. Importantly though, the 'interpretative framework' should determine any LAL training programme, prioritizing 'the teacher's teaching context, social perspectives, beliefs, and understandings' (Scarino cited in Sultana, 2019, p. 12).

The study is not without its limitations. First, it was conducted in one institution with a limited number of sample texts. A larger sample size from a diverse and extended population may produce more generalizable results. Second, it is likely that teachers in other English as a Foreign Language (EFL)/English as a Second Language (ESL) contexts will have much more sophisticated exam rubrics and training in LAL. They may produce different results from those of the present study. Third, the participants were all EFL student writers who were receiving training in academic literacy and learning how to write. Analyses of paraphrases by expert writers, or by students with higher levels of academic literacy and discourse competence, may yield different results, and thereby different levels of assessment practices. Fourth, the samples in the present study were examination scripts, and the paraphrases were treated as an exam activity. A study which collects samples of paraphrases from, for instance, research articles or term papers may reflect a different response both in terms of student performance and rater assessment. Finally, the study did not include teachers' and students' perceptions. The relationship between teachers' and students' beliefs could provide further insights into the matters encompassing LAL.

Conclusion

Paraphrasing is an important academic literacy skill, and following Ahmad (2019, p. 279), students' exposure to 'the contemporary practices in the domain of academic literacy' helps them to 'gain membership of their specific discourse community' – one of the very basic aims of academic writing programmes. Such aims cannot be realized if the language assessment system is not properly supported by background training of the assessors in LAL. Lack of competence in LAL can affect teachers' judgment and decisions, which in turn can challenge the academic veracity of students' results, course objectives, course assessment and evaluation, and broader institutional, social, and national policies. A lot is expected from teachers in terms of course delivery and assessment. They must be facilitated through awareness-raising and practical training programmes in LAL, both at the pre- and in-service levels, for the benefit of learners and academia.

Appendix A: Source text

Critical care nurses function in a hierarchy of roles. In this open heart surgery unit, the nurse manager hires and fires the nursing personnel. The nurse manager does not directly care for patients but follows the progress of unusual or long-term patients. On each shift a nurse assumes the role of a resource nurse. This person oversees the hour-by-hour functioning of the unit as a whole, such as considering expected admissions and discharges of patients, ascertaining that beds are available for patients in the operating room, and covering sick calls. Resource nurses also take a patient assignment. They are the most experienced of all nurses. The nurse clinician has a separate job description and provides for quality of care by orienting new staff, developing unit policies, and providing direct support where needed, such as assisting in emergency situations. The clinical nurse specialist in this unit is mostly involved with teaching in orienting new staff. The nurse manager, nurse clinician, and clinical nurse specialist are the designated experts. They do not take patient assignments. The resource nurse is seen as both a caregiver and a resource to other caregivers … Staff nurses have a hierarchy of seniority … Staff nurses are assigned to patients to provide all their nursing care. (Chase, 1995, p. 156)


References

Ahmad, Z. (2017a). Academic text formation: Perceptual dichotomy between pedagogic and learning experiences. Journal of American Academic Research, 5(4), 39–52.
Ahmad, Z. (2017b). Empowering EFL learners through a needs-based academic writing course design. International Journal of English Language Teaching, 5(9), 59–82.
Ahmad, Z. (2019). Analyzing argumentative essay as an academic genre on assessment frameworks of IELTS and TOEFL. In S. Hidri (Ed.), English language teaching research in the Middle East and North Africa: Multiple perspectives (pp. 279–299). Palgrave Macmillan.
Bailey, K. M., & Brown, J. D. (1996). Language testing courses: What are they? In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 236–256). Multilingual Matters.
Campbell, C., Murphy, J. A., & Holt, J. K. (2002). Psychometric analysis of an assessment literacy instrument: Applicability to preservice teachers. Paper presented at the Annual Meeting of the Mid-Western Educational Research Association, Columbus, OH.
Coombe, C., Davidson, P., O'Sullivan, B., & Stoynoff, S. (Eds.). (2012). The Cambridge guide to second language assessment. Cambridge University Press.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3), 327–347.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers. Profile: Issues in Teachers' Professional Development, 20(1), 179–195.
Herrera, L., & Macías, D. (2015). A call for language assessment literacy in the education and development of teachers of English as a foreign language. Colombian Applied Linguistics Journal, 17(2), 302–312.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Howard, R. M. (1995). Plagiarisms, authorships, and the academic death penalty. College English, 57, 788–806.
Inbar-Lourie, O. (2013). Language assessment literacy. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 2923–2931). Blackwell.
Janatifar, M., & Marandi, S. S. (2018). Iranian EFL teachers' language assessment literacy (LAL) under an assessing lens. Applied Research on English Language, 7(3), 307–328.
Kalajahi, S. A. R., & Abdullah, A. N. (2016). Assessing assessment literacy and practices among lecturers. Pedagogika/Pedagogy, 124(4), 232–248.
Keck, C. (2006). The use of paraphrase in summary writing: A comparison of L1 and L2 writers. Journal of Second Language Writing, 15, 261–278.
Kennedy, C., & Thorp, D. (2007). A corpus-based investigation of linguistic responses to an IELTS academic writing task. In L. Taylor & P. Falvey (Eds.), Studies in language testing: IELTS collected papers – Research into speaking and writing assessment (Vol. 19, pp. 316–379). Cambridge University Press.
Lin, D. (2014). A study on Chinese middle school English teachers' assessment literacy [Unpublished doctoral dissertation]. Beijing Normal University.
Lin, D., & Su, Y. (2015). An investigation of Chinese middle school in-service English teachers' assessment literacy. Indonesian EFL Journal, 1(1), 1–10.

López, A., & Bernal, R. (2009). Language testing in Colombia: A call for more teacher education and teacher training in language assessment. Profile: Issues in Teachers' Professional Development, 11(2), 55–70.
Mai, D. T. (2019). A review of theories and research into second language writing and assessment criteria. VNU Journal of Foreign Studies, 35(3), 104–126.
Malone, M. E. (2013). The essentials of assessment literacy: Contrasts between testers and users. Language Testing, 30(3), 329–344.
Mayor, B., Hewings, A., North, S., Swann, J., & Coffin, C. (2007). A linguistic analysis of Chinese and Greek L1 scripts for IELTS academic writing task 2. In L. Taylor & P. Falvey (Eds.), Studies in language testing: IELTS collected papers – Research in speaking and writing assessment (Vol. 19, pp. 250–314). Cambridge University Press.
Mellati, M., & Khademi, M. (2018). Exploring teachers' assessment literacy: Impact on learners' writing achievements and implications for teacher development. Australian Journal of Teacher Education, 43(6), 1–18.
Mertler, C. A. (2004). Secondary teachers' assessment literacy: Does classroom experience make a difference? American Secondary Education, 33(1), 49–64.
Nunan, D. (1988). The learner-centred curriculum: A study in second language teaching. Cambridge University Press.
Ölmezer-Öztürk, E., & Aydın, B. (2018). Investigating language assessment knowledge of EFL teachers. Hacettepe University Journal of Education, 34(3), 602–620.
Ölmezer-Öztürk, E., & Aydın, B. (2019). Voices of EFL teachers as assessors: Their opinions and needs regarding language assessment. Eğitimde Nitel Araştırmalar Dergisi – Journal of Qualitative Research in Education, 7(1), 373–390.
Oshima, A., & Hogue, A. (1999). Writing academic English (3rd edition). Addison-Wesley Publishing Company.
Oshima, A., & Hogue, A. (2006). Writing academic English (4th edition). Longman.
Öz, S., & Atay, D. (2017). Turkish EFL instructors' in-class language assessment literacy: Perceptions and practices. ELT Research Journal, 6(1), 25–44.
Plake, B. S., & Impara, J. C. (1997). Teacher assessment literacy: What do teachers know about assessment? In G. D. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 53–68). Academic Press.
Popham, W. J. (2006). All about accountability: A dose of assessment literacy. Improving Professional Practice, 63(6), 84–85.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory Into Practice, 48, 4–11.
Rahimi, F., Esfandiari, M. R., & Amini, M. (2016). An overview of studies conducted on washback, impact and validity. Studies in Literature and Language, 13(4), 6–14.
Roig, M. (1999). When college students' attempts at paraphrasing become instances of potential plagiarism. Psychological Reports, 84, 973–982.
Shi, L. (2004). Textual borrowing in second language writing. Written Communication, 21, 171–200.
Shi, L. (2012). Rewriting and paraphrasing source texts in second language writing. Journal of Second Language Writing, 21, 134–148.
Shi, L., Fazel, I., & Kowkabi, N. (2018). Paraphrasing to transform knowledge in advanced graduate student writing. English for Specific Purposes, 51, 33–44.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. The Phi Delta Kappan, 77(3), 238–245.

Sultana, N. (2019). Language assessment literacy: An uncharted area for the English language teachers in Bangladesh. Language Testing in Asia, 9(1), 1–14.
Sun, Y. C. (2012). Does text readability matter? A study of paraphrasing and plagiarism in English as a foreign language writing context. The Asia-Pacific Education Researcher, 21, 296–306.
Sun, Y. C., & Yang, F. Y. (2015). Uncovering published authors' text-borrowing practices: Paraphrasing strategies, sources, and self-plagiarism. Journal of English for Academic Purposes, 20, 224–236.
Thomas, J., Allman, C., & Beech, M. (2004). Assessment for the diverse classroom: A handbook for teachers. Florida Department of Education, Bureau of Exceptional Education and Student Services. http://www.fldoe.org/ese/pdf/assess_diverse.pdf
Verma, S. (2015). Technical communication for engineers. Vikas Publishing.
Yamada, K. (2003). What prevents ESL/EFL writers from avoiding plagiarism? Analyses of 10 North-American college websites. System, 31, 247–258.

Chapter 10

Reliability of classroom-based assessment as perceived by university managers, teachers, and students

Olga Kvasova and Vyacheslav Shovkovy

Introduction

After Ukrainian higher education joined the Bologna process in 2005, decision-makers report that all university curricula were redesigned in compliance with modules and credits. Generally, such redesign is accompanied by the development of a national quality assurance system to ascertain that the level of education meets the required standard. The introduction of modules and credits has therefore critically increased the role of summative assessment of the levels attained by students at the end of each academic course and at graduation. The recent British Council (BC) report on the state of teaching English for specific purposes in Ukraine reveals that the Bologna requirements to define English language curriculum modules and credits have been implemented only partially, whereas the development of a meaningful quality assurance system is still pending (Bolitho & West, 2017). One aspect of the issue is that the evidence of students' achievements is not based on external (standardized) tests, which impedes comparability of results across institutions in the country. Another aspect concerns the quality of internal, institutional assessment, which in actual fact substitutes for external quality assurance and is therefore setting-specific. Focusing on institutional quality assurance, the experts concluded that there were generally poor standards of tests and examinations, resulting from a lack of testing and assessment expertise among those who prepare assessment materials. In Ukraine, test preparation is solely the responsibility of instructors, since no test development units with specially trained staff have been established in universities as yet. Nationwide, summative tests are constructed by teachers whose major function is to teach and implement assessment for learning; this allows us to refer to teacher-constructed summative tests as 'classroom-based summative tests'. The authorship of summative tests raises concerns about the reliability of information regarding attained language levels, which is primarily required of assessment. Decision-making based on inaccurate information may have far-reaching consequences for education policies at the national level.


This inference is confirmed by the BC researchers, who argue that 'there is a pressing need for training in modern, valid testing and assessment procedures to enable teachers to feel confident in assessing their students against international standards, and to assure the Ministry that standards are being achieved' (Bolitho & West, 2017, p. 77). Since language testing courses have only recently been introduced in the curricula of several teacher-training and classical universities, an apparent lack of a national psychometrician cadre will remain critical in the short and medium terms. The issue of serving teachers' assessment literacy (TAL) is therefore viewed as an imperative under the circumstances (Hidri, 2016).

The authors of this chapter, who work for the Department of Language Teaching Methodology, are engaged in the development and implementation of pre-service teacher training programmes. The Department has pioneered standalone courses on language testing and assessment (LTA) for undergraduate and postgraduate students (Kvasova, under review), and aspires to promote the establishment of a national scientific school of LTA. The Department's staff make up the core of the Ukrainian Association for Language Testing and Assessment (UALTA), which promotes TAL across the country by organizing workshops involving international and local experts.

With a view to contributing to the development of a reliable system of assessment in higher education, we undertook an examination of summative assessment practices in several Ukrainian universities. The survey involved the central stakeholders of assessment – university managers (organizers and supervisors), teachers (assessors), and students (assessees). The analysis of the responses aimed to reveal the areas of compatibility among perceptions, to identify threats to the reliability of measurement, and to diagnose what resources to improve reliability are available today and how they could be enhanced in the foreseeable future.

Review of literature

Summative assessment has the purpose of reporting on learning achieved at a certain time; its accuracy and objectivity must therefore be beyond question. In Western tertiary education systems, summative assessments have long been used to meet increased demands for accountability. These assessments are regular, systematic, rational, and formalized, and have provided plentiful reasons for both appraisal and critique. The mandatory character of summative assessment is opposed to the continuous, informal character of formative assessment, with its emphasis on promoting better learning. As Houston and Thompson (2017) point out, '[f]ormative (feedback) assessment is intended to help students with future learning, whereas summative (feedout) assessment warrants or certifies student achievements to others, including potential employers' (p. 2).

Lau (2017) challenges the artificially created dichotomy of 'summative' and 'formative' assessment wherein summative is bad and formative is good, and advocates the idea that formative and summative assessment need to work in harmony without being opposed to each other. Brown (2019) goes further, asserting that formative assessment – assessment for learning – is a meaningful teaching framework rather than assessment, the major function of which is 'verifiability for its legitimacy as a tool for decision-making'. How fair are all these claims? It is worthwhile considering the purposes of assessment specified in official documents on assessment practices in contexts where quality assurance is well established. Among the four purposes of assessment classified in the code of practice for assessment offered by the UK Quality Assurance Agency (QAA), the pedagogy-related purpose of 'providing students with feedback to promote their learning' is given priority; this largely pertains to formative assessment. The next two purposes – measurement ('evaluation of student skills') and standardization ('providing a mark or grade to establish the level of a student's performance') – seem to serve both formative and summative assessments. The certification purpose ('communicating to the public the level of individual achievement as reflecting the academic standard') is overtly pertinent to summative assessment (QAA, 2006, as cited in Norton, 2009, p. 134). Despite the seeming balance amongst assessment purposes, in Western higher education assessment of learning predominates over assessment for learning. However, more attention has recently been paid to the complementary characteristics of formative and summative assessments (Houston & Thompson, 2017), to the synergy of these two types of assessment differing in form and function (Carless, 2006), and to the transition to alternative forms of summative assessment that are better compliant with the requirements of 21st-century education (HEA, 2012).

In Ukraine, on the contrary, formative assessment has always been deep-rooted in classroom practices and is currently being implemented through a variety of traditional and innovative methods (Dovgopolova, 2011; Shadrina, 2014; Olendr, 2015). In their dedication to assessment for learning (Kvasova & Kavytska, 2014), Ukrainian teachers share the beliefs revealed in Muñoz et al.'s (2012) study: according to this research, teachers view assessment as a means of improving students' performance and teaching methods rather than a route towards accountability and certification. Nevertheless, a teacher's professional duty of evaluating learners' achievements at the end of a subject or course has always been part of Ukrainian education. The shift of focus to assessment for reporting has considerably increased teachers' workload and responsibilities, while the lack of hands-on recommendations on the development of these high-stakes summative tests has raised concerns among teachers, who are the immediate actors of assessment.

The difference in function and use of the two types of assessment is explained by Harlen (2007). She argues that in the course of assessment for learning the major goal pursued by educators is promoting students' learning. In this case, evidence of progress is frequently expressed in grades.

Reliability of formative assessment is important but not paramount, since teachers' judgements are informed in multiple ways. When it comes to making decisions about students' achievements, with the grades affecting students' educational paths and opportunities, reliability becomes crucial. Harlen (2007) reiterates that 'the use of assessment results for external purposes demands that they are seen to be as "fair" as possible. Fairness in this context refers to technical reliability or the extent to which results can be said to be of acceptable consistency or accuracy for a particular use' (p. 4).

In standardized testing, reliability is universally defined as the consistency of scores or test results and is viewed as a fundamental criterion of a good test (Alderson et al., 1995; Bachman & Palmer, 1996; McNamara, 2000; Hughes, 2003; Fulcher & Davidson, 2007; Douglas, 2010). The bulk of language testing theory discusses the interdependence of reliability and validity, wherein validity refers primarily to the quality of a test and reliability is 'rather a characteristic of test scores obtained from a given test administration or administrations' (Chapelle, 2013, p. 4918). It is universally acknowledged that a good test is one that provides reliable information on language achievements and/or proficiency. However, we assume that in classroom contexts, the reliability of judgements is directly dependent on aspects of test administration and scoring.

Fulcher and Davidson (2007) question whether the concept of 'reliability' adopted in standardized testing and viewed as an attribute of norm-referenced assessment may be directly applied in classroom-based, criterion-referenced assessment. They associate reliability in the latter with 'trustworthiness', thus opposing the technical and everyday meanings of the word 'reliability'. They further specify the differences between:

- large-scale and classroom assessments, defining the former as construct-irrelevant and the latter as 'non-construct irrelevant, but directly relevant to assessment of learners in a particular setting' (p. 25);
- test tasks included in traditional tests (requiring numerous scores on independent tasks to arrive at a decision about a candidate's proficiency level) and those in classroom assessment, whose role is primarily to assess the current abilities of the learners;
- the roles of an impartial tester and of a person immediately involved in teaching and concerned with learners' achievements;
- the purposes of these assessments (to certify a proficiency level vs to award a grade ascertaining a learner's place in the cumulative achievement of the class);
- the type of evidence collected in each case (independent in large-scale testing; performance-based and mostly collaborative in the classroom setting);

- the generalizability of score meanings (calculation of a reliability coefficient using statistics in large-scale testing vs establishing whether decisions made about learners' achievements are proper and fair) (Fulcher & Davidson, 2007).

This discussion leads to an understanding that, within education, reliability – what helps ensure that assessments accurately measure student language level – is a key test property. Clearly, standardized and teacher-constructed tests belong to different measurement paradigms; however, it is assessment of learning where the locus of tensions between them lies, as Green (2014) argues. He maintains that classroom-based assessment depends on a plenitude of variables, which in itself impedes objective measurement. Nonetheless, he further agrees with Harlen (2007) that the results obtained on summative classroom tests may become as reliable as those on standardized tests, provided scoring criteria are clear, teachers are trained, and their implementation of assessment is properly moderated.

To date, the evidence suggests that the validity and reliability of assessments developed by teachers are rather low (Harlen, 2004; Gareis et al., 2015). Keeping that in mind, we find it reasonable to consider the factors that have negative effects on the reliability of assessments. These are factors related to the quality of the test (consistency of test formats, content of questions, length of test, timing), administrative factors (appropriacy of settings and test administration procedure), and affective factors (students' ability to perform well despite test anxiety/fatigue, personal characteristics) (Coombe et al., 2007). Recommendations on raising the reliability of assessments are summed up by Green (2014). These recommendations mostly concern standardized or locally standardized testing: prepare longer tests, focus on a limited scope of skills in one test, employ multiple measures, employ more scorers and raters, and assess learners with varied levels of skills. There are, however, recommendations which are not only workable in the classroom context but present a rigorous requirement of it: make tasks clear and unambiguous, standardize the test-taking conditions, and control how the assessments are scored.

As was noted above, validity and reliability are closely interwoven, although a universal maxim tells us that a valid test is always reliable, but a reliable test is not necessarily valid. In classroom conditions, validating a test to ensure its quality as an instrument of measurement requires certain expertise and intellectual effort, and consumes the time of teachers who are notoriously overworked. Although theoretical and practical investment in the enhancement of teachers' assessment literacy has recently increased (Taylor, 2009; McNamara & Hill, 2011; Fulcher, 2012; Inbar-Lourie, 2013; Pill & Harding, 2013; Vogt & Tsagari, 2014; Green, 2016; Giraldo, 2018; Tsagari et al., 2018), our own experience suggests that language testing remains, as Alderson put it, an 'arcane' and 'scary' area to most practicing teachers (Alderson, 1999). Green (2014) asserts that following a few basic principles can help improve assessment practices and raise teacher assessment literacy. These principles are 'Planning and Reflection [that] lead to Improvement, when supported by Cooperation and informed by Evidence' (Green, 2014, p. 21). In other words, in classroom conditions where the assessment literacy of teachers is not generally very high, it is mandatory to collaborate at all stages of test development, administration, and analysis.

Following this line of thought, and stimulated by Coombe et al.'s (2007) suggestion that 'it is easier to assess reliability than validity' (p. xxiv), we conceived of a research project examining issues of the reliability–trustworthiness of summative assessments in university conditions. Examining reliability through sophisticated statistical analysis is hardly possible in Ukraine, where LTA is still in its infancy; despite this, an empirical, qualitative study of the reliability of classroom summative assessments is quite feasible. In the following section of this chapter, we provide the research rationale and the methods of investigation employed in our study.
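Before turning to the study itself, a purely illustrative aside on the 'reliability coefficient' that the review above attributes to large-scale testing: the sketch below computes Cronbach's alpha, one common internal-consistency statistic, over a hypothetical candidate-by-item score matrix. The data and scale are assumptions made for the example; the chapter itself, as noted, pursues a qualitative route instead.

```python
# A minimal sketch, assuming item-level scores are available; classroom
# summative tests marked holistically would not support this calculation.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal consistency of a candidates-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # per-item sample variance
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five hypothetical candidates, four items each scored 0-5.
scores = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```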

Current study

Research rationale

The ultimate goal of this research was to explore the practices of summative assessment adopted in Ukrainian universities with a view to establishing the degree of reliability–trustworthiness of assessment results. To this end, the study surveyed the experiences of three central groups of stakeholders – university managers, teachers, and students – involved in the organization and implementation of, and decision-making based on, summative assessment. We adopted a working hypothesis that the reliability of summative tests in universities may be directly dependent on the actual level of TAL. It was also expected that the survey would offer insights into ways to maximize teacher training in LTA.

Our initial task, in this respect, was to determine all aspects of classroom summative assessment that are empirically observable and assessable. We proceeded from the assumption that reliable university summative assessment should first be uniform for specific groups of learners (year of studies, specialty). This means it should aim at measuring the level of the same skills that are determined in curricula, use the same testing techniques, be collected within the same procedure (written test paper, timing), and be graded on the same criteria. Second, the process should adhere to the test development cycle, the major stages of which are planning (defining the test construct) and the collegial choice of testing techniques, followed by item/task writing, pre-testing, and test modification. The quality of the developed test should necessarily be assured by those managers or teachers whose level of assessment literacy is higher than average. Third, the test administration procedure should be strictly followed as far as the real-life educational context allows. We refer to maintaining test transparency (informing students about what is going to be measured on the test) and test security (preparing more than one variant of test papers, conducting assessments within relatively close dates, ensuring academic honesty). No less important are accuracy and timeliness in grading test papers, documenting and reporting test results, and analysing the evidence collected. Post-administration assurance of test quality is also viewed as advisable, if not mandatory, in terms of promoting the development of better, valid tests in the future. Fourth, since the summative assessment that we explore is classroom-based or teacher-constructed, it should be complemented by feedback provided to learners, particularly in mid-term assessment. Feedback should be timely and effective, otherwise it is useless (Coombe et al., 2007). Feedback may and should impact students' determination to learn better. Fifth, washback, or feedback from assessees on their satisfaction with the fairness of test results, should be monitored and regulated. So far, studies show that the lowest level of student satisfaction concerns grades and feedback (Norton, 2009; HEA, 2012). Finally, the evidence collected through summative assessment should be supported by multiple measures, such as alternative forms of assessment. This possibility puts classroom-based, context-relevant assessment at an advantage over large-scale testing, which is absolutely context-free. Given the internationally determined prospects of involving alternative types of assessment in the function of summative assessment (HEA, 2012), these forms of assessment should also become integrated in summative classroom-based assessment in Ukraine.

The above reflections resulted in distinguishing the following pre-requisites to reliable summative assessment:

1. Uniformity of requirements towards summative assessment. In Ukrainian education, this is understood as coordination and consistency in the regularity of test administration and the reporting of test results, equivalence of the tests set to all cohorts of students taught the same curriculum, and uniformity of the test administration procedure.
2. Adherence to the principles that ensure development of a valid test (the test development cycle).
3. Providing feedback.
4. Monitoring washback.
5. Use of alternative assessment.
6. Prospects of enhancing the quality of summative tests, aka teachers' assessment literacy.

The research questions in this study are:

1. To what extent do the perceptions of reliability–trustworthiness by central stakeholders overlap?
2. To what extent do the actual procedures of test development and administration enable reliable summative assessment?
3. What ways of enhancing TAL are perceived as effective in today's Ukrainian higher education?

To resolve these questions, we employed a qualitative method of investigation – surveying the perceptions of summative assessment held by three groups of respondents. The three questionnaires included questions centred around the pre-requisites to reliable summative assessment identified above. In each questionnaire, the questions reflected the actual conception of assessment typical of and relevant to the different groups of respondents. We aimed to elicit information related to all the specified pre-requisites from different perspectives; therefore, some of the questions overlapped across two or three types of questionnaire, which allowed for comparability of the perceptions.

Implementation

Methods of research

The questionnaires were prepared in consultation with specialists from the Academy of Higher Education of Ukraine; before administering the survey, we pre-tested the questionnaires with the help of university managers (3), fellow teachers (6), and students (15).

Questionnaire 1 was intended for university managers. It purported to elicit information and personal perceptions of the aspects that immediately reflected the responsibilities of the organizers and supervisors of summative assessment within their departments, as well as of the person accountable for assessment results. This questionnaire consisted of 22 questions: 20 with the option 'own answer', 1 open-ended, and 1 requesting rank-ordering. Several questions concerned uniformity, test development, and quality assurance issues. There were also questions focused on the development and administration of mid-term tests and end-of-course tests, which are fairly high-stakes assessments for many students. The last two questions inquired about the prospects of improving the quality of summative tests through enhancing TAL.

Questionnaire 2 was designed for teachers and included 28 questions (27 with the option 'own answer' and 1 requesting rank-ordering). Part of the questions coincided with those aimed at university managers (uniformity of test papers and administration procedures, test preparation procedure), although they were formulated from a somewhat different perspective – that of the staff responsible for maintaining uniformity of tests and administration, as well as for test development and ensuring its quality/validity. Another part of the questions reflected teachers' practices in terms of feedback provision and the use of alternative assessments. Teachers were also invited to share their perceptions of possible washback. Question 27 inquired about the forms of training in assessment literacy actually received by respondents. Question 28 inquired about the prospects of enhancing TAL and the formats preferred by respondents.

Questionnaire 3, consisting of 20 questions with the option 'own answer', was meant to elicit information about summative assessments from students as major stakeholders in the learning and assessment process. This questionnaire included questions that could elicit students' perceptions of the uniformity of test papers and test administration. Other questions inquired about the effectiveness of feedback, the use of alternative assessments, and possible washback. The focus of some questions was placed on students' satisfaction with their results on summative tests.

So, the questionnaires were designed in a way that allowed us to correlate and cross-check the information about the major pre-requisites to fair and equitable summative assessment.

Participants

The participants in the survey were three distinct though interconnected groups of respondents: 1) university managers (UMs), 2) teachers (Ts), and 3) students (Sts). Eleven institutions from all regions of the country were involved in the survey (Western, Southern, Eastern, and Central parts of Ukraine), which made our sample fairly representative in terms of reflecting local practices. Although we collected many more responses than initially planned, we had to exclude a considerable number of inaccurately completed questionnaires. In the end, we processed 10 questionnaires from UMs, 50 collected from Ts, and 50 completed by Sts. Participation in the survey was anonymous (excluding the UMs) and voluntary. To ensure full comprehension of the questions by all groups of respondents, the questionnaires were formulated in Ukrainian, with metalanguage excluded.

Data collection

The survey was conducted in paper-and-pencil format in the autumn of 2018. The sets of responses that arrived from each university contained responses provided by one university manager (head of department), 5–10 teachers working for that department, and 5–10 students who were taught by those teachers. Consequently, the responses collected from the three groups of respondents in each of the ten institutions allowed the researchers to note salient features of the assessment practices adopted in each local context. Although it would not have been difficult to align all three groups of evidence and arrive at certain conclusions, for ethical reasons we did not do that, thus leaving the original scope of the study unchanged.


Results and discussion

In this section of the chapter we will present and interpret the findings in line with the previously determined pre-requisites: uniformity of requirements towards summative assessment; adherence to the principles that ensure development of a valid test (the test development cycle); feedback; washback; alternative assessment; actual TAL; and preferred formats for enhancing the quality of summative tests and TAL.

Uniformity of requirements to summative assessment is viewed by us as the major issue promoting fairness of assessments across all cohorts of assessees in the local context; therefore, the issue was considered from all three stakeholder perspectives. While the UMs and Ts were asked direct questions about uniformity, the students were asked in an indirect way, which allowed us to cross-check the perceptions of the two other categories of respondents (the educators). As we had expected, both UMs and Ts confirmed the existence of uniform requirements; at the same time, a much smaller percentage of Sts (61%) showed sensitivity to whether tests were identical or different for all cohorts of assessees, whether tests existed in several variants, and whether they were conducted on the same day for all students (Table 10.1).

Table 10.1 Perceptions of uniformity of requirements to summative assessments

Managers: 100% | Teachers: 97% | Students: 61%

A more specific question, related to the uniformity of the test administration procedure, was targeted at educators only: 100% of UMs and 79% of Ts confirmed that tests were administered in the equally proper way across all courses. When asked if they thought that the procedure was followed by all teachers, both respondent groups' beliefs appeared much more moderate, with a slight variance. UMs believed that 50% of the staff never violated the procedure while 50% did. In Ts' opinion, 52% of their fellow teachers always administered tests by the rules, with only 6% (cf. 50% mentioned by UMs) violating them and 39% doing so occasionally. We may infer that the perceived requirement to conform to the rules yielded the educators' confident responses to the direct question; when answering an indirectly formulated question, they expressed their personal, unprejudiced point of view, thus revealing the existing practices. Sts' perceptions of the uniformity of assessment were not affected by the educators' established views on job discipline, which suggests that the issue of uniformity was essential for them to a certain extent.

As was discussed in the research rationale, any test, especially a high-stakes summative one, is more likely to be valid if its preparation proceeds through all the mandatory stages of the test development cycle. Therefore, in the educator questionnaires, a special focus was placed on aspects of collaborative test design and its further quality assurance.


According to UMs, the construct of mid-term tests (MTT) and end-of-course tests (ECT) was discussed and determined by the department staff in 60% and 80% of incidences, respectively. We explain the bigger amount of collegial effort required for ECT preparation by the higher degree of educators’ responsibility in decision-making. Furthermore, we specifically aimed to find out who exactly participated in construct defining. According to UMs, these people most frequently included unit leaders (70%), as well as all staff of the department (30%). As far as the Ts are concerned, they claimed that 79% of them participated in discussing and determining the construct of either the MTT or ECT. The involvement of all staff in defining the test construct, in our view, served a sound precondition to ensuring construct relevance to curricula. Moreover, in case some teachers had not covered certain aspects of the curriculum in their teaching, they still had time to bridge that gap in order to eliminate possible construct under-representation; otherwise, both test validity and reliability would be put at stake. Like defining the test construct, test development is another stage of the testing cycle that needs collaboration. The responses collected from UMs suggested that the test development process in 80% of incidences engaged unit leaders, whereas 20% of respondents admitted that tests were prepared by each teacher independently; the data are quite compatible with the responses of Ts who claimed they were engaged in MTT (91%) and ECT (70%) development. Nonetheless, the practice of each teacher independently developing tests, as mentioned in 20% of UMs’ responses, is in obvious conflict with their previous claims about uniformity of summative tests. It was also interesting to find out whether teacher-constructed tests underwent quality assurance and what staff were involved in that. As UMs admitted, the quality of an MTT was assured in 50% of incidences, and the quality of an ECT in 70%. These percentages do not seem sufficient for ensuring validity and sustained quality of high-stakes tests although we cannot compare these percentages with the data of any other research. Furthermore, the responses collected from UMs revealed that quality assurance was carried out almost totally by individuals: Head of department (60%), Deputy Head (10%); in 30% of incidences the quality was monitored by individual unit leaders. Ts’ participation was limited to 30% of permanent and 24% of occasional engagement in pre-testing and further discussion of the required improvements. A similarly low level of collaboration was observed regarding ECT quality assurance although it appeared more representative; apart from Heads, Deputy Heads, and unit leaders, experienced teachers, external reviewers were also involved. Regrettably, half of the respondent teachers had never contributed to quality assurance of an ECT, whereas 27 % were engaged occasionally, with only 21% participating in an ECT quality check. These data, to our mind, predominantly reflect aspects of UMs’ malpractices, such as: lack of assessment literacy, disregard of test quality owing to test developers’ credibility, or an


Presumably, enhanced TAL would help resolve this situation irrespective of its specific cause or causes.

The question in Table 10.2 was a direct one; it was aimed at all groups of respondents and purported to elicit the degree of satisfaction with test quality. The responses revealed that the lowest degree of satisfaction was voiced by UMs – 30%. The majority of Ts were totally (70%) or partially (27%) satisfied, and only 3% were totally dissatisfied with the test quality. Sts appeared the most loyal stakeholder group: 58% of them were totally and 42% partially satisfied with the test quality, which reflects their overall belief in the actual validity of the summative tests set for them.

Another direct question aimed to reveal the respondents' perceptions of the tests as valid or not; the question was formulated differently for Ts (whether 'tests measure what is defined by the curriculum') and for Sts (whether 'tests measure what you were taught'). The responses from the two groups were considerably different: 73% of teachers and 54% of students thought that all tests set at the department were valid, whereas 27% of teacher respondents and 44% of students were inclined to question the validity of all tests. Could such a mismatch be accounted for by the gap between the teachers' perceived confidence in the construct relevance and the actual construct over-representation perceived by the test takers? Does this imply the necessity to build the assessment literacy of both categories of assessment agents, as Pill and Harding (2013) state? These questions could become a focus of further studies.

Evidence of the educators' understanding of test quality, and of ways to enhance it, was complemented by the responses to the following two questions. When asked if the quality of the tests needed in-depth analysis, 70% of UMs confirmed this need, whereas 52% of Ts noted they were quite content with the present situation. While all UMs stated that the test quality should be improved, fewer than half of Ts agreed with them. Here we raise the issue of the teachers' possible indifference, if not apathy, towards the improvement of assessment instruments. However, the teachers managed to provide clear answers to the question about tests' obvious drawbacks: in their view, tests had irrelevant task difficulty (61%), scoring criteria (36%), and task validity (33%), as well as inconsistent structure and an irrelevant number of tasks (21%). Again, UMs' responses were more focused: 67% claimed tests lacked relevant scoring criteria, 60% noted mismatches in the difficulty of tasks, and 50% questioned task validity.

Table 10.2 Satisfaction with the test quality

Respondent groups    Always    Not always    Never
Managers             30%       70%           –
Teachers             70%       27%           3%
Students             58%       42%           –


Here we need to note that in Ukrainian universities summative tests are not necessarily locally developed. In a considerable number of cases, Ukrainian teachers use ready-made tests offered by coursebook authors, with or without adapting them, or compile tests from available test tasks which, in their view, are relevant to the learners' language level. The use of such tests entails certain risks, e.g. an irrelevant test construct, structure, timing, etc. From this perspective, the use or adaptation of ready-made tests may also be demanding for teachers (Hasselgreen et al., 2004; Vogt & Tsagari, 2014), who should be specifically trained in it.

We addressed the question about the test results from three perspectives. As far as UMs' perceptions are concerned, only 20% of the respondents believed that the measurements were objective, with the majority (70%) taking a different view. Ts and Sts displayed almost identical perceptions: 58% in both groups of respondents believed that the assessments were objective and reliable, with 36% and 34% respectively admitting that tests did not always yield fair results. However, 50% of Sts claimed they were not always satisfied with their test scores obtained either on an MTT or an ECT. Frequent dissatisfaction with the test results was also expressed by the majority of UMs (70%). This group of respondents, as interpretation of the data suggests, was the most consistent in its judgement about the summative assessments of which it was in charge. The data revealing Ts' dissatisfaction with the test results could be interpreted as possible discontent with test scores obtained through an unreliable instrument of assessment, as a result of a mismatch between the amount of effort put into teaching and the documented outcomes of learning, or as displeasure with students' not meeting their expectations (e.g. underperformance of advanced learners, multiple evidence of dishonesty/cheating).

Table 10.3 Perceptions of test objectivity vs satisfaction with test results/grades (%)

                     Always satisfied         Not always satisfied     Never satisfied
Respondent groups    objectivity  results     objectivity  results     objectivity  results
Managers             20           20          70           70          10           10
Teachers             58           55          36           37          –            8
Students             58           40          34           50          –            10

Such pre-requisites to implementing reliable assessments as feedback, washback, and the use of alternative assessment were surveyed primarily from the perspectives of Ts and Sts. The responses to the majority of questions revealed identical perceptions. For instance, we managed to determine that feedback tended to be timely in most incidences; teachers provided both oral explanation and detailed advice on improving performance. Surprisingly, a lower percentage of Ts (21%) than Sts (34%) mentioned written comments on tests.


Similar favourable responses testified to the absence of a negative impact of the assessment results on students' further attitudes to learning. Yet, conversely, we admit that alternative assessments had not become a widely adopted practice in all surveyed institutions. The challenges of implementing alternative assessments in the classroom are recognized and experienced globally (Carless, 2007; Douglas, 2010; Meletiadou & Tsagari, 2016), which reduces the prospects of employing them as universal tools of summative assessment in the near future.

The findings of the survey allowed us to identify ways towards enhancing the reliability–trustworthiness of summative assessments in universities. The responses to open-ended questions obtained from UMs resonated with one another in that they stressed the necessity of raising teachers' awareness of LTA, of motivating teachers' willingness to enhance their own assessment literacy, of encouraging team work/collaboration across test design, development, and pretesting, and of engaging teachers in assessment modification based on the analysis of performance and on feedback from other teachers and students.

Responding to the question about their actual level of assessment literacy, the Ts indicated all the opportunities which had promoted their training in LTA. Among them the most frequent were participation in seminars held at departments by their staff (88%) and in workshops by visiting international experts (64%), self-study (64%), participation in conference(s) on LTA (48%), distance courses (39%), traineeship in Ukrainian universities (30%), etc. Figure 10.1 visually represents the variety of opportunities the teachers had to enhance their AL, although in the majority of cases (88%) they gained knowledge about LTA via staff seminars conducted by teachers of the same rank as themselves.

Figure 10.1 Training in LTA received by respondent teachers (%)


64% of the respondents managed to participate in the workshops by visiting international experts, and another 64% improved their TAL through self-study. All other opportunities had been taken up by fewer than 50% of respondents. The training experiences of the respondents, although far from systematic, allowed the Ts to express a considered opinion about the most effective formats. Table 10.4 presents the ranking of formats in order of effectiveness.

Table 10.4 Ranking of training formats in order of effectiveness

Rank    Format                                 Score
1       Traineeship abroad                     4.97
2       Workshops by international experts     4.20
3       Workshops by national experts          3.59
4       Short-term courses                     3.25
5       Staff seminars                         3.23
6       LTA conferences                        3.19
7       Traineeship in Ukraine                 2.80
8       Longer courses                         2.54
9       Self-study                             2.19
10      Distance courses                       1.76
11      Other                                  0
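For readers who want to replicate this step, the ranking in Table 10.4 is simply the list of formats sorted by their mean effectiveness ratings. The following is a minimal illustrative sketch in Python (pandas assumed); it is not part of the original study, which does not report the tooling used for this step, and only the scores themselves are taken from Table 10.4:

    import pandas as pd

    # Mean effectiveness ratings as reported in Table 10.4.
    mean_scores = pd.Series({
        "Traineeship abroad": 4.97,
        "Workshops by international experts": 4.20,
        "Workshops by national experts": 3.59,
        "Short-term courses": 3.25,
        "Staff seminars": 3.23,
        "LTA conferences": 3.19,
        "Traineeship in Ukraine": 2.80,
        "Longer courses": 2.54,
        "Self-study": 2.19,
        "Distance courses": 1.76,
    })

    # Sorting descending by mean rating reproduces the published ranking.
    ranking = mean_scores.sort_values(ascending=False)
    for rank, (fmt, score) in enumerate(ranking.items(), start=1):
        print(f"{rank:2d}. {fmt}: {score}")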

As is seen in Table 10.4, the respondents associated the most effective way to enhance TAL with traineeship abroad; the second and third preferred formats were workshops conducted in Ukraine by international experts in LTA and workshops led by Ukrainian experts. We explain such appreciation of these formats by looking at ongoing processes in Ukrainian education. The increased job responsibilities related to frequent summative assessment pose numerous questions to teachers without clear answers from national policy makers. At the same time, the past four years have witnessed a growth of opportunities for teachers to participate in training events organized by UALTA; it is probably the high standards set by the invited experts that heightened teachers' expectations of international collaboration. Staff seminars, which had been mentioned in the responses to the previous question as the most accessible format, and conferences, which had long been leaders among scholarly conventions, were considerably downgraded by the respondents. Whether this resulted from Ts' discontent with the staff seminars' insufficient informativeness, from the scarcity and insufficient quality of research into LTA reported at conferences, or from the conference format itself needs to be ascertained by a special survey of academia.


Nonetheless, we believe it is quite within our grasp to account for the increased rating of short-term courses in LTA; moreover, we had hypothesized it. Although only 20% of the respondents had experience of short-term courses in LTA, Ts assumed that intensive short-term training in LTA would be effective and ranked it 4th. On the one hand, this is indicative of the recently established practice of conducting week-long winter/summer schools for teachers in the country. On the other hand, the idea of training in LTA meets urgent demands for implementing reliable–trustworthy assessments in higher education. Additionally, such short-term courses have been piloted by us and were found quite effective (Kvasova, 2016). By contrast, longer-term courses, as well as traineeship in Ukrainian universities, distance courses, and self-study, did not meet the respondents' expectations in terms of effectiveness, placing 7th–10th in the rating. We can assume, however, that these formats were perceived as quite demanding in terms of material and human resources (time, effort, cost).

We also looked into UMs' perceptions of the effectiveness of the ways to enhance TAL and correlated them with those discussed above. As is seen in Figure 10.2, total agreement is observed in placing short-term courses on LTA in the middle and in ranking distance courses the lowest. In two other incidences, regarding long-term courses and traineeship in Ukrainian universities, the indices show a larger variance, since managers are responsible for the smooth flow of the instructional process in their departments, and the absence of staff from the workplace could impede it. In all other incidences, the graph reveals a similar tendency for both groups of respondents, although Ts assigned more generous scores to the preferred formats of training events.

The interpretation of the data allowed us to arrive at conclusions regarding all questions of this research. It appeared that the perceptions of reliability–trustworthiness by the three groups of informants diverged considerably wherever they could be compared. The curve indicating the major stakeholders' (Sts) data stretches steadily along medium indices, which points to the respondents' undetermined perceptions of all aspects except the most meaningful one for them – their satisfaction with the obtained grades.

Figure 10.2 Preferred formats of training in LTA (mean ratings given by Teachers and Managers)

Figure 10.3 Comparison of perceptions of summative assessment reliability (%: Managers, Teachers, and Students, across Uniformity, Test quality, Objectivity, and Test results)

The educators' responses agree only once, in respect of the uniformity of summative assessment; their indices of satisfaction with the test results, however, are quite close to each other, though somewhat higher than that of the Sts. While the Sts' and Ts' perceptions of objectivity coincide at the 58% mark, the UMs reveal a greater degree of certainty about test reliability; in fact, this contradicts their overall rigorous stance on summative assessment. On the whole, the UMs revealed the most critical and consistent evaluation of all aspects of test development, administration, and analysis of test quality. We attribute such perceptions primarily to the great responsibilities of the managerial job they perform, as well as to their generally relevant capacity to organize and monitor summative assessments. The data also suggest that UMs have fairly good control over the implementation of summative assessment, although the degree of collaboration at some stages of test development (e.g. quality assurance), in our view, needs reconsidering. Additionally, UMs should be credited for the effort invested in TAL enhancement, in particular for organizing staff seminars on LTA issues at the departments they head.

The data obtained from Ts enabled a more detailed view of assessment procedures. The practices testify to assessments being developed and administered in compliance with setting-specific requirements, although we noted that some relevant principles of testing were seriously compromised. The most obvious reason for this lies in the lack of solid, specialized training in LTA; however, even if teachers had a proper level of TAL, they would evidently need UMs' support in creating conditions conducive to collaborative test development and quality assurance. Nevertheless, what is clearly positive in the assessment practices we surveyed is the agreement in the Ts' and Sts' perceptions of feedback efficiency and the absence of any negative impact of assessment on further learning.

The limitations of the study relate to the subjectivity of a survey as a method of investigation. However, the focus on the same concepts from different perspectives allowed us to collect compatible data and enabled insights into real-life practices in various local contexts.


Drawing on these insights, we formulate practical implications for implementing summative assessment in universities. First and foremost, department staff and UMs need to achieve coordination in defining the test construct and its relevance to the curricula. It is desirable that test development should involve a more representative team of teachers. Our practice suggests that test preparation which engages, along with unit leaders, other motivated teachers enables more staff members to feel ownership of the test, as well as responsibility for its proper quality, administration, and reliable results. Additionally, in working on tests together, teachers share their knowledge about LTA and enhance TAL in their setting. However, test developers should not confine themselves to their department/institution, but should network with colleagues from other departments/institutions. So far, such networking has been promoted by UALTA; the principal goal pursued by the authors of this chapter is to run short-term courses in LTA for practicing university teachers. The courses, apart from enhancing TAL, would expand research into alternative assessment, which has so far found fairly limited use in Ukrainian classrooms, as well as into students' assessment literacy and assessment agency.

Conclusion

The reliability–trustworthiness of summative assessments in Ukrainian universities has been considered in this study from the perspective of the central stakeholders. The information obtained from all groups of respondents sheds light on the assessment practices typical of universities from across the country. The results confirm the existing views on TAL as a cornerstone of objective and equitable summative assessment. Of particular interest to the authors of this chapter are the educators' suggestions about possible ways of improving the reliability of summative assessments; both groups of educator respondents found it critical to raise TAL, identifying the preferred formats of training in LTA – traineeship, workshops, and short-term courses. To conclude, we obtained salient evidence of some progress in building TAL in the surveyed universities and identified the most meaningful formats of TAL enhancement; this stimulates follow-on studies as well as practical organizational steps.

References

Alderson, J. C. (1999, May). Testing is too important to be left to testers. Plenary address to the Third Annual Conference on Current Trends in English Language Testing, United Arab Emirates University.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge University Press.
Bachman, L., & Palmer, A. (1996). Language testing in practice: Developing useful language tests. Oxford University Press.
Bolitho, R., & West, R. (2017). The internationalisation of Ukrainian universities: The English language dimension. British Council. https://www.teachingenglish.org.uk/sites/teacheng/files/Pub-UKRAINE-REPORT-H5-EN.pdf
Brown, G. T. L. (2019). Is assessment for learning really assessment? Frontiers in Education, 4. doi:10.3389/feduc.2019.00064
Carless, D. (2006, September 6–9). Developing synergies between formative and summative assessment. Paper presented at the British Educational Research Association Annual Conference, University of Warwick. http://www.leeds.ac.uk/educol/documents/159474.htm
Carless, D. (2007). Learning-oriented assessment: Conceptual bases and practical implications. Innovations in Education and Teaching International, 44(1), 57–66.
Chapelle, C. A. (2013). Reliability in language assessment. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 4918–4923). Blackwell/Wiley.
Coombe, C., Folse, K., & Hubley, N. (2007). A practical guide to assessing English language learners. The University of Michigan Press.
Douglas, D. (2010). Understanding language testing. Routledge.
Dovgopolova, I. V. (2011). Vprovadzhennia testovoi metodyky v protsess navchannia u vyshchyh navchalnyh zakladah [Integration of language testing into language learning in higher education institutions]. Vyscshia shkola, 2(20), 41–50.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113–132.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. Routledge.
Gareis, C. R., & Grant, L. W. (2015). Teacher-made assessments: How to connect curriculum, instruction, and student learning (1st ed.). Routledge.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers. Profile: Issues in Teachers' Professional Development, 20(1), 179–195.
Green, A. (2014). Exploring language assessment and testing: Language in action. Routledge.
Green, A. (2016). Assessment literacy for language teachers. In D. Tsagari (Ed.), Classroom-based assessment in L2 contexts (pp. 8–29). Cambridge Scholars Publishing.
Harlen, W. (2004). A systematic review of the evidence of reliability and validity of assessment by teachers used for summative purposes. EPPI-Centre, Social Science Research Unit, Institute of Education. https://eppi.ioe.ac.uk/cms/Portals/0/PDF%20reviews%20and%20summaries/ass_rv3.pdf?ver=2006-03-02-124720-170
Harlen, W. (2007). Designing a fair and effective assessment system. Paper presented at the BERA Annual Conference: ARG Symposium Future Directions for Student Assessment, University of Bristol, Bristol.
Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and assessment needs: Report: Part one – general findings. European Association for Language Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-report-pt1.pdf
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Houston, D., & Thompson, J. N. (2017). Blending formative and summative assessment in a capstone subject: 'It's not your tools, it's how you use them'. Journal of University Teaching & Learning Practice, 14(3).
Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
Inbar-Lourie, O. (2013). Language assessment literacy: What are the ingredients? Paper presented at the 4th CBLA SIG Symposium Programme, University of Cyprus.
Kvasova, O. (2016). A case of training university teachers in developing and validating classroom reading test tasks. In D. Tsagari (Ed.), Classroom-based assessment in L2 contexts (pp. 54–74). Cambridge Scholars Publishing.
Kvasova, O. (under review). Will a boom lead to a bloom? Or how to secure a launch of language assessment literacy in Ukraine. In D. Tsagari (Ed.), Language assessment literacy: From theory to practice. Cambridge Scholars Publishing.
Kvasova, O., & Kavytska, T. (2014). The assessment competence of university foreign language teachers: A Ukrainian perspective. Language Learning in Higher Education, 4(1), 159–177.
Lau, A. M. S. (2017). 'Formative good, summative bad?' – A review of the dichotomy in assessment literature. Journal of Further and Higher Education, 40(4), 509–525.
McNamara, T. (2000). Language testing. Oxford University Press.
McNamara, T., & Hill, K. (2011). Developing a comprehensive, empirically based research framework for classroom-based assessment. Language Testing, 29(3), 395–420.
Meletiadou, E., & Tsagari, D. (2016). The washback effect of peer assessment on adolescent EFL learners in Cyprus. In D. Tsagari (Ed.), Classroom-based assessment in L2 contexts (pp. 138–160). Cambridge Scholars Publishing.
Muñoz, A. P., Palacio, M., & Escobar, L. (2012). Teachers' beliefs about assessment in an EFL context in Colombia. PROFILE, 14(1), 143–158.
Norton, L. (2009). Assessing student learning. In H. Fry, S. Ketteridge, & S. Marshall (Eds.), A handbook for teaching and learning in higher education: Enhancing academic practice (3rd ed., pp. 132–149). Routledge.
Olendr, T. M. (2015). Deiaki problemy vykorustannia innovatsiynyh tehnologiy kontroliu y otsiniuvannia znan studentid na zaniattiah z inozemnoi movy pf professiynym spriamyvanniam [The issues of implementing innovative methods of assessment in the ESP classroom]. Pedagogichnia nauky: teoriia, istoriia, innovatsiyni tehnologii, 4(48). https://repository.sspu.sumy.ua/bitstream/123456789/1622/1/Deiaki%20problemy%20vykorystannia%20.pdf
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence from a parliamentary inquiry. Language Testing, 30(3), 381–402.
Shadrina, T. (2014). Testuvannia v umovah kredytno-modulnoi systemy: perevagy ta nedoliky (na prykladi vyvchennia ukrainskoi movy iak inozemnoi) [Testing Ukrainian as L2 in an ECTS-compliant context: Advantages and drawbacks]. Teoriia ta praktyka vykladannia ukrainskoi movy iak inozemnoi, 9, 60–67.
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics, 29, 21–36.
The Higher Education Academy (HEA). (2012). A marked improvement: Transforming assessment in higher education. The Higher Education Academy. https://www.heacademy.ac.uk/system/files/A_Marked_Improvement.pdf
Tsagari, D., Vogt, K., Froehlich, V., Csépes, I., Fekete, A., Green, A., Hamp-Lyons, L., Sifakis, N., & Kordia, S. (2018). Handbook of assessment for language teachers. Teachers' Assessment Literacy Enhancement. http://taleproject.eu/pluginfile.php/2129/mod_page/content/12/TALE%20Handbook%20-%20colour.pdf
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.

Part 4

Language assessment literacy: Interfaces between teaching and assessment

Chapter 11

To teach speaking or not to teach? Biasing for the interfaces between teaching

Diana Al Jahromi

Introduction

As social creatures, people need to communicate in social settings in order to survive and to maintain relationships of friendship, enmity, comradeship, and acquaintanceship by means of oral and/or written language. More than 340 million people speak English, which functions as the lingua franca around the globe (Celce-Murcia, 2013; Tarone, 2005; Ounis, 2017; Ur, 2012). Accordingly, globalization and the marketplace have made oral and written English proficiency a necessity, and it is considered one of the key graduate attributes (Koo, 2009). Given the current status of English and the major role it has been playing in education and commerce in the Arab countries and, more specifically, in the Arabian Gulf region since the beginning of the 20th century, overall English proficiency is considered a pressing prerequisite for swift employment and a successful career (Ministry of Labour and Social Affairs in Bahrain, 2017). Being able to communicate successfully in English in corporate-level conversations is a demanding criterion to which the educational ecosystem, and more specifically higher education institutions, need to pay additional attention when designing curricula.

The significance of oral performance proficiency and of speaking as a productive skill has been highly stressed by curriculum specialists, educationists, and researchers (Celce-Murcia, 2013; He, 2013; Hismanoglu, 2013; Liu, 2012; Lu & Liu, 2011; Ounis, 2017; Yaikhong & Usaha, 2012). However, a look at the regional and local status of English instruction reveals some adversarial realities: the educational systems place high importance on reading, writing, and the teaching of form and function, while less attention is given to speaking. Speaking is thus considered the most difficult skill to master, due to a number of factors caused by this negligence, such as the fear of negative evaluation by peers, low lexical richness, and lack of formation skills and practice (Al Asmari, 2015; Al Hosni, 2014).

This chapter aims to explore two dimensions of the assessment of the speaking skill: 1) the current practice of assessing speaking in tertiary education in Bahrain, with reference to some pedagogical structures and to a number of cognitive variables related to Second Language Acquisition (SLA) and language learning, and 2) an analysis of possible provisional means of teaching and assessing the speaking proficiency of L2 learners, with the intent to empower them and equip them with the skills they need to succeed in the job market and in their careers.


According to statistics from the Ministry of Labour and Social Affairs in Bahrain, the majority of graduates with good oral skills in English are those who get jobs and climb the career ladder faster than those with poor oral proficiency in English. Bygate (1987) considers speaking 'the vehicle par excellence of social solidarity, of social ranking, of professional advancement and business' (p. vii).

Research questions

The present study aims to answer the following questions:

1 Do L2 university learners have speaking anxiety?
2 How is speaking being taught and assessed in tertiary L2 programs?
3 What are L2 students' perceptions of the teaching and assessment of the speaking skill?
4 What is the relationship between students' speaking anxiety and their perceptions of the teaching and assessment of the speaking skill?
5 What is the relationship between students' speaking anxiety and their exposure to extracurricular spoken English?

Review of the literature

Richards (2008) defined speaking as being engaged in 'meaningful interaction'. He also stated that speaking, as 'maintaining comprehensible and ongoing communication despite limitations' in the communicative competence of speakers (p. 14), involves the appropriate use of spoken discourse in different social contexts, in which purposeful interactions amongst interlocutors take place at different levels. In addition to being interactional, speaking can be used for transactional purposes in order to communicate information (Celce-Murcia, 2013; Harmer, 2010; Kingen, 2000; O'Sullivan, 2006). Kingen (2000) enumerates 12 functions of speaking: personal, descriptive, narrative, instructive, questioning, comparative, imaginative, predictive, interpretative, persuasive, explanatory, and informative.

The literature in the areas of SLA and English as a Foreign Language (EFL) has persistently considered speaking proficiency imperative for cultivating L2 learners' ability to communicate adaptably in cross-cultural contexts in which the exchange of information is mandated (Al Hosni, 2014; Hidri, 2017; Talley & Hui-ling, 2014). The significance of the speaking skill in L2 settings, and its mastery as a key performance indicator of overall language proficiency, has been widely researched and echoed in the literature. Turning to the regional literature, it is evident that awareness of the importance of investigating the status of the speaking skill has inspired many researchers in the region, given the number of studies in the field during the last decade (Bashir, 2014; Heng, Abdulla, & Yusof, 2012; Hidri, 2018; Mahmoodzadeh, 2012; Mak, 2011; Yahya, 2013).


The majority of these studies acknowledged the unsatisfactory state of the teaching and assessment of speaking, relating it to the influence of speaking anxiety in EFL settings and to inadequate teaching methodologies and practices (Hidri, 2017). Acknowledging the factors behind the lack of speaking proficiency paves the path for appropriate pedagogical implications and recommendations (Gebril & Hidri, 2019).

Foreign language speaking anxiety

One of the reasons that speaking continues to be one of the most difficult skills for L2 learners to master (Elmenfi & Gaibani, 2016; Ounis, 2017) could be anxiety (Hanifa, 2018). Anxiety is defined as the 'subjective feeling of tension, apprehension, nervousness, and worry associated with an arousal of the autonomic nervous system' (Spielberger, 1983, p. 111). Anxiety has three types: trait anxiety, related to the individual character; state anxiety, associated with temporary moments; and situation-specific anxiety. It is often described with reference to two models: the retrieval model and the interference model (Woodrow, 2006). The first refers to the inability to retrieve previously learnt knowledge at the moment of sending the message out, while the second focuses on the lack of skills and knowledge during the learning process.

The correlation between SLA and anxiety has long been a subject of research. Similarly, given how it may impede language proficiency in general and speaking proficiency in particular, foreign language speaking anxiety has attracted a substantial number of studies. Foreign Language Anxiety (FLA) was first defined by Horwitz, Horwitz, and Cope (1986) as 'a distinct complex construct of self-perceptions, beliefs, feelings, and behaviours related to classroom language learning arising from the uniqueness of language learning processes' (p. 128). Young (1994) defines FLA as the 'worry and negative emotional reaction aroused when learning or utilizing a second language' (p. 27). The literature has accordingly shown the negative impact speaking anxiety can have on the linguistic input and output of learners and on students' academic performance and social behaviour in L2 settings (Arnaiz & Guillen, 2012; Ezzi, 2012; Hanifa, 2018; Liu, 2012; Park & French, 2013; Zhou, 2016). For instance, Gregersen's (2003) study showed how a high level of FLA negatively affected the process of language learning.

Some other studies have investigated the negative correlation between foreign language anxiety and a number of variables such as age, gender, prior L2 experience, etc. (Fakhri, 2012; Mohammadi & Mousalou, 2013; Wang, 2010). With particular reference to gender, these studies argue that a person's spoken discourse is affected by gender and society. While males, for instance, tend to project more confidence and energy while communicating (Yousif, 2016), females tend to be more introverted and cautious.


Coates (2014) acknowledges that while females often communicate silently when among males, they 'talk as if there is no tomorrow' in male-free communicative situations (p. 45). Other differences include levels of politeness, interruption, turn-taking, and loudness (Coates, 2014). Gender dominance and the proclamation of power are evident in mixed-gender conversations (Al Qahtani, 2013). Saville-Troike (2013) refers to the effect one's religion and culture can have on one's idiosyncrasy: interlocutors with the same cultural or religious 'inside' backgrounds speak to each other differently than they do when communicating with 'outsiders'.

While Dörnyei (2005) confirms that the psychological state of L2 learners determines the success or failure of their language learning process, Horwitz, Horwitz, and Cope (1986) also refer to second language speaking anxiety and consider it 'a specific anxiety reaction' that, when combined with performance anxieties such as test anxiety, fear of being negatively evaluated, and communication apprehension, could lead to catastrophic learning experiences (p. 125). Communication apprehension is 'an individual's level of fear or anxiety associated with real or anticipated communication with another person or persons' (McCroskey, 1978, p. 1). It is claimed that people with oral apprehension often feel anxious in L2 settings (Horwitz, Horwitz & Cope, 1986; McCroskey, 2016).

Different studies have used different measures and scales for FLA. Horwitz, Horwitz, and Cope (1986) developed the Foreign Language Classroom Anxiety Scale (FLCAS), which has been used in a plethora of studies (Mahmoodzadeh, 2012; Mak, 2011). Mahmoodzadeh (2012) found heightened foreign language speaking anxiety among more females than males, caused by their interlanguage meaning system. Mak (2011) used the same scale with 313 Chinese freshmen and revealed that fear of negative evaluation, negative self-evaluation, and speaking with native speakers were among the main factors triggering speaking anxiety in classroom settings. Other studies have used adapted versions of the FLCAS, such as Heng, Abdulla, and Yusof (2012), who investigated the speaking anxiety of 700 undergraduate Malaysian students and found no notable correlation between gender and speaking anxiety, which was reported to be at a medium level.

In addition to the FLCAS, a number of other scales have been employed in several studies. Yaikhong and Usaha (2012) introduced a Public Speaking Class Anxiety Scale (PSCAS), aimed at measuring public speaking class anxiety in Thailand, after adopting and modifying previous scales' items. Another scale of speaking anxiety is the Second Language Speaking Anxiety Scale (SLSAS), developed by Woodrow (2006), whose study used confirmatory factor analysis (CFA) to attest 275 Australian students' speaking anxiety. Findings suggested that speaking in class during oral activities places additional pressure on students and adversely affects the level of speaking anxiety (Kayoaglu & Saglamel, 2013).


Turning to the regional and local contexts, Arab learners of English seem to have high speaking anxiety in L2 settings (Alhamadi, 2014; Al Jahromi, 2012; Al-Shaboul, Ahmad, Nordin, & Rahman, 2013; Rabab'ah, 2016; Taha & Wong, 2016; Yahya, 2013). Yahya (2013) revealed that fear of negative evaluation was the key factor triggering speaking anxiety among Palestinian undergraduate students. Similar findings were reported by Elmenfi and Gaibani (2016). Rabab'ah (2016) claims that Arab learners encounter speaking difficulties because of inadequate teaching methodologies and a lack of practice and listening tasks, while Al Asmari (2015) attributes such difficulties to lessened motivation and strict evaluation techniques.

Locally, a review of the literature shows that there has been no published research on the status of speaking and FLA in the Kingdom of Bahrain. However, a number of unpublished undergraduate graduation studies have shown that university and school students consider the speaking skill more difficult to master than the reading and writing skills. In one of these studies, only one third of high school students reported having oral fluency; while more than half of these students reported having good communication skills, the majority expressed increased apprehension when randomly called on to speak in class (Abbas, 2017). In Yousif's (2016) study, more than 30% of the respondents, who were undergraduate students, reported increased anxiety when speaking in class because their L2 teachers did not teach or assess speaking or provide them with 'real' speaking opportunities; however, almost half of the respondents reported feeling at ease when communicating with family members and friends. Similarly, 93% of the respondents in Salman's (2016) study revealed that they faced difficulties in speaking, attributing them to skill deficiency due to teaching (47%) and to lack of confidence (46%). Taleb (2017) used the PSCAS to measure university students' speaking anxiety and found a moderate level of anxiety.

Categorically, students in these local studies attributed low proficiency in the speaking skill to a number of variables that are attested in the current study: the lack of 1) confidence, 2) native-like speaking opportunities, 3) speaking and communication skills courses, and 4) proper teaching and assessment practices and processes.

Firstly, speaking anxiety can be instigated by a number of factors; one of the most fundamental could be related to students' language competence and personal traits, such as lack of confidence, fear of peer or teacher evaluation, shyness, etc. (Abbas, 2017; Al-Nasser, 2015; Elmenfi & Gaibani, 2016; Gan, 2012; Kayoaglu & Saglamel, 2013; McCroskey, 2016; Pathan, Aldersi & Alsout, 2014; Tanveer, 2007). In Abbas' (2017) study, almost 60% of university students acknowledged having speaking anxiety, a large proportion of whom were high-achieving students, while, conversely, low-achieving ones claimed to have low anxiety during speaking. A similar number of students placed a high value on the impact of good speaking skills on their language proficiency. Speaking anxiety in this regard can be directly related to Krashen's Affective Filter Hypothesis (1987) and to Vygotsky's (1986) Social Constructivist Theory and Social Interaction Hypothesis.


These theorists posit that L2 learning can only be successful when the learner is in anxiety-free learning settings and proximal zones of interaction. L2 learners of English often tend to feel apprehensive during speaking attempts or requests and tend to avoid situations in which they are expected to communicate orally, by skipping classes, minimizing participation, and refusing to take part in oral presentations. When forced to speak, their defensive avoidance strategy is triggered, resulting in apprehension.

Additionally, school and tertiary curriculum design and education systems seem to focus comprehensively on the reading and writing skills at the expense of the speaking skill (Celce-Murcia, 2013; Koran, 2015). This means students lack oral proficiency and become anxious when having to communicate orally. It seems that L2 speaking anxiety is not seriously addressed by practitioners and decision-makers, who fail to gauge the effect of speaking anxiety on students' academic performance and classroom behaviour (Basic, 2011).

Teaching and assessment of the speaking skill

An influential cause of speaking anxiety is related to traditional pedagogical practices, which are lecture-based and teacher-centred, with limited or no L2 interaction or in-class speaking opportunities for students (Abbas, 2017; Tanveer, 2007; Tarone, 2005). Speaking, though a productive skill, is one of the most neglected skills in L2 instruction, as it is rarely taught or assessed. If it is assessed, students are often assigned a 5–10 minute summative presentation task without receiving proper practice or instruction in speaking and oral presentation techniques. While interaction in English is a key ingredient of overall language proficiency, Lindsay and Knight (2006) point to the impact that weak curricula, which disregard the role of speaking in instruction and assessment, have on such proficiency.

The negligence of teaching and assessing speaking in L2 settings can be attributed to a number of factors. Time constraints are one: overwhelmed teachers are often confined to 50–60 minutes of classroom time, which they believe should be assigned to the teaching of grammar, reading, and writing. Also, teachers who do teach speaking often rely on assigning tasks of drilling and repeating sentences in class, or on using language labs to memorize pronunciation or practice producing oral expressions in artificial contexts (Al Jahromi, 2012; Koran, 2015). Additionally, the assessment of speaking proficiency in language courses or individual courses is significantly scarce in local and regional EFL settings (Elmenfi & Gaibani, 2016). Where speaking is assessed, it is often through a summative assessment without prior teaching or practice, and teachers often do not employ clear criteria or rubrics of assessment. What is worse, students are often not provided with curricular or extracurricular opportunities to speak in the target language (Koran, 2015). Local tertiary language programs offer language courses that for the most part do not teach or assess speaking: the speaking segments in the textbooks for these courses are skipped, and more focus is given to the teaching of grammar, reading, and writing.


Programmes might offer a phonology course in addition to an occasionally offered elective course in public speaking, but the phonology course does not allow for oral use of the learnt pronunciation rules except at a drilling level. In a local study (Abudrees, 2017), almost 60% of students taking these courses revealed that the courses did not enhance their speaking skills, while more than a quarter expressed uncertainty about any positive impact of these courses on their speaking proficiency. More than 80% of student respondents stressed the importance of having speaking-related courses during their first years in the program. Around 70% of students reported increased anxiety and loss of confidence while speaking in English, due to their fear of negative evaluation. The lack of in-class speaking opportunities such as discussions, debates, and oral presentations was reported by the majority of students (74%). Consequently, their academic performance and grades were negatively influenced (64%) in the summative assessment they did at the end of the semester. This corresponds well with Ur (2012), who conversely advocates the use of well-designed classroom activities that enhance students' speaking skill. In addition, 64% of the students in Abudrees' study revealed that they found communicating with native speakers of English problematic and anxiety-provoking. However, in a vivid acknowledgment of the importance the job market places on speaking in English, more than 67% of these students believed that low speaking proficiency could negatively affect their employment opportunities.

A similar report on the negligence of speaking can also be found in high school instruction. In AlRashid's (2017) study, high school students stated that speaking was the least preferred skill and the most difficult (70%), attributing this to the lack of teaching and assessment of speaking. Interestingly, students reported that they practiced speaking in English outside of school more often than in class. More than two-thirds reported not having speaking as part of the curricula or assessment. Teacher interviewees in this study listed a number of challenges faced by L2 learners that discourage teachers from teaching speaking, such as speaking anxiety, lack of confidence, fear of peer evaluation, insufficient vocabulary, and pronunciation issues. In sum, although students experienced varied teaching methodologies, it seems that educational systems in schools and universities do not make use of sufficient opportunities for the teaching, practice, and assessment of speaking.

Method

Sample

The study sample consisted of 82 L2 university students (75% female, 25% male) enrolled in language learning programs in public and private higher education institutions in Bahrain (91% public university, 9% private universities). In addition to majoring in English, the majority of these students were doing a minor in Translation (43%), French (13.5%), or American Studies (12%), while another 10% were doing a single major.


The vast majority of these students were mature: 90% were between 21 and 23 years old, while the rest were older. In addition, almost 80% of the respondents were senior, about-to-graduate 4th-year students.

Data collection

An online questionnaire of 55 items was administered to private and public tertiary-level students enrolled in EFL programs. The questions aimed to measure the relationship between a number of variables such as gender, speaking anxiety, teaching and learning practices, and language exposure outside classrooms. In addition to items examining the demographic backgrounds of the respondents, speaking anxiety was measured using five-point Likert scale items that were adapted from the FLCAS developed by Horwitz, Horwitz, and Cope (1986) but modified to focus on speaking. This scale identifies three levels of anxiety: high, moderate, and little or no anxiety. Out of the 33 items that the FLCAS uses, this study used only 15 items (see Appendix A, Question items 8–33), measuring students' oral anxiety along three dimensions: 1) fear of negative evaluation, 2) communication apprehension, and 3) test anxiety. In addition, 10 more question items were added to measure speaking anxiety. Following these items, the questionnaire contained 16 questions that inquired about the status of speaking in academic curricula and instruction (see Appendix A, Question items 34–47), while five more questions investigated extracurricular exposure to spoken English outside classrooms (see Appendix A, Question items 49–53). Finally, interviews with a focus group of 30 students were conducted to verify the status of the teaching and assessment of the speaking skill and the effects of the variables mentioned above.

Data collection procedures

The questionnaire was administered online to 82 L2 undergraduate students enrolled in language programs in public and private universities. The link to the questionnaire was sent to 150 students using an academic portal, and the response rate was 55%, which is a representative rate. The questionnaire contained items inquiring about respondents' demographic information such as age, gender, academic level, and type of higher education institution.

Data analysis

Data from the questionnaire were coded and analyzed using descriptive statistics, and measures of central tendency (means, standard deviations, and percentages) were used to identify the levels of anxiety. Given that only 15 items were selected and modified out of the 33, the original measures were not used.


Instead, the mean scores were used to measure responses to the question items, using the following scale: low FLCAS = mean scores between 1.00 and 2.00; moderate FLCAS = 2.01–3.50; high FLCAS = 3.51–5.00. A similar scale was used to measure students' satisfaction with the academic curricula and teaching practices: high satisfaction = mean scores between 1.00 and 2.00; moderate satisfaction = 2.01–3.50; low satisfaction = 3.51–5.00. To measure students' extracurricular exposure to spoken English (question items 49–53), the mean scores of the responses were analyzed using the following scale: no or limited exposure = 0.00–1.00; moderate exposure = 1.01–2.00; high exposure = 2.01–3.00.

The Statistical Package for the Social Sciences (SPSS) was used to examine the correlations and differences among variables such as the level of speaking anxiety, gender, academic curricula and teaching practices, and the effect of extracurricular language exposure and use on speaking anxiety. Pearson correlations and paired sample t-tests were used to confirm such correlations.
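To make the banding rules above concrete, the following is a minimal illustrative sketch in Python (pandas assumed). It is not the study's actual analysis script (the authors used SPSS), and the response values are hypothetical; only the cut-offs come from the scales just described:

    import pandas as pd

    # Hypothetical responses: rows = students, columns = the 15 adapted
    # FLCAS items, each coded 1-5 (reverse-keyed items assumed recoded).
    responses = pd.DataFrame(
        {f"item_{i}": [3, 2, 4, 5, 1] for i in range(1, 16)}
    )

    def flcas_band(mean_score: float) -> str:
        """Apply the chapter's cut-offs to a mean FLCAS score."""
        if mean_score <= 2.00:
            return "low"
        if mean_score <= 3.50:
            return "moderate"
        return "high"

    scores = responses.mean(axis=1)   # per-student mean across the items
    bands = scores.apply(flcas_band)  # low / moderate / high anxiety
    print(f"FLCAS mean = {scores.mean():.2f}, SD = {scores.std():.2f}")
    print(bands.value_counts())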

Results

This section presents the findings of the survey and of the interviews with the focus group. First, based on students' responses to the survey, the mean score of students' responses to the FLCAS items (items 8–33) was 2.95, with a standard deviation of 1.32. This indicates that students have a moderate level of anxiety when speaking English in class (see Table 11.1).

Table 11.1 Mean scores of the foreign language classroom anxiety scale (FLCAS)

         N     Minimum    Maximum    Mean    SD
FLCAS    82    1.00       5.00       2.95    1.32
Second, the level of anxiety was correlated with gender using a paired sample t-test. The findings, presented in Table 11.2, illustrate that no significant differences were found between males and females in the level of anxiety when speaking in EFL settings (sig. = 0.333).

Table 11.2 Correlation between FLCAS and gender

         Male (17)          Female (65)
         M       Std. D.    M       Std. D.    t        df    Sig.
FLCAS    2.83    0.61       2.98    0.57       0.975    68    0.333
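For readers working outside SPSS, the gender comparison in Table 11.2 can be approximated as follows. This is an illustrative sketch with simulated scores (only the group sizes, means, and SDs are taken from the table); note also that while the chapter describes the test as a paired sample t-test, the male and female groups are independent samples, so the standard independent-samples form is what is sketched here:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Simulated per-student FLCAS means, matching Table 11.2's descriptives.
    male_flcas = rng.normal(loc=2.83, scale=0.61, size=17)
    female_flcas = rng.normal(loc=2.98, scale=0.57, size=65)

    t_stat, p_value = stats.ttest_ind(male_flcas, female_flcas)
    # A p-value above .05 mirrors the reported non-significant result
    # (sig. = 0.333 in the chapter's data).
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")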

Third, in relation to measuring students' satisfaction with the academic curricula and teaching practices (items 34–47), the mean score of students' responses was 3.58, with a standard deviation of 1.11 (see Table 11.3).

Table 11.3 Mean scores of students' satisfaction with the academic curricula and teaching practices

                                         N     Min.    Max.    Mean    SD
Students' satisfaction with the
academic curricula and teaching
practices                                82    1.00    5.00    3.58    1.11

This signifies that students have a low level of satisfaction with the curricula and the pedagogical practices related to the teaching and assessment of speaking as a productive skill. More than two-thirds of the students reported that their L2 curricula do not include speaking courses and that their teachers do not provide them with in-class speaking opportunities or tasks. 62% denied receiving any instruction related to public speaking or giving oral presentations, while the majority of them (81%) demanded speaking courses in their L2 programs.

When the focus group was interviewed, a number of students revealed more details related to their dissatisfaction with the teaching and assessment of speaking, as exhibited in the extracts in Figures 11.1 and 11.2. During these interviews, students reported heightened anxiety when speaking in class and identified a number of factors as major causes of their anxiety, the first and foremost being the lack of speaking courses combined with traditional teaching methodologies.

Figure 11.1 Students' viewpoints regarding the teaching and assessment of speaking (1)

Figure 11.2 Students' viewpoints regarding the teaching and assessment of speaking (2)


However, a Pearson correlation carried out between FLCAS scores and students' satisfaction with the academic curricula and teaching practices showed no significant relationship between the two (see Table 11.4).

Finally, results of the survey items measuring students' extracurricular exposure to spoken English (see Table 11.5) revealed that students have high exposure to spoken English outside EFL classrooms through watching English movies and listening to audio material such as songs and podcasts. An interesting finding appeared in the correlation between FLCAS and extracurricular exposure to spoken English: the findings show a significant negative relationship between the two (see Table 11.6). It seems that extracurricular exposure to spoken English lessens the level of anxiety among EFL students when speaking in English; it also indicates that students with a high level of anxiety often do not have an adequate level of such exposure to spoken English.

Table 11.4 Correlation between FLCAS and students' satisfaction with the academic curricula and teaching practices

                                         FLCAS     Satisfaction
FLCAS           Pearson Correlation      1         -.189
                Sig. (2-tailed)                    .118
                N                        82        82
Satisfaction    Pearson Correlation      -.189     1
                Sig. (2-tailed)          .118
                N                        82        82

Note: 'Satisfaction' = students' satisfaction with the academic curricula and teaching practices.

Table 11.5 Mean scores of students' extracurricular exposure to spoken English

                                              N     Min.    Max.    Mean    SD
Extracurricular exposure to spoken English    82    1.00    3.00    2.69    0.52

Table 11.6 Correlation between FLCAS and extracurricular exposure to spoken English

                                         FLCAS     Exposure
FLCAS           Pearson Correlation      1         -.244*
                Sig. (2-tailed)                    .042
                N                        82        82
Exposure        Pearson Correlation      -.244*    1
                Sig. (2-tailed)          .042
                N                        82        82

*Correlation is significant at the 0.05 level (2-tailed).
Note: 'Exposure' = extracurricular exposure to spoken English.
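The correlation in Table 11.6 can likewise be reproduced outside SPSS. The sketch below is illustrative only: the data are simulated with a built-in negative trend, so the exact coefficient will differ from the reported r = -.244, p = .042:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # Simulated exposure scores on the 0-3 scale for 82 students.
    exposure = rng.uniform(1.0, 3.0, size=82)
    # FLCAS scores simulated with a mild negative dependence on exposure.
    flcas = 4.0 - 0.4 * exposure + rng.normal(0.0, 0.6, size=82)

    r, p = stats.pearsonr(flcas, exposure)
    print(f"Pearson r = {r:.3f}, p = {p:.3f}")  # expect a negative r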

Discussion

A closer look at the results reveals that although L2 university students have moderate levels of speaking anxiety, they are dissatisfied with their academic curricula and with their teachers' teaching practices related to speaking. Hence, it is imperative that the educational system make changes to allow for the adequate teaching and assessment of speaking, by means of introducing speaking courses and providing formative and summative oral, interactive, and collaborative learning tasks and activities. Practitioners and curriculum specialists should be called upon to undertake drastic changes in academic programs at the school and tertiary levels to render speaking a core skill to be taught, learnt, and assessed.

According to the National Research Council (1996), assessment and learning 'are two sides of the same coin' (p. 5). Consequently, assessment as learning emanates from the idea that learning involves students in an active and interactive process of cognitive restructuring (Earl & Katz, 2006). Ur (2012) argues that it is often L2 learners' principal objective to be able to communicate orally and fluently in formal and informal interaction, and hence L2 teachers need to enable them to achieve that objective.

A number of recommendations in this regard have been suggested by numerous educationists. Hamzah and Ting (2010) reported that teaching speaking in groups enhances motivation and lessens speaking anxiety and fear of peer criticism among individuals. In addition, diagnostic tests need to be undertaken to pinpoint anxious students and provide them with assistance (Woodrow, 2006). What is more, the teaching of phonology, and more particularly pronunciation, needs to be introduced in the early school cycles in order to gauge learners' accent, stress, rhythm, and intonation in the early stages of learning the target language (Shively, 2008) and to equip them with the oral skills needed to bridge the gap between academic school levels, tertiary education, and workplace requirements (Lindsay & Knight, 2006). Hence, interactive classroom activities need to be implemented for the production of consistent and meaningful output, by means of introducing the practice of real-life speaking in classroom settings to reduce speaking apprehension and to help students identify the areas in which they need enhancement to augment their oral fluency (Harmer, 2010; Koran, 2015).


Koran, 2015). Koran argues that a good teacher is one who assesses students' speaking skill by means of observations as well as quizzes or exams designed to evaluate oral proficiency. To perfect students' speaking competence, teachers have to provide constructive feedback, facilitate in-class discussions and debates, and provide students with listening material (Harden & Crosby, 2000). First and foremost, a holistic reform of the academic curricula needs to be implemented promptly in order to ensure that speaking is incorporated as an important segment of any academic L2 program and that it is integrated and assessed as a key intended learning outcome. Longitudinal future studies that address the pedagogical practices in the pre-tertiary educational cycles and that measure the status of the teaching and assessment of the speaking skill are required so that educationists and decision-makers can rectify the long-term neglect of speaking and incorporate it into L2 curricula.

Appendix A

Questionnaire on the status of English language speaking in L2 settings in Bahrain

This questionnaire aims at exploring the status of the speaking skill in Bahraini public and private universities in academic L2 programs from students' perspectives. Please answer the following questions by choosing the option that best describes your opinion. Kindly be informed that all the information provided is handled with confidentiality and utmost privacy. Thank you in advance for your time and effort.

1. How old are you?
   15–17 / 18–20 / 21–23 / More than 23

2. Are you a …?
   Public university student / Private university student

3. Gender
   Female / Male

4. If you are a university student, what academic year are you doing now?*
   Foundation / Year 1 / Year 2 / Year 3 / Year 4 / Not applicable / Other: (please specify) _________________

5. If you are a university student majoring in English, what is your minor?*
   Translation / French / American Studies / No minors / Other: (please specify) _________________

6. How would you rate your speaking skill?*
   Excellent / Good / Fair / Somewhat poor / Very poor

7. How would you rate your English language skills in general?*
   Excellent / Good / Fair / Somewhat poor / Very poor

A. Speaking competence
Kindly read the following statements and provide your opinion with reference to your speaking competence. (Response scale: SD = Strongly Disagree, D = Disagree, N = Neutral, A = Agree, SA = Strongly Agree)

8. I never feel quite sure of myself when I am speaking in my English class.
9. I don't worry about making mistakes when speaking in my language class.
10. I tremble when I know that I'm going to be called on to speak in class.
11. It frightens me when I don't understand what the teacher is saying during class.
12. During English class, I find myself thinking about things that have nothing to do with the course.
13. I keep thinking that the other students are better than me in English.
14. I am usually at ease during oral tests.
15. I start to panic when I have to speak without preparation in my language class.
16. I don't understand why some people get so worried about giving oral presentations.
17. While speaking in class, I can get so nervous I forget things I know.
18. It embarrasses me to volunteer answers in my language class.
19. I do not get nervous speaking in English with native speakers of English.
20. I get upset when I don't understand why I got bad marks in my oral test.
21. Even if I am well prepared for the oral tasks, I feel anxious about it.
22. I often feel like not going to my language class when there is an oral activity.
23. I feel confident when I speak in the English class.
24. I am afraid that my language teacher is ready to correct every mistake I make when I speak.
25. I am afraid that students in my English class are ready to correct every mistake I make when I speak.
26. I can feel my heart pounding when I'm going to be called on to participate in the English class.
27. I always feel that the other students speak English better than I do.
28. I feel self-conscious about speaking English in front of other students.
29. I get confused when I am speaking in my English class.
30. I feel overwhelmed by the number of rules you have to learn to speak good English.
31. I am afraid that the other students will laugh at me when I speak in English.
32. I would probably feel comfortable speaking around native speakers of English.
33. I get nervous when the teacher asks questions which I haven't prepared for in advance.

B. Academic curricula and teaching practices
Kindly read the following statements and provide your opinion. (Same response scale: SD / D / N / A / SA)

34. My program curriculum includes a general speaking course.
35. My program curriculum includes a public speaking course.
36. Our language instructors encourage us to speak in class.
37. I have given oral presentations during the course of my study.
38. I have given more than three oral presentations during the course of my study.
39. I have been taught how to give good oral presentations.
40. Assessment in language courses includes speaking.
41. Language courses allow for in-class speaking activities.
42. Our instructors engage us in in-class debates and discussions.
43. Our language department provides us with opportunities to practice speaking.
44. Course activities and assignments require the use of media.
45. Our L2 instructors are fluent in English.
46. Our L2 instructors teach us in English.
47. Speaking courses need to be introduced into the program.

48. What are the courses which promote speaking and/or giving oral presentations? (You can select more than one):
    a) Major courses
    b) Minor courses
    c) Language courses
    d) Literature courses
    e) Linguistics courses
    f) Other: (please specify) ____________________________

Extracurricular exposure to spoken English
Please answer the following Yes/No questions.

49. Do you practice speaking English outside the classroom?
    a) Yes  b) No  c) Not sure
50. Do you watch English movies?
    a) Yes  b) No  c) Not sure
51. Do you listen to English audio material (e.g. songs, podcasts, audiobooks, etc.)?
    a) Yes  b) No  c) Not sure
52. Do you think watching English movies helps enhance students' speaking skill?
    a) Yes  b) No  c) Not sure
53. Do you think listening to English audio material helps enhance students' speaking skill?
    a) Yes  b) No  c) Not sure


Recommendations

54. Would you like to recommend ways to enhance students' speaking skill?
    a) Yes  b) No

55. Kindly use the space below to provide us with your recommendations or comments, if any.

References

Abbas, M. (2017). English classroom speaking anxiety among English major students (Unpublished undergraduate thesis). University of Bahrain.
Abudrees, T. (2017). The differences in applying the aspects of connected speech between first-year and fourth-year non-native speaking students at the University of Bahrain (Unpublished undergraduate thesis). University of Bahrain.
Al Asmari, A. (2015). Communicative language teaching in EFL university context: Challenges for teachers. Journal of Language Teaching and Research, 6(5), 976–984.
Al Hosni, S. (2014). Speaking difficulties encountered by young EFL learners. International Journal of Studies in English Language and Literature (IJSELL), 2(6), 22–30.
Al Jahromi, D. (2012). A study of the use of discussion boards in L2 writing instruction at the University of Bahrain (Unpublished doctoral thesis). University of Sheffield.
Alhamadi, N. (2014). English speaking learning barriers in Saudi Arabia: A case study of Tibah University. AWEJ, 5(2), 38–53.
Al-Nasser, A. S. (2015). Problems of English language acquisition in Saudi Arabia: An exploratory-cum-remedial study. Theory and Practice in Language Studies, 5(8), 1612–1619.
Al-Qahtani, M. F. (2013). Relationship between English language, learning strategies, attitudes, motivation, and students' academic achievement. Educ. Med. Journal, 5, 19–29.
AlRashid, N. E. (2017). The effectiveness of teaching English speaking and writing in Bahraini government secondary schools (Unpublished undergraduate thesis). University of Bahrain.
Al-Shboul, M. M., Ahmad, I. S., Nordin, M. S., & Rahman, Z. A. (2013). Foreign language reading anxiety in a Jordanian EFL context: A qualitative study. English Language Teaching, 6(6), 1–19.
Arnaiz, P., & Guillen, F. (2012). Self-concept in university-level FL learners. The International Journal of the Humanities: Annual Review, 9(4), 81–92.
Bashir, S. (2014). A study of second language-speaking anxiety among ESL intermediate Pakistani learners. International Journal of English and Education, 3(3), 216–229.
Basic, L. (2011). Speaking anxiety: An obstacle to second language learning? (Unpublished doctoral thesis). University of Gävle.
Bygate, M. (1987). Speaking. Oxford University Press.
Celce-Murcia, M. (2013). Teaching English in the context of world Englishes. In M. Celce-Murcia, D. M. Brinton, & M. A. Snow (Eds.), Teaching English as a second or foreign language (4th edition, pp. 2–14). National Geographic Learning/Cengage Learning.
Coates, J. (2014). Women, men and language: A sociolinguistic account of gender differences in language. Taylor and Francis.
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second language acquisition. Routledge.
Earl, L., & Katz, S. (2006). Rethinking classroom assessment with a purpose in mind: Western and Northern Canadian protocol for collaboration in education. Manitoba Education, Citizenship, and Youth. https://digitalcollection.gov.mb.ca/awweb/pdfopener?smd=1&did=12503&md=1
Elmenfi, F., & Gaibani, A. (2016). The role of social evaluation in influencing public speaking anxiety of English language learners at Omar Al-Mukhtar University. Arab World English Journal, 7(3), 496–505.
Ezzi, N. A. (2012). Foreign language anxiety and the young learners: Challenges ahead: Rethinking English language teaching. In TESOL Arabia conference proceedings: Proceedings of the 17th TESOL Arabia Conference (Vol. 16, pp. 56–62). TESOL Arabia Publications.
Fakhri, M. (2012). The relationship between gender and Iranian EFL learners' foreign language classroom anxiety. International Journal of Academic Research in Business and Social Sciences, 2(6), 147–156.
Gan, Z. (2012). Understanding L2 speaking problems: Implications for ESL curriculum development in a teacher training institution in Hong Kong. Australian Journal of Teacher Education, 37(1), 43–59.
Gebril, A., & Hidri, S. (2019). Language assessment in the Middle East and North Africa [Special issue: The status of English language research in the Middle East and North Africa: An introduction]. Arab Journal of Applied Linguistics, 4(2), i–vi.
Gregersen, T. S. (2003). To err is human: A reminder to teachers of language-anxious students. Foreign Language Annals, 36(1), 25–32.
Hamzah, M. H., & Ting, L. Y. (2010). Teaching speaking skills through group work activities (A case study at form 2ES1 SMK Damai Jaya Johor). https://core.ac.uk/download/files/392/11785638.pdf
Hanifa, R. (2018). Factors generating anxiety when learning EFL speaking skills. Studies in English Language and Education, 5(2), 230–239.
Harden, R. M., & Crosby, J. (2000). The good teacher is more than a lecturer – the twelve roles of the teacher. Medical Teacher, 22(4), 334–347.
Harmer, J. (2010). How to teach English. Pearson Longman.
He, D. (2013). What makes learners anxious while speaking English: A comparative study of the perceptions held by university students and teachers in China. Educational Studies, 39(3), 338–350.
Heng, C. S., Abdullah, A. N., & Yosaf, N. B. (2012). Investigating the construct of anxiety in relation to speaking skills among ESL tertiary learners. 3L: The Southeast Asian Journal of English Language Studies, 18(3), 155–166.
Hidri, S. (2017). Introduction: State-of-the-art of assessing second language abilities. In S. Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp. 1–19). Springer.
Hidri, S. (2018). Assessing spoken language ability: A many-facet Rasch analysis. In S. Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp. 23–48). Springer.
Hismanoglu, M. (2013). Does English language teacher education curriculum promote CEFR awareness of prospective EFL teachers? Procedia – Social and Behavioral Sciences, 93, 938–945.
Horwitz, E. K., Horwitz, M. B., & Cope, J. A. (1986). Foreign language classroom anxiety. The Modern Language Journal, 70(2), 125–132.
Kayaoğlu, M. N., & Sağlamel, H. (2013). Students' perceptions of language anxiety in speaking classes. Tarih Kültür ve Sanat Araştırmaları Dergisi, 2(2), 142–160.
Kingen, S. (2000). Teaching language arts in middle schools: Connecting and communicating. Lawrence Erlbaum Associates.
Koo, Y. L. (2009). Mobilising learners through English as lingua franca (ELF): Providing access to culturally diverse international learners in higher education. Research Journal of International Studies, 3(9), 45–63.
Koran, S. (2015). Analyzing EFL teachers' initial job motivation and factors affecting their motivation in Fezalar Educational Institution in Iraq. Advances in Language and Literary Studies, 6(1), 72–80.
Krashen, S. (1987). Second language acquisition. Oxford University Press.
Lindsay, C., & Knight, P. (2006). Learning and teaching English: A course for teachers. Oxford University Press.
Liu, H. J. (2012). Understanding EFL undergraduate anxiety in relation to motivation, autonomy, and language proficiency. Electronic Journal of Foreign Language Teaching, 9(1), 123–139.
Lu, Z., & Liu, M. (2011). Foreign language anxiety and strategy use: A study with Chinese undergraduate EFL learners. Journal of Language Teaching and Research, 2(6), 1298–1305.
Mahmoodzadeh, M. (2012). Investigating foreign language speaking anxiety within the EFL learners' interlanguage system: The case of Iranian learners. Journal of Language Teaching and Research, 3(3), 466–476.
Mak, B. (2011). An exploration of speaking-in-class anxiety with Chinese ESL learners. System, 39, 202–214.
McCroskey, J. C. (2016). Introduction to rhetorical communication: A Western rhetorical perspective. Routledge.
McCroskey, J. (1978). Validity of the PRCA as an index of oral communication apprehension. Communication Monographs, 45(3), 192–203.
Ministry of Labour and Social Affairs in Bahrain. (2017). Workplace requirements for better and faster employment. Paper presented at the Media, Tourism, and Fine Arts Stakeholders' Forum, University of Bahrain, Bahrain.
Mohammadi, M., & Mousalou, R. (2013). Emotional intelligence, linguistic intelligence, and their relevance to speaking anxiety of EFL learners. Journal of Academic and Applied Studies, 2(6), 11–22.
National Research Council. (1996). National science education standards. National Academy Press.
O'Sullivan, B. (2006). Modelling performance in oral language tests: Language testing and evaluation. Peter Lang.
Ounis, A. (2017). The assessment of speaking skills at the tertiary level. International Journal of English Linguistics, 7(4), 95–112.
Park, G. P., & French, B. F. (2013). Gender differences in the foreign language classroom anxiety scale. System, 41, 462–471.
Pathan, M., Aldersi, Z., & Alsout, E. (2014). Speaking in their language: An overview of major difficulties faced by the Libyan EFL learners in speaking skill. International Journal of English Language & Translation Studies, 2(3), 96–105.
Rabab'ah, G. (2016). The effect of communication strategy training on the development of EFL learners' strategic competence and oral communicative ability. Journal of Psycholinguistic Research, 45(3), 625–651.
Richards, J. C. (2008). Teaching listening and speaking: From theory to practice. Cambridge University Press.
Salman, F. (2016). The power of words (Unpublished undergraduate thesis). University of Bahrain.
Saville-Troike, M. (2012). Introducing second language acquisition (2nd edition). Cambridge University Press.
Shively, R. L. (2008). L2 acquisition of [β], [ð], and [ɣ] in Spanish: Impact of experience, linguistic environment and learner variables. Southwest Journal of Linguistics, 27(2), 79–114.
Spielberger, C. D. (1983). Manual for the State-Trait Anxiety Inventory. Consulting Psychologists Press.
Taha, T. A., & Wong, F. F. (2016). Foreign language classroom anxiety among Iraqi students and its relation with gender and achievement. International Journal of Applied Linguistics and English Literature, 6(1), 305–310.
Taleb, S. M. (2017). Second language speaking anxiety among English major students at the University of Bahrain (Unpublished undergraduate thesis). University of Bahrain.
Talley, P. C., & Hui-ling, T. (2014). Implicit and explicit teaching of English speaking in the EFL classroom. International Journal of Humanities and Social Science, 4(6), 38–46.
Tanveer, M. (2007). Investigation of the factors that cause language anxiety for ESL/EFL learners in learning speaking skills and the influence it casts on communication in the target language (Unpublished MA thesis). University of Glasgow.
Tarone, E. (2005). Speaking in a second language. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 485–502). Lawrence Erlbaum.
Ur, P. (2012). A course in English language teaching (2nd edition). Cambridge University Press.
Vygotsky, L. S. (1986). Thought and language. MIT Press.
Wang, T. (2010). Speaking anxiety: More of a function of personality than language achievement. Chinese Journal of Applied Linguistics, 33(5), 95–109.
Woodrow, L. (2006). Anxiety and speaking English as a second language. RELC Journal, 37(3), 308–328.
Yahya, M. (2013). Measuring speaking anxiety among speech communication course students at the Arab American University of Jenin (AAUJ). European Social Sciences Research Journal, 1(3), 229–248.
Yaikhong, K., & Usaha, S. (2012). A measure of EFL public speaking class anxiety: Scale development and preliminary validation and reliability. English Language Teaching, 5(12), 23–35.
Young, D. J. (1994). New directions in language anxiety research. In C. A. Klee (Ed.), Faces in a crowd: The individual learner in multisection courses (pp. 3–46). Heinle & Heinle.
Yusof, R. (2016). Common factors that affect L2 students' usage of language (Unpublished undergraduate thesis). University of Bahrain.
Zhou, M. (2016). The roles of social anxiety, autonomy, and learning orientation in second language learning: A structural equation modeling analysis. System, 63, 89–100.

Chapter 12

Planning for positive washback: The case of a listening proficiency test

Caroline Shackleton

Introduction

Test washback has been defined as the effects of tests on teaching and learning; consequently, any introduction of a new test should plan for positive washback (Wall, 2013). Assessment tasks, whether summative or formative, should therefore be designed in a way that engages students in the necessary knowledge, skills, and abilities (KSAs) needed to perform effectively in the real world beyond the confines of the classroom. Arguably, such a focus is particularly important for high-stakes proficiency tests, such as those used for school leaving or university entrance, where governments and education departments – especially those within the European Union – have been obliged to take the Common European Framework of Reference for Languages (CEFR) into account. As a result, new educational initiatives have been plentiful as policy makers attempt to incorporate competence-based language education (Lim, 2014). The main purpose of such initiatives is both to promote learning, bringing about a shift in language pedagogy from knowledge-based to more communicative practices, and to validly interpret what has been learned.

Despite the increasing pressure on teachers to be instigators of such changes, for many, a lack of language assessment literacy (LAL) prevents them from successfully fulfilling this role (Fulcher, 2012; Hidri, 2019, 2018, 2014). Indeed, a lack of LAL amongst teachers has been widely reported, which is arguably particularly true in the case of standardized tests (Tsagari & Vogt, 2017). In Tsagari and Vogt's study, teachers did not feel that they had the correct training to help students prepare for tests, and at most, test preparation took the form of administering past papers without critically evaluating them.

Ultimately, it is teachers who will need to prepare their students for any standardized test through the provision of support for learning outcomes, classroom assessments to measure and track students' progress, and other feedback. Students need to be given the tools to reflect on their learning, understand their strengths and weaknesses, and develop learner autonomy in order to develop life-long learning strategies. Teachers act as mediators between the language class and the test. As such, it is arguable that the first step in instigating


reforms would be a fully comprehensible description of any new test and how it relates to the present curriculum, together with supporting construct validity evidence. Tests with good construct validity promote positive washback, and the move from classroom activities to test tasks should be fluid (Messick, 1996). A new test must therefore be based on a clear definition of language proficiency and have a strong relationship to the curriculum, and this information must be provided to teachers. Teachers not only need to understand the curriculum standards and test constructs but be able to relate this knowledge to their professional practices if they are to bring about the desired washback effect on student learning.

The present study is situated in the context of one such initiative in Spain, where education reform laws have been introduced together with a new communicative, competence-based curriculum. This new curriculum comes largely in response to the growing demand in Europe for the implementation of CEFR-related, competence-based curriculums, as an attempt to improve the poor results of Spanish students (European Commission, 2012), and follows years of academic criticism of the previous system. The main criticism has been that no oral component has, until now, been included in the exam.1 Furthermore, it has been extensively reported that teachers do indeed teach to the test and that, consequently, a narrow form of the curriculum is regularly taught in the classroom, with listening and speaking being largely ignored (e.g., Amengual Pizarro, 2009; García Laborda & Fernández Álvarez, 2011). Yet listening is an essential component of communicative competence (adults spend nearly 50% of their time listening) and plays a key role in successful language acquisition (Wagner, 2014), thus contributing to academic success. This is especially true in the context of university entrance, where universities are increasingly offering courses taught in English.

The situation in Spain is therefore ripe for change; in order to achieve a positive impact on teaching and learning, any new assessment should not only clearly evaluate the competencies outlined in the new curriculum, but also provide evidence that this is the case. While tests have been shown to bring about changes in educational systems in many different contexts (Cheng, Sun, & Ma, 2015), positive change can only be brought about if a test accurately reflects the aims of the curriculum (Wall, 2013). It is hoped that this study will be a timely contribution to just such an outcome.

Theoretical background

A central issue in language testing is the question of the theory-defined construct: Before test development can begin, it is essential that a theoretical stance on the nature of language ability first be taken (Chapelle, 2012). Addressing the continual debates concerning language proficiency constructs, Bachman (2007) concludes that both competence and task-based perspectives should be taken into account. Such an approach resonates well with the CEFR, which provides


proficiency scales outlining both a description of abilities and domains of use. That is to say, language proficiency is represented in the CEFR by both quality and quantity of language use, with the language user becoming more proficient as contexts become more complicated and require more complex language skills. It is argued that the complexity of the KSAs involved in L2 oral comprehension makes it almost impossible to provide a global, comprehensive definition of the construct (Wagner, 2014). Some authors have centred on a sub-skills approach (e.g., Munby, 1978), yet this approach has been criticized because of the lack of empirical investigation and the fact that sub-skill separation would be incredibly difficult to operationalize in test development (Buck, 2001). Other models (e.g., Anderson, 2009; Buck, 2001) describe listening as a cognitive activity which takes place online as the listener attempts to process input and – in the context of a language test – to complete some sort of task.

Field (2008a, 2013a) outlines just such a process-based approach, which draws on research into L1 listening. Here, listening is described as consisting of five levels of processing: (i) aural input is decoded, (ii) input is parsed, (iii) propositional meaning is established, (iv) a mental model is built, and (v) a situation or discourse model is finally constructed. Input decoding and parsing are bottom-up activities which require the application of linguistic knowledge (Vandergrift & Goh, 2012). The words in connected speech need to be identified and segmented, often by relying on phonetic and phonological clues given by stress and intonation patterns. As phonological knowledge increases, decoding routines become more automated, making the task of establishing propositional meaning easier. Propositions are then integrated and transformed into mental models, which are continuously adjusted as the listener constantly forms and revises hypotheses (Field, 2008b). However, the complete meaning of input cannot be derived solely from the decoded information (Field, 2013a); instead, top-down semantic processing allows the listener to draw on their personal schemata and world knowledge in context, and on pragmatic and discourse knowledge stored in long-term memory (Vandergrift & Goh, 2012). As such, every utterance must be interpreted in its particular real-life communicative situation; as Buck argues, 'meaning is not something in the text that the listener has to extract but is constructed by the listener in an active process of inferencing and hypothesis building' (2001, p.29). By interpreting the interrelated ideas which make up mental models, listeners are able to link information together and form a discourse representation of the input.

Communicative purpose for listening is also an important consideration affecting how we listen, and a competent listener will be able to select the most appropriate type of listening for the task at hand (Field, 2008a). After all, 'in teaching or in testing, the only way we can establish if "comprehension" has taken place is to ask some kind of question' (Field, 2017). Here, there exists a clear distinction between local and global understanding (Field, 2008a).

Besides the skills and competences outlined above, another essential component in any communicative language ability model is that of strategic competence, as


evidenced by its inclusion in the CEFR proficiency scales. Specifically, Macaro et al. (2007) identify the following meta-cognitive strategies as highly relevant to the construct definition:

1. Predicting content.
2. Monitoring comprehension.
3. Making inferences.

These strategies mediate between trait and context and, because task specific behaviours are context relevant, listeners must develop ‘real world strategies’ in order to achieve comprehension (Field, 2008a). Figure 12.1 shows a representation of the proposed theoretical construct for listening ability.

[Figure 12.1 Proposed model of listening ability (based on Field, 2008a, 2013a). The diagram traces speech input through input decoding (phonological string), lexical search (word string), parsing, meaning construction, and discourse construction to a representation of speech in memory and a response. Linguistic processing draws on linguistic knowledge, semantic processing draws on prior knowledge (world, pragmatic, discourse, cultural), and the whole process is mediated by the metacognitive strategies of planning, prediction, monitoring, and inference.]


Not only must any theory-based process model of listening ability be represented in the construct of a new test, but it must be clearly shown that candidates use the same KSAs as they would in the target language use domain (TLU). Context-specific features of test tasks are normally outlined in the test specifications, and evidence should be provided that tasks do indeed represent the proposed TLU. These context-specific features should include elements such as the source of input texts, channel of delivery, number of plays, and the response format.

Most importantly, we need to consider the characteristics of the input passages and how these will relate to the TLU. A key debate here is that regarding the authenticity of the audio used. At present, most language tests use scripts, i.e., written texts which are then read aloud (Buck, 2018; Wagner, 2014). These texts are often revised and edited before being produced in a studio by actors; 'far too often listeners are expected to be able to understand texts that are meant to be read' (Vandergrift & Goh, 2012, p.167). Here, construct under-representation is an obvious threat, as a scripted text lacks many of the characteristics of natural speech (Field, 2008a, 2013b, 2017; Vandergrift & Goh, 2012). Indeed, several studies highlight the differences between spoken and written discourse (for review, see Wagner & Toth, 2017). Natural, connected speech is very different from the written word and can include grammatical mistakes, shorter idea units, and ellipses. Furthermore, it tends to be less logically organized as a consequence of its unplanned nature (Wagner, 2014). Not only can spoken language be more colloquial, containing fillers and repetition, but its intonation patterns carry substantial meaning (Buck, 2018). In contrast, Field (2013b) argues that actors mark commas and full stops, there are no hesitations or false starts, and voices rarely overlap. Furthermore, test developers often put in scripted distractors, making a recording much more informationally dense and placing too great a strain on the working memory (Field, 2013a).

Consequently, there have been many calls for a move towards more authentic input texts, both for teaching and assessment purposes (e.g., Field, 2008a, 2013a; Gilmore, 2011; Vandergrift & Goh, 2012; Shackleton, 2018a; Wagner, 2014; Wagner & Toth, 2017). As Field (2013a, p.143) states, 'if a test is to adequately predict how test takers will perform in normal circumstances, it is clearly desirable that the spoken input should closely resemble that of real-life conversational or broadcast sources'. In the case of school leaving/university entrance tests, the range of genres found in the TLU should be sampled, and these would represent a continuum of aurality (Shohamy & Inbar, 1991; Vandergrift & Goh, 2012), from a planned talk to a spontaneous conversation.

A related issue concerns questions of accent, English as a lingua franca (ELF), and the ongoing debate about the status of the native speaker as an ideal model for assessment. Most international tests still limit themselves to accents drawn from the major native-speaker varieties (British English, American/Canadian English, and Australian English). However, the relevance of standard native-speaker varieties has more recently been brought into question, and the lack of ELF-based examples in most language tests has been the subject of criticism


(Jenkins & Leung, 2014). We live in a world in which English is increasingly used for global communication and where university entrance tests are often used to gain access to English as a Medium of Instruction (EMI) courses; in order to reflect this international context, it is becoming ever more apparent that a wider range of accents needs to be included in assessment procedures. Indeed, it is worth pointing out here that the CEFR itself has recently removed the notion of ‘native speaker’ from its proficiency scales (Council of Europe, 2018).

Research problem

The proposed new test must take the above debates into account; if positive washback is to be encouraged in the present context, I would argue that authentic input texts and a range of both native and non-native speaker varieties should be included in the test construct. There are further reasons why this should indeed be the case: If a new test is to be used for university admissions, it should be clear just what competencies are being assessed, and here a CEFR-related test can provide test users with a well-defined description. I would also contend that there are a number of reasons why any CEFR-based test for the context under discussion must be aimed at the B2 CEFR level; not only do the B2 descriptors most resemble current Spanish curriculum requirements, but B2 is currently the required university entrance level for most other European countries, as it is considered the most appropriate level for basic academic study and entry into the workplace. While some doubts have been expressed as to whether a higher level may be necessary (Taylor & Geranpayeh, 2011), B2 is generally felt to be a feasible minimum. For example, Carlsen (2018) reported that students entering Norwegian universities with a proficiency level lower than B2 lacked the necessary language skills for success on their courses. Such findings suggest that for the Spanish education system to keep in line with other European countries, an ideal scenario would see students leaving upper secondary school with a B2 level minimum (Deygers, Zeidler, Vilcu, & Hamnes Carlsen, 2018; Lim, 2014).

In light of the above issues, the motivation for the present study can be framed as the need to develop a test which engages the proposed listening ability model, which incorporates authentic discourse (including a variety of accents), and which adequately reflects those CEFR B2 competences relevant to the context of school leaving/university entrance demands.

Rationale

The correct identification of the TLU is clearly a key factor in the successful operationalization of any test construct and, as such, the test specifications should reflect those CEFR B2 abilities which students would be expected to employ beyond the confines of the test in a school leaving/university entrance context. Accordingly, the current study drew upon the following sources in order to develop its specifications:


1. A survey of topics taught during baccalaureate.
2. CEFR B2 listening descriptors.
3. Types of listening based on specific purpose for listening.

Following the previous discussion, it was decided to use only authentic audio files sourced from the internet or produced as a natural response to prompts in order to obtain samples of non-adapted natural discourse which would include a variety of accents (including one L2 speaker). Four different tasks were chosen, so as to create adequate construct coverage and to minimize the task effect by including a mix of task types. Each sound file lasts between three and five minutes and is to be heard twice.

In order to develop tasks that correspond to real-world communicative events, purposeful items based on expert behaviour need to be developed. To this end, a textmapping protocol (see Green, 2017) was followed. This process makes no reference to a transcript; instead, after the purpose for listening has been decided, a group of experts notes down the salient ideas taken away from a given audio in order to replicate the real-world listening process as faithfully as possible. In this way, an attempt is made to replicate specific types of listening (Weir, 2005, p.101) by reaching a consensus on meaning, thereby modelling the activity on expert cognitive processing behaviour as suggested by Field (2008a). The development of all subsequent items is then based on this understanding of the audio material in question. Table 12.1 gives a breakdown and brief description of the four tasks based on audios which were considered to be suitable for exploitation in accordance with the test specifications.

Table 12.1 Test description

Task 1 Opinions about sport: Audio developed using controversial questions related to sport. The utterances collected are propositionally and linguistically complex and contain abstract as well as concrete ideas.
   Type of listening and response mode: Gist/Main ideas (G/MI). Multiple match (MM).

Task 2 Moving to the USA: A talk sourced from the internet about moving to the USA from Mexico given by a Mexican. The speaker explains his move to the USA and includes opinions and attitudes as well as cause and effect links.
   Type of listening and response mode: Main ideas with supporting details (MISD)/Listening to infer (propositional) meaning (IPM). Multiple choice (MCQ).

Task 3 Text messaging: A radio interview sourced from the internet with an academic about her research into language use in text messaging.
   Type of listening and response mode: Main ideas with supporting details (MISD)/Listening to infer (propositional) meaning (IPM). Multiple choice (MCQ).

Task 4 Geography trip: Constructed using prompts to produce a lecture-type audio conveying quite informationally dense instructions about a forthcoming school trip.
   Type of listening and response mode: Selective listening/Specific information and important details (SI/ID). Note form (NF).

A small pilot study was carried out, and items which appeared to discriminate badly or be too easy/difficult for the pilot population were removed. In order to discover if the test represents the proposed cognitive processing view of listening, the following research question was addressed:

To what extent does the behaviour elicited from a test taker correspond to the relevant knowledge, skills, and abilities that would be required of him/her in a real-world context?

Method

Verbal protocol methodology

In order to discover whether a test engages the abilities it intends to assess, one extremely useful research tool is verbal protocol methodology (both 'think aloud' and retrospective methods). Although concurrent 'think alouds' can be used at the pre-listening stage to collect information about metacognitive planning and prediction strategies, the fact that listening is an online process makes it impossible to collect think-aloud data while a participant is performing the task, and retrospective methods must instead be used. Participants are asked to recall their thought processes immediately after completing the task, whilst these are still in short-term memory. The question paper acts as a 'prompt' for aiding recall, and further probing questions can be asked to encourage participants to give more useful information.

Once the verbal protocol recordings have been collected, they must be transcribed, segmented, and coded, following a coding scheme which represents the data. Here, the present study is based on a theoretical framework of listening ability, and therefore the framework itself can be used as a base for the coding scheme (Gu, 2014). The data can then be analyzed both qualitatively and quantitatively using frequency counts, thereby providing a rich description of just how test items are solved in order to discover if the relevant KSAs are being used. In this way, construct-irrelevant processes not revealed by the test scores themselves may subsequently be investigated.

However, this methodology does have some limitations: data collection is time-consuming and so, typically, only small samples are used (Green, 1998); reports may be incomplete or inaccurate due to the heavy reliance on memory (Banerjee, 2004); and some strategies may not be reported by proficient users who have more automated abilities (Phakiti, 2003). Notwithstanding these criticisms, the methodology may still be considered superior to other research tools currently used to investigate process and strategy use, such as questionnaires – which only report on those strategies participants themselves think they have used – or expert judgements (see Alderson, 1993, for a more detailed discussion of limitations).
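As a minimal illustration of the quantitative side of this analysis, the sketch below tallies frequency counts of coded protocol segments per test item. It is a hedged example: the (item, code) pairs are invented stand-ins for real protocol data, and the labels anticipate the L/IU/MR/DR scheme defined in the Data collection section below.

```python
# Sketch of a frequency-count analysis of coded verbal protocol data.
# Each (item, code) pair is an invented stand-in for one coded segment;
# the labels follow the L / IU / MR / DR processing levels.
from collections import Counter, defaultdict

coded_segments = [
    ("item1", "IU"), ("item1", "IU"), ("item1", "MR"),
    ("item2", "L"),  ("item2", "MR"), ("item2", "DR"),
]

counts_per_item = defaultdict(Counter)
for item, code in coded_segments:
    counts_per_item[item][code] += 1

for item, counts in sorted(counts_per_item.items()):
    print(item, dict(counts))   # e.g. item1 {'IU': 2, 'MR': 1}
```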


Data collection

After piloting the methodology with two participants, seven volunteers estimated to have a CEFR B2 listening proficiency level were enrolled (male = 4, female = 3), and the following two-stage design was employed:

1. Concurrent 'think alouds'. Participants were asked to verbalize their thought patterns whilst preparing to do each task in order to allow for a qualitative analysis of planning and prediction strategies.

2. Retrospective verbal reports. Whilst finalizing their answers, participants explained how they had reached the answer to each item.

Each of the four tasks was completed separately in order to reduce the time lag between doing the test and reporting on it to a minimum. Participants were given the option of reporting in their L1 in order to reduce the cognitive load when expressing their thoughts (Banerjee, 2004), although in the event only one participant chose to do this.

Once collected, the reports were transcribed in preparation for coding using the Qualitative Data Analysis (QDA) software QDA Miner Lite. The data was coded separately for each item on the test as a representation of the level of processing reached in order to correctly solve the item. These levels of processing were drawn directly from the listening ability model and are as follows:

L – Lexical recognition: The understanding of isolated vocabulary from the audio input.
IU – Idea unit: A proposition, which could be as little as a noun phrase (Buck, 2001, pp. 27–28), is used to answer the item. This is understanding at a very literal level and includes local factual information.
MR – Meaning representation: The listener relates a proposition to the context and uses prior knowledge in order to interpret meaning.
DR – Discourse representation: The listener is able to integrate information into a wider semantic representation, including speaker intention.

In order to generalize from the results, reliability checks must be carried out. In the present study, the researcher re-coded one entire protocol six months after the original coding. The resulting intra-coder agreement between the two sessions was 87% exact agreement, and Cohen's Kappa, which takes into account agreement by chance, was 0.782 (p
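As a hedged illustration of the reliability check just described, the following sketch computes exact agreement and Cohen's Kappa between two coding passes over the same protocol. The two code sequences are invented examples using the L/IU/MR/DR scheme above, not the study's data; on the study's figures (87% exact agreement, Kappa = 0.782), the same formula would apply to the full set of coded segments.

```python
# Sketch of an intra-coder reliability check over one re-coded protocol.
# The two coding passes are invented examples (L / IU / MR / DR labels).
from collections import Counter

pass1 = ["L", "IU", "IU", "MR", "DR", "MR", "IU", "L", "DR", "MR"]
pass2 = ["L", "IU", "MR", "MR", "DR", "MR", "IU", "L", "DR", "IU"]

n = len(pass1)
observed = sum(a == b for a, b in zip(pass1, pass2)) / n   # exact agreement

# Expected chance agreement, from each pass's marginal label frequencies.
c1, c2 = Counter(pass1), Counter(pass2)
expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)

kappa = (observed - expected) / (1 - expected)
print(f"exact agreement = {observed:.0%}, Cohen's Kappa = {kappa:.3f}")
```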

Chapter 13

Testing abilities to understand

Tim Murphey

6 > 7 > 5 > 1–4, or at least I can say that these seem to be the ecological stages that appear to be happening to students in my classes. We start by learning from experiences (6) in which we are directed to ask others (e.g. the activities described below and the repeated testing). The teacher and the students need to learn to be persistent in asking (7) and to allow 'asking' to generalize to other realms in their lives (as noted in several student quotes below in the qualitative data). This requires a lot of bravery to take action (5). The results of these three steps, I believe, will be numbers 1–4: you will better know what you want; you will believe you are worthy of someone else's help; you will believe you can get it; and you will become passionate about asking and learning.

Review of the literature

The art of asking

In The art of asking, Amanda Palmer (2015) looks closely at how people ask, or don't dare to. Prior to the publication of The art of asking, Palmer had become famous for her TEDTalk (Palmer, 2013) of the same title, which was about her experiences as a street performer (an eight-foot tall white bride statue in Harvard Square) asking for tips, but also about her recent crowdfunding from fans to make a new musical album. In The art of asking, she says that


'Asking is, at its core, a collaboration' (Palmer, 2015, p.47). But more poetically she writes:

Asking for help with shame says: You have power over me.
Asking with condescension says: I have power over you.
But asking with gratitude says: We have the power to help each other.
(p.48)

Later on, she cites Hyde (1983) who 'explains the term "Indian Giver", which most people consider an insult: someone who offers a gift and then wants to take it back…' (p.57). Hyde (1983) tackles the subject, calling it 'the commerce of the creative spirit':

But the origin of the term – coined by the Puritans – speaks volumes. A Native American tribal chief would welcome an Englishman into his lodge and, as a friendly gesture, share a pipe of tobacco with his guest, then offer the pipe itself as a gift. The pipe, a valuable little object, is – to the chief – a symbolic peace offering that is continually regifted from tribe to tribe, never really 'belonging' to anybody. The Englishman doesn't understand this, is simply delighted with his new property, and is therefore completely confused when the next tribal leader comes to his house a few months later, and, after they share a smoke, looks expectantly at his host to gift him the pipe. The Englishman can't understand why anyone would be so rude to expect to be given this thing that belongs to him.

Hyde concludes:

The opposite of 'Indian giver' would be something like 'white man keeper' … that is, a person whose instinct is to remove property from circulation … The Indian giver (or the original one at any rate) understood a cardinal property of the gift: whatever we have been given is supposed to be given away again, not kept … The only essential is this: The gift must always move.

Methods of learning asking (in regular classroom activities)

If you asked my students (from the last few years) what their teacher's most common phrase was, they would probably say 'Ask your partners…', which is used at least 20 times in every class. Below are brief descriptions of activities in which students ask questions as an integrated part of those activities.


Songlets/speed dictations (Murphey, 1990, 1992, 2018a, 2018b)

In nearly every class, I give my students a speed dictation in which I ask them to help each other in pairs, with one person writing the first half and the other the second half of the dictation. I say it quickly (speed) so that neither one can easily get it all; they are thus 'forced' to help each other and must ask, 'What is the first/second part?' For example, Part 1 might be 'Super, happy, and optimistic' and Part 2 'joyful and prodigious'. Thus, the seeds are laid for collaboration, asking, and helping. But students are also challenged by the speed dictation becoming a songlet (short song) sung in response to a question. For the above speed dictation, the question becomes, 'How are you?' I say to them (5–10 times) in the class in which it is taught 'Ask your partners "How are you?"' and in later classes once or twice for review. They have to respond by singing the correct answer, 'Super, happy, optimistic, joyful, and prodigious', to the tune of Mary Poppins' 'Supercalifragilisticexpialidocious'. They are instructed to memorize it and to use it to respond to anyone who asks them 'How are you?' in any language, in and out of class, in order to generalize it in their everyday lives. Many of them tell me in their action logs that they end up teaching the songlets to their friends and family.

Action logging: Comments on class (Murphey, 1993)

I have been hooked on reading action logs for nearly 30 years and cannot imagine teaching well without them. Many other teachers look at me as I go through pile after pile of action logs (notebooks) sitting in our cafeteria and think I am working too hard, when actually it is joyful work that helps me to figure out what I need to do in the following classes. I ask my students to tell me 'What do we need to review or correct in the next class?' Action logging brings many advantages for students as well. First and foremost, they get to ask the teacher questions in a kind of private conversation, and in response get tailored answers to their individual questions. In most classes, teachers cannot have a conversation with every student, but in action logs it is possible. The main parts of the class are listed on the board for students to copy into their action logs, and to evaluate and comment on. They can also do their homework in the action logs, as discussed below.

Asking during 'call report' homework

In their action logs, students write a short description of their telephone homework (Call report) with their in-class partner from that day; they are asked to change partners every class. Typically, they ask about information they learned in the class (speed dictation questions, etc.), but it is also an opportunity to ask about personal topics and make friends. They are also supposed to write down


how many minutes they talked and what percentage was in the target language. Calling up someone you may have met for the first time that day can be a scary thing for many people, but they do it, they get used to it, and many end up calling each other for test reviews and other tasks as well. (Students put their phone numbers beside their names on an attendance list I pass out in class, and I give them all copies so they can call each other easily, and also so they can call a classmate when they have been absent in order to catch up with what they missed.)

Asking during 'teaching report' homework

Another regular daily homework assignment is to teach someone out of class (who is not in our class) something students learned in class and to write a Teaching report. Many end up asking family members and friends if they can teach them things they learned in class that day (e.g. songlets, stories, new vocabulary, information, etc.). Teaching something you just learned to others helps you learn it better yourself (Murphey, 2017c) and can include multiple asking cascades.

Action log share: Asking in class

Students are asked to have new partners each class and to exchange action logs, which have an introduction page at the front describing the owner, followed by all the dated class logs. Students read the introduction page and ask detailed questions. They then review recent classes and activities, asking questions of their partner as to how they liked and did certain activities. They are asked to write a short comment in their partner's action log before giving it back to them (e.g. 'Nice to meet you. I look forward to working with you and calling you tonight', 'Reading your action log was fun. It gave me good ideas for my action log.'). They are asked to sign their name, anchoring their words in time and space for the reader.

Story asking

I ask my students to tell many stories about themselves and to write some of them down in their action logs; sometimes their fellow students will read them and ask questions about them to deepen the conversation. For example, they write a Language Learning History that details when they first began learning foreign languages up until now. They also write about glory, embarrassment, regret, and mistake stories, which students discuss to show that no one is perfect, we all make mistakes, and we can learn to laugh at them sometimes.

Lecture pre-asking

Most lecturers enthusiastically dive into their material, but most information is wasted like water washing over rocks (brains). Priming the students at the


beginning of a class with a number of questions about the content that will be covered in the lecture creates curiosity and gets students to form a possible neural network for an answer. Research shows that even if they come up with a wrong answer, they are still capable of easily replacing wrong answers with new answers due to the curiosity network already formed (Roediger & Finn, 2009). Such questioning also shares class time democratically with students so they feel more empowered. As Donald Graves (2002) wrote in Testing is not teaching: What should count in education:

Perhaps the problem [of learning well] is best understood in the context of power within relationships. Understanding is best reached when power is shared. In most cases teachers are in the power position when working with their students. They have the power of assignments, corrections, and grades. The best teachers know how to share this power; indeed, they give it away. They are constantly uncovering where the student's heart is situated in the writing. Through the skills of teaching they know how to add power to the student's intentions. (p. 11)

Formative assessment

All the activities above are about learning through assessing, and learning what we know and don't know, thus promoting language assessment literacy (LAL) not just among teachers, but also among students. Wormeli (2018, p. 284), a great advocate of formative assessment, defines it as: 'Frequent and ongoing ways to check students' progress toward mastery; the most useful assessment teachers can provide for students and for their own teaching decisions'. Thus, I was led to make my regular tests work like formative assessment by socializing the procedures, starting about six years ago with social testing (Murphey, 2013a, 2013b).

Asking in social testing

For the remainder of this chapter I will be focusing upon social testing as a method and incentive for students to ask each other questions and become language assessment literate. Let me begin by providing an abstract of a recent article on social testing published in Critical Inquiry in Language Studies:

A conception of social testing is described in which students are directed to give themselves grades at two moments: first, after filling in answers that they recall alone; second, after asking others in the class for mediating help during social interaction. The first grade is an estimate of individual efforts, without social connections. The second grade represents a situated person in a community with developing connections, something neurologists, sociologists, and anthropologists see as an ecological step towards species well-being. Social testing is


one step toward changing an epidemic trend in our societies and schools toward increasing individualization and isolation (III). (Murphey, 2019a)

The bottom of each test looks something like Figure 13.1 (also see Appendix A):

1st score____/100%   2nd score____/100%   3rd score____
Who helped you? WHY? ___________________________________
Who did you help? WDYH? _________________________________
What do you think of this test? WDYTOTT? ___________________

Figure 13.1 Bottom of each test

As you can see in Figure 13.1, I ask students to give themselves their own grades at two separate times on the tests: first after a certain period of doing the test alone (1st score), and then after allowing them to ask others for help and to give help to others who ask for it (no copying; all oral) (2nd score). John Hattie (2012) showed through his meta-analyses of 150 classroom activities that self-reported grades (#1), formative evaluation (#4), feedback (#10), and reciprocal teaching (#11) are all highly effective for learning, with the last three being highly social; all are included in the social testing protocol aligned with LAL.

I wish to propose a form of testing that allows students to interact more and learn more at the same time. Although this way of testing will not solve all our problems, it is a way to help students become more social, and to teach the worth of social interaction and its benefits. In Murphey (2019a), I cite mostly Vygotskian researchers who claim that:

the origin of intelligence (both phylogenetically and ontogenetically) [is] social in nature; [such] studies of intelligence tend to treat the social nature of its development as implicit. Social definitions of intelligence have in this sense remained mere postulates, for while the social nature of intelligence is recognized, intelligence is never explicitly studied as social … How are we to move beyond this situation and develop the investigation of intelligence while adequately incorporating its social nature and origins. In our opinion the solution lies in the theoretical elaboration of a definition of intelligence which not only embodies its social nature explicitly, but which can also be empirically investigated. Mead, Piaget, and Vygotsky have all given us the ideas, but neither the paradigms nor techniques necessary to substantiate their belief that intelligence is essentially social in nature. (Doise & Mugny, 1984, p. 22)
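To make the two-score procedure concrete, here is a small sketch of how a class's first (individual) and second (post-interaction) scores from Figure 13.1 could be summarized. All the scores below are invented for illustration; they are not data from my classes.

```python
# Hypothetical summary of social-test scores: the 1st score is earned alone,
# the 2nd after students may orally ask for and give help (no copying).
first = [55, 70, 40, 80, 65]     # invented individual scores (%)
second = [85, 90, 75, 95, 88]    # invented scores after social interaction (%)

gains = [s - f for f, s in zip(first, second)]
print(f"mean alone: {sum(first) / len(first):.1f}%")
print(f"mean after asking/helping: {sum(second) / len(second):.1f}%")
print(f"mean gain from social mediation: {sum(gains) / len(gains):.1f} points")
```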

Testing abilities to understand 249

long-term impact. For the remainder of this chapter, I would like to look closely at some qualitative data on social testing and how it seems to liberate students’ learning. I will first look at some undergraduates doing three tests in one semester and then look at some graduate students who did a social test as a final exam.

Some recent qualitative data: undergraduates

Most recently (fall 2018) I taught a 'Ways of Learning' class, an elective open to all four years and all departments at a language/humanities university: a 30-class semester, meeting twice a week for 15 weeks, with about 100 students. We did three quizzes, approximately one every ten classes, from which I was able to collect their comments at the bottom of the tests and in their action logs (Murphey, 1993), in which they regularly give me feedback about the classes and our activities (see 'Action log share' above). The testing comments can be felt in three waves of socio-cognitive emotion, which I will describe after each set of comments. Space unfortunately limits me to looking at just a few comments for each quiz.

QUIZ 1 Feedback at the bottom of the quiz (unedited except for emphasis)

1 This is my first time to take a test which encourages me to interact with others, so this is interesting for me. This makes me think I want to have a conversation with others more.
2 I love this test because I felt we were doing test together. I knew the importance of cooperating.
3 This test improve our ability to ask things of others. We have to be brave, so I like this way.
4 I thought this test is more meaningful than the way as usual test because just remember something is not interesting but in this test not only remember, but we use English. This is the big point of this style, I think.

QUIZ 1 Feedback from action logs (unedited except for emphasis)

1 Today we had a quiz, a new test that I never did before. We can learn from each other and ask other people for answers. It is a new chance to communicate.
2 I really felt glad to see your words 'Not knowing is OK'. And 'Not asking is failing'. And 'Helping and asking many people is your goal'. I really feel great to do today's test.
3 I told my family that my teacher gave us a time to teach each other for us to improve our skills, because it was the first time in my life that teacher gave us such a time. I taught many things to my classmates during that time. I enjoyed helping them.


Commentary on student feedback for quiz 1: After the first quiz, many students expressed being surprised ('a test which encourages me to interact'; 'I never did before'), excited ('interesting'; 'I love this test'), and provoked to act ('makes me think I want to have a conversation with others more'; 'I knew the importance of cooperating'; 'improve our ability to ask … others') by the new way of testing in which they could talk and learn from their partners. They also appreciated that they could help others and learn from them at the same time. Some even told their families about it ('I told my family') and others mentioned PowerPoint slides I showed them before the test saying 'Not knowing is OK. Not asking is failure. Your grade is mostly about your ability to help and ask'. Not only did they enjoy it, they found it more meaningful ('this test is more meaningful') than simply regurgitating information alone on a paper. Thus, I will call this first stage SEP for surprised, excited, and provoked. Ten classes later (five weeks), we did quiz 2 (see Appendix A for all three quizzes).

QUIZ 2 Feedback at the bottom of the quiz (unedited except for emphasis)

1 I think this test help students to improve communication skill. We'll have to communicate with other people in the future job, so it's practical test.
2 This test is really useful to think about the answer with peers. Giving hints to find a clue is the best way to know the answers … I thought my communication skill is getting up.
3 I like this style of testing. I can talk to different people a lot & it feels good to help someone. But sometimes they went away just after they had their answers, not teaching me anything and it hurts. It also hurts when I tried to remind them the story (hint) and someone just said, 'just tell me the main point!'
4 Today, I am very brave and do not hesitate to ask. So almost all of blanks are filled.

QUIZ 2 Feedback from action logs (unedited except for emphasis)

1 Last time I felt embarrassment to ask questions, but this time was not. I enjoyed asking and talking. Everyone's so kind. I'm glad to have and be in this class.
2 I really like this type of test. I always feel nervous or hate to take test but I feel relax and enjoy this class's test. Because you said, 'Not knowing is OK. Not asking is failure'. This phrase I like very much. And I really like part 2, because I can help classmates and be helped by classmates. This is very good communication I think because help each other is really good to learning. Today is the second time to take this type of test, so I feel more relax to take test and I could ask and help many [more] class mates than first time. I feel really happy to talk a lot of people and help them.
3 In today's class I took a test. I like this test because I could talk a lot and communicate with many people. This test needs knowledge which I learned in class and we require to communicate positively in this test. In addition, we could discuss questions before the test. It is different from other tests. We can have opportunities to speak in this test style. And also, I learned scaffolding, it is difficult to give hints and help others understanding.
4 I had fun to do test and ask. I could ask more people than before. I think I get used to ask people because of this class! I'm looking forward to next test! Please don't make it harder.

Commentary on student feedback for quiz 2: I added to the pre-test explanation the idea of scaffolding ('I learned scaffolding, it is difficult to give hints and help others understanding') and of hinting at answers rather than just telling them straight away. Some got it ('Giving hints to find a clue is the best way to know the answers') and some apparently did not ('I tried to remind them the story (hint) and someone just said, "just tell me the main point!"'). Some felt they were getting used to the test style ('Last time I felt embarrassment to ask questions') and were appreciating it more by generalizing it to their regular lives ('I could ask more people than before. I think I get used to ask people because of this class!'). They started to notice that this test style was more relaxing ('I always feel nervous or hate to take test but I feel relax and enjoy this class's test'), and again commented on the instructions delivered using PowerPoint slides that said 'Not knowing is OK. Not asking is failure', identifying them as helping them to perform better. Thus, I will call this second stage scaffolding, generalizing, and relaxing (SGR). Five weeks later, at the end of the semester, we took the third quiz. Due to time restrictions to get their action logs back to them, I was rushed with reading 100 tests as well as 100 action logs in just a few days.

QUIZ 3 Feedback at the bottom of the quiz (unedited except for emphasis)

1 I could answer almost all questions compared to last test. And actually this is my last test in my university school life! I'm glad to take this class and this test! Thank you for all!
2 I can help many person. I'm so happy! I have confidence because I have good classmates. This test is so fun!
3 This is the third time to take this [type of] test. I asked my classmates fluently and they feel glad to help each other. I enjoy this test.
4 This time I could ask many people and they answered kindly. Though I'm powerless alone, I was happy that there were many people who helped me.

QUIZ 3 Feedback from action logs about the quiz (unedited except for emphasis)

1 I could ask students more than before. It's proud that I can learn asking is not hesitating thing.
2 I could learn what I never thought or I've never known. It was really useful. The remarkable thing is mistaking is not bad; trying not do is bad! I was so impressed that. So I'm always trying what I face first time. I don't judge with prejudice anymore. I'll never forget this class. (1st year)
3 I am really excited and satisfied with this class, as this class gives me a lot of chance to speak English than any other class. Additionally, I could make new friends … I really enjoyed and I'll miss this class.
4 I am glad to choose this class because I enjoyed learning English and I met different grade student and we had a chance to talk. I felt happy when we talked.

Commentary on student feedback for quiz 3: The core learning by the third quiz was an increase in collaboration ('This time I could ask many people and they answered kindly. Though I'm powerless alone, I was happy that there were many people who helped me'; 'I have confidence because I have good classmates'), which was very gratifying, especially for 4th-year students ('this is my last test in my university school life! I'm glad to take this class and this test!'). Some described a crucial change in perspective ('I could ask students more than before. It's proud that I can learn asking is not hesitating thing'; 'I could learn what I never thought or I've never known. It was really useful. The remarkable thing is mistaking is not bad; trying not do is bad! I was so impressed that'). They also confirmed that the act of talking with diverse others was a learning act ('this class gives me a lot of chance to speak English than any other class'; 'I could make new friends'; 'I met different grade student and we had a chance to talk. I felt happy when we talked'). And they could notice a positive change in themselves and their classmates ('This is the third time to take this [type of] test. I asked my classmates fluently and they feel glad to help each other'). Thus, the key words in this final stage are collaboration, asking freely, and emotional bonding (CAFEB). I believe that the testing had a lot to do with their close socialization and bonding by the end of the semester. Even I found it difficult to say goodbye for the last time.


Graduate students' reactions to social testing

In December 2018 I also did a social test with a small group of four graduate students in a socio-cultural theory (Vygotskian) class to end our fall semester on the last of four intensive Saturday afternoon meetings. Several of them responded very enthusiastically on Moodle and gave me permission to post their comments and names below:

OSHIKA, Eriko – Tuesday, 18 December 2018, 9:32 PM
As for the final exam, I felt that it was a lot more effective for internalizing what we have learned, compared to traditional styled exams in which we usually just get tested how much we 'remember' things. By discussing and exchanging ideas for answers, we can find something new and understand the topics even deeper during the test. I thought this was a very meaningful activity, not just a tool to evaluate students' performance.

WADA, Jun – Sunday, 23 December 2018, 11:38 AM
Hi, Eriko, Thank you for sharing your reflection. I agree with your idea about the final exam. Even if students get the score of the test, it is meaningless unless students learn from the feedback. In most cases, students tend to get the score and do nothing after that. It is more effective if they have opportunities to deepen their understanding in the test. As you mentioned in your posting, teachers need to understand why students do the activity. The purpose of the test is not only for getting the score for their evaluation, but it should be mediation for them to learn.

YOSHIEDA, Megumi – Thursday, 27 December 2018, 9:06 AM
Hi everyone, I had the same joy as Eriko and Jun while having the final exam of SCT. I would like to add a comment. As some of us are scheduling final exams of the courses, we could plan a peer supporting exam style that we have learned in class. One big concern for me is that some students did not like the new evaluation ways I have tried. When it is a test, once they got very nervous and complained even though I explained the benefit. Thus, one big task for us teachers is to take time to explain well about the new ways as well. Have a great new year, everyone!

Commentary on graduate student feedback for their final test

The graduate students noted that the social part of the exam allowed them to better internalize concepts and increase their understanding, and they regretted that many students are only worried about the score and not the learning; for these graduate students themselves, learning is more important than a good score, even while taking a test. Still, teachers need to find good ways to explain such testing procedures to students to enlist their altruism and understanding of how to learn more on a deeper social level.

Conclusion

To conclude, I would like to return to the long acronym in the title, TATUNII-SIA-RASA (Testing Abilities – To Understand Not Ignorance or Intelligence – Socially Interactive (formative) Assessment – Receive, Appreciate, Summarize, and Ask). I believe that as educators, we should be testing and teaching ways of understanding, not simply information (ignorance or intelligence). I am convinced that socially interactive ways of assessing have great promise for helping students grasp more intellectual territory than simple solo exams. We need to be able to learn, even during assessments, and to see that this is indeed part of language assessment literacy. Social testing teaches not only asking but also altruism, as one of my students said a few years back:

Because I had taken a test (#1) in this class and I knew how we would do the test #2, I tried to remember as much as possible not only for myself, but for my classmates. Last time I took the test, I was helped by others with answers, very helpfully. So, I wanted to help my classmates more than I did last time. In Test #2 it was interesting. I felt as if I was already working with classmates during my preparations for the test, and that motivated me to study. Although it was not so many people that I could help with the quiz, I was glad to hear 'thank you' from them and to see their smiles. Showing thanks to people really makes them happy.

Another student referred to asking as part of 'vital skills to live in real life': though most people will not be taking pen and paper tests at their work, they will need to be able to ask people for help:

I really like this type of test. I've never done such a creative and interactive test, and I really think that I was required to get information and help people, and these are vital skills to live in real life!

Please read Murphey (2017a) for a more detailed understanding of social testing or Murphey (2017b) for a short, four-page synopsis from a Stanford University blog.

Finally, I wish to dare to talk more grandly beyond our classrooms and to look at the questions concerning our survival and social well-being in our various societies. I believe we need to ask our schools, educational systems, businesses, communities, governments, our universe, and our gods more grandly for better understanding and well-being for all, and for a more just and ecological world in which we all can live peacefully. I want my students to be able to ask for these things, for in asking we may indeed find the ways through LAL.

Postscript (10 March 2019)

I started reading Fourth generation evaluation (Guba & Lincoln, 1989) recently, and it occurred to me that social testing has many elements that are described within Guba and Lincoln's framework. Some of the consequences of fourth generation evaluation, especially, seem similar to social testing (pp. 256–258): accountability yields to shared responsibility; exploitation yields to empowerment; ignorance yields to comprehension and appreciation; and immobilization yields to action. Students take more responsibility in self-evaluation and in asking for help rather than just looking for a grade. Rather than being exploited by a testing system, they are empowered to make the test socially transparent. Rather than remaining ignorant of answers they wish to know, they can comprehend with the help of others and appreciate and be appreciated for mutual aid. Lastly, they go from being frozen by unanswerable questions to taking action in order to learn from others and to help others. The potential of social testing to foster action-taking (asking and giving) scaffolds and builds agency in students that can support them for a lifetime; this is what LAL should be about. Teaching students to dare to ask opens their lives up to the liberty of learning.

Appendix A: Three quizzes

Full NAME (romaji) & student number
WAYS Quiz 1, Mon, 15 Oct, 2018
Write quickly what you know! In 20 minutes. Write the song lyrics & info on the back & put them in small boxes with their number (SONG #1 & its number):

1 How are you?
2 Five strategies for memorizing a long line?
3 Why do you smile?
4 What's asking?
5 How do you succeed?
6 How do you have a good life?
7 Ways of Improvisation
8 Ways to reduce stress
9 What's more beautiful than a bird sitting in a tree?
10 How do you like it here?
11 Are you young?
12 What three things do we do when we read Newsletters in class:
13 Last lines or IMPORTANT points of stories/Videos:


BBB; Tim's DAD; Chez Joan; Matsuyama Woman; Paradigm Shift; Denmark TV 2 Advertisement; Ms Liz's Class; Talking Twins

14 How can you do environmental engineering to learn more English? (3 examples please)
15 Why are telling embarrassment/mistake stories good for us? (3 things)
16 What are the three SSSs in Chapter 2 for and how do they help you learn?
17 What would you do if you were language hungry?
18 What is self-regulation?
19 Cry to the world 'I'm in love!' when you read this line! Done/Not Done
20 Approximately, how many people's names do you know in this class?

1st score ____/100%  2nd score ____/100%  3rd score ____
Who helped you? WHY?
Who did you help? WDYH?
What do you think of this test? WDYTOTT?

Full NAME (romaji) & student number
WAYS Quiz 2, Mon, 26 Nov, 2018, Class #19
Write quickly what you know! In 20 minutes. Write the song lyrics & info on the back & put them in small boxes with their number (SONG #1): Take …

1 How do you eat well?
2 How do you learn?
3 Where do you belong?
4 How do you write well?
5 5 Ways to Happiness
6 What are you going to do today?
7 SPURR
8 PVA
9 NPRM
10 Write the 10 idioms in sentences that show their meaning.
11 Who do you love?
12 Stories/video last lines/main points and why important: Beatles; Marilyn King; Going My Way; Ride and Read; Turtle with a Straw; Student Voice #1 LLHs; Student Voice #2 Job H; Going Abroad; Roller Coaster
13 How does an effective helper help you? E t, A, S y u, Ref rather than C, & C
14 How are A student strategies different from C/D student strategies?
15 What are Tim's most frequent three words in this class?
16 What are the advantages of doing IPQs?
17 What would be your rejoinder if I said, 'I won the billion-dollar lottery!'?
18 Find a person you have never talked to, ask a question, & write their whole name:

1st score ____/100%  2nd score ____/100%  3rd score ____
Who helped you? WHY?
Who did you help? WDYH?
What do you think of this test? WDYTOTT?

Full NAME (romaji) & student number
WAYS Quiz 3, Thurs, 17 Jan, 2019, Class #26
Write quickly what you know! In 20 minutes. Write the song lyrics on the back & put them in small boxes with their number (SONG #1): Take …

1 What are you going to do today?
2 Are you content?
3 Who are you?
4 What do you like?
5 What's the weather like?
6 How are you? #2
7 What do you love?
8 How do you change the world?
9 3FRIMS
10 VAK
11 SPURR
12 How can Good Students make Good Teachers? List at least 5 or more ways on the back.
13 How many words from the memory test can you remember by yourself (no helping)?


Short answers:

1 Main point of the Rat Story:
2 WDWWM? What do women want most?
3 Does Sir Gawain choose Night or Day? And what happens?
4 The last line of Candide:
5 The Clay Buddha story: How do the monks in this story say hello and goodbye to people and why?
6 Would you like to go for a drink tonight?
7 What two ways can you understand this phrase: opportunity is nowhere?
8 Put these words in correct order: the the more more animals playful intelligent also ones were
9 Describe how someone gets 'Learned Helplessness':
10 Give two examples of how you can change the world everyday:
11 'Words don't have meanings; people have meanings for words'. Give an example:
12 Cry 'MERRY CHRISTMAS' out loud. Find a marker-person! Done/Not done

1st score ____/100%  2nd score ____/100%  3rd score ____
Who helped you? WHY?
Who did you help? WDYH?
What do you think of this test? WDYTOTT?

References

Canfield, J., & Hansen, M. (1995). The Aladdin factor: How to ask for what you want – and get it. Berkley Books.
Creese, A., & Blackledge, A. (2010). Translanguaging in the bilingual classroom: A pedagogy for learning and teaching? The Modern Language Journal, 94(1), 103–115.
Doise, W., & Mugny, G. (1984). The social development of the intellect. Pergamon Press.
Dufva, H. (2013). Language learning as dialogue and participation. In E. Christiansen, L. Kuure, A. Mørch, & B. Lindström (Eds.), Problem-based learning for the 21st century: New practices and learning environments (pp. 51–72). Aalborg Universitetsforlag.
Freeman, D. (1998). Doing teacher research: From inquiry to understanding. Heinle & Heinle.
Freire, P. (1985). The politics of education: Culture, power, and liberation (D. Macedo, Trans.). Bergin & Garvey Publishers. (Original work published 1985)
Graves, D. (2002). Testing is not teaching: What should count in education. Heinemann.
Guba, E., & Lincoln, Y. (1989). Fourth generation evaluation. Sage Publications.
Hattie, J. (2012). Visible learning for teachers: Maximizing impact on learning. Routledge.
Hyde, L. (1983). The gift: Imagination and the erotic life of property. New York.
Lewis, B. (2019, July 18). Teachers should design student assessments. But first they need to learn how. Education Week. https://www.edweek.org/ew/articles/2019/07/19/teachers-should-design-student-assessments-but-first.html
Murphey, T. (1990). Song and music in language learning: An analysis of pop song lyrics and the use of song and music in teaching English as a foreign language. Peter Lang.
Murphey, T. (1992). Music and song. Oxford University Press.
Murphey, T. (1993). Why don't teachers learn what students learn? Taking the guesswork out with action logging. English Teaching Forum, 31(1), 6–10.
Murphey, T. (2003). Assessing the individual: Theatre of the absurd. Shiken: JALT Testing & Evaluation SIG Newsletter, 7(1), 2–5.
Murphey, T. (2012). In pursuit of wow! Abax.
Murphey, T. (2013a). Turning testing into healthy helping and the creation of social capital. PeerSpectives, 10, 27–31.
Murphey, T. (2013b). With or without you and radical social testing. Po`okela (Hawai'i Pacific University Newsletter), 20(69), 6–7.
Murphey, T. (2017a). Provoking potentials: Student self-evaluated and socially mediated testing. In R. Al-Mahrooqi, C. Coombe, F. Al-Maamari, & V. Thakur (Eds.), Revisiting EFL assessment: Critical perspectives (pp. 287–317). Springer.
Murphey, T. (2017b, June 30). A 4-page condensed version of Tim Murphey's book chapter 'Provoking potentials: Student self-evaluated and socially-mediated testing'. Tomorrow's Professor eNewsletter. Stanford University. https://tomprof.stanford.edu/mail/1581#
Murphey, T. (2017c). Asking students to teach: Gardening in the jungle. In T. Gregersen & P. MacIntyre (Eds.), Exploring innovations in language teacher education (pp. 251–268). Springer.
Murphey, T. (2018a). Bilingual songlet singing. Journal of Research and Pedagogy of Otemae University Institute of International Education and Hiroshima JALT, 4, 41–49.
Murphey, T. (2018b). Songlets for affective and cognitive self-regulation. Bulletin of the JALT: Mind, Brain, and Education SIG, 4(12), 22–25.
Murphey, T. (2019a). Peaceful social testing in times of increasing individualization & isolation. Critical Inquiry in Language Studies, 16(1), 1–18.
Murphey, T. (2019b). Innovating with 'The Collaborative Social' in Japan. In H. Reinders, S. Ryan, & S. Nakamura (Eds.), Innovation in language learning and teaching: The case of Japan (pp. 233–255). Palgrave Macmillan.
Palmer, A. (2013, February). The art of asking [Video file]. https://www.ted.com/talks/amanda_palmer_the_art_of_asking?utm_campaign=tedspread&utm_medium=referral&utm_source=tedcomshare
Palmer, A. (2015). The art of asking. Grand Central Publishing.
Roediger, H., & Finn, B. (2009, October 20). Getting it wrong: Surprising tips on how to learn. Mind Matters. https://www.scientificamerican.com/article/getting-it-wrong/
Treasure, J. (2011, July). 5 ways to listen better [Video file]. https://www.ted.com/talks/julian_treasure_5_ways_to_listen_better?utm_campaign=tedspread&utm_medium=referral&utm_source=tedcomshare
Wormeli, R. (2018). Fair isn't always equal: Assessment and grading in the differentiated classroom (2nd ed.). Stenhouse.

Conclusion: Language assessment literacy: The way forward

Sahbi Hidri

Language assessment literacy (LAL) has been approached from different perspectives in different contexts. All the chapters highlighted that the absence of an effective LAL agenda for learners, teachers, and decision-makers will undoubtedly lead to harmful effects on the future of all these stakeholders, as well as on exams, curricula, language programs, and assessment policies. In addition, all the chapters demonstrated that, in contrast to standardized assessment, classroom-based assessment can enhance effective test-taking strategies and assessment techniques and unveil the actual assessment performance of learners. That is, the interfaces between classroom-based assessment and learning can help test-takers develop their potential to handle standardized and high-stakes exams. The chapters also highlighted how the link between assessment and learning is approached by English Language Teaching (ELT) practitioners in different parts of the world, and how teachers conceive of this link, especially when they are faced with the dilemma of using standardized assessment for many practical reasons. The findings of this work might be conducive to more research investigating the interfaces between assessment and learning because, in contrast to previous work on these interfaces, this book has unveiled different strategies and techniques for approaching this relationship.

LAL plays a key role in shaping the final products of exams, learning, teaching, language programs, and textbooks. After having edited and co-edited books on evaluation, assessment, and ELT research, I have realized that LAL needs to be further investigated and that the interfaces between assessment and learning have not been given their due importance, whether in second or foreign language assessment. There is a dearth of international research on LAL as well as on the interfaces between assessment and learning. Many practitioners operating in different parts of the world need to revisit these research areas, since these areas are already embedded in their assessment policies. Apart from some landmark publications on LAL, this research area has been overlooked for some time now because of the widespread use of standardized forms of assessment. This publication will serve as an additional contribution to these discussions of assessment and learning and how LAL should be approached. International conferences held in different ELT contexts indicate that this research area has not been given its due momentum in second or foreign language assessment.

Author Index

Abbas, 203, 204
AbdelWahab, 86, 88, 89, 90, 91, 96
Abudrees, 205
Ahmad, 160, 161, 166, 172
Al Asmari, 199, 201, 203
Al Hosni, 199, 200
Al Jahromi, 203, 204
Al Qahtini, 202
Al-Issa, 70
Al-Nasser, 203
Al-Shaboul et al., 204
Alderson, 3, 4, 5, 179, 180, 227, 238
Alderson and Banerjee, 31, 34, 43
Alderson and Wall, 4
Alderson et al., 179, 193
Alhamadi, 203
Allen & Negueruela-Azarola (2010), 23
AlRashid, 205
ALTE, 85, 87, 91, 95, 110, 147, 151
Amengual Pizarro, 221
Anderson (2009), 222, 238
Antoniou & James, 24
Arkoudis & O'Loughlin, 23
Arnaiz, P., & Guillen, 201
Asador et al., 127
Asari, 79
Asoodar, 111
Ata, 107, 111, 128
Bachman, 33, 37, 135, 221, 238
Bachman & Palmer, 87, 95, 179
Badia, H. (2015), 20
Bailey, K. M., & Brown, J. D., 63
Baker, B. A., & Riches, C. (2017), 19
Banerjee, 227, 228
Barkaoui, K., & Valeo, A., 21

Bashir, S., 201
Bawarshi & Reiff, 109, 126, 127
Beaumont, C., O'Doherty, M., & Shannon, L. (2011), 81
Benesch, S. (2001), 43
Berg, E. C. (1999), 81
Berger, A. (2012), 136
Berk, 109
Berry, V., & Munro, S. (2017), 20
Berry, V., Sheehan, S., & Munro, S., 20, 21, 138, 140, 150, 151
Bhowmik, Hillman, & Roy (2018), 46
Bolitho & West, 176, 177
Borich, 79, 82
Bourdieu (1991), 43
Braine, 80, 81
Brennan, 135
Brewster et al., 109, 110
Bridgeman, B., & Carlson, S. (1983), 38
Brindley, G. (2001), 63
Brown (2004), 3, 11, 21, 60, 63, 69, 82, 126, 178
Brown, G. A., Bull, J., & Pendlebury, 82
Brown, J. D., & Bailey, K. M. (2008), 22, 23, 63
Brown, J. D., & Hudson, T. D. (1998), 44, 46
Buck (2001), 222, 224, 228, 229, 237, 238
Budimlic, 71, 82
Buri, 110
Burton, 110
Buyukkarci, K. (2014), 138, 139, 151, 152
Bybee, R. W. (1997), 63
Bygate, 200

Cameron, 126
Campbell, C., Murphy, J. A., & Holt, J. K., 162
Canagarajah, A. S. (2006), 42
Canagarajah, A. S. (2016), 46
Canfield & Hansen (1995), 242, 243, 258
Carless, D. (2007), 178, 189
Carlsen (2018), 225, 238
Carr, N. T. (2011), 37
Cauldwell (2018), 236, 238
Celce-Murcia, 199, 200, 204
Chalhoub-Deville (2016), 237, 238
Chapelle (2012), 221, 238
Chapelle, C. A. (2013), 179
Chase (1995), 172
Chen, T. (2016), 82
Cheng et al., 4, 23, 26, 221, 235, 238
Chomsky, N. (1965), 41
Coates, 202
Coombe, C., Folse, K., & Hubley, N. (2007), 162, 180, 181, 182
Council of Europe (2011), 225, 238
Creese & Blackledge (2010), 242, 258
Creswell, 114, 141, 143
Crusan, D., Plakans, L., & Gebril, A. (2016)
Csépes, I. (2014), 136, 138
Darling-Hammond, L. (1994), 43
Davies, 7, 11, 13, 23, 24, 29, 33, 37, 38, 40, 44, 46, 56, 61, 63, 137, 150, 159, 235, 238
Davies, A., Brown, A., Elder, C., & Hill, K. (1999), 33, 37
DeLuca, C., & Klinger, D. A. (2010), 138
Deneen, C. C., & Brown, G. T. L. (2016), 23
Derewianka, 109, 125
Dewey, 109
Deygers et al. (2018), 225, 238
Djoub, Z. (2017), 140, 152
Doise, W., & Mugny, G. (1984), 248, 258
Dörnyei, 202
Dörnyei, Z. (2005), 217

Douglas, D. (2010), 36, 45, 179, 189
Dovgopolova, I. V. (2011), 178
Duboc, A. P. M. (2009), 138, 139, 150, 152
Dufva, H. (2013), 242, 258
Dunn, 3
Earl & Katz (2006), 210, 217
East, M. (2015), 23, 24
Edgeworth, F. Y. (1888), 36
Education Council of Oman (2017), 82
El Massah & Fadly, 111, 112, 126
Ellis, 78, 82
Elmenfi & Gaibani (2016), 201, 203, 204, 217
Emmitt et al., 109
Engelsen, K. S., & Smith, K. (2014), 13
Esnawy, S. (2016), 46
European Commission, 221
Evans, N. W., Hartshorn, J., & Allen Tuioti, E. (2010), 82
Ezzi (2012), 201, 217
Fakhri (2012), 217
Fard, Z. R., & Tabatabei, O. (2018), 138, 151
Ferris, D. (1999), 82
Field (2008a), 222, 223, 224, 226, 236, 238
Field (2013a), 222, 224, 238
Field (2013b), 224, 238
Field (2017), 222, 224, 235, 238
Foucault, M. (1980), 44
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2011), 141, 143
Frederiksen and Collins (1989), 4
Freeborn, D. (2006), 40
Freeman & Freeman (2001), 108
Freeman, 242, 258
Freimuth (2014), 107, 111, 112, 126
Freire (1985), 241, 242, 258
Fulcher, 3, 6, 14, 15, 21, 22, 23, 24, 34, 35, 37, 38, 39, 54, 56, 57, 59, 60, 61, 87, 136, 137, 138, 140, 150, 152, 159, 160, 162, 163, 179, 180, 220, 236, 238

Gan (2012), 203, 217
García Laborda & Fernández Álvarez, 221
Gareis, C. R., & Grant, L. W. (2015), 180
Gebril & Hidri (2019), 201, 217
Gebril, A. (2017), 140, 151
Gilmore (2011), 224, 236, 239
Giraldo Aristizábal, F. G. (2018), 137, 139, 150, 153
Giraldo, F. (2018), 21, 63, 136, 159, 160, 180
Gitsaki et al., 107, 108, 111, 112, 126
Goody, J., & Watt, J. (1963), 63
Gotch, C. M., & French, B. F. (2014), 137
Graves, D. (2002), 247, 258
Green (2017), 226, 239
Green, A. (1998), 227, 239
Green, A. (2013), 180, 181
Gregersen (2003), 201, 217
Groves (2010), 47
Gu, P. Y. (2014), 23, 227, 239
Guba, E., & Lincoln, Y. (1989), 255, 258
Hakim, B. (2015), 139, 153
Hamid, O. M. (2014), 46
Hamid, O. M., & Baldauf, R. B. (2013), 47
Hamp-Lyons, L., 23, 43
Hamzah & Ting (2010), 210, 217
Hanifa (2018), 201, 217
Harden & Crosby (2000), 211, 217
Harding, L., & Kremmel, B. (2016), 16, 63
Harlen, W. (2004), 178, 180
Harmer, 110, 126, 127, 200, 210, 217
Harsch, C., Seyferth, S., & Brandt, A. (2017), 19

Harmer, 110, 126, 127, Harmer, 200, 210, Harmer, 217 Harsch, C., Seyferth, S., & Brandt, A. (2017), 19, Hasselgreen, A. (2008). 63 Hasselgreen, A., Carlsen, C., & Helness, H. (2004), 18, 20, 22, 23, 24, Hasselgreen, A., Carlsen, C., & Helness, H. (2004). 188 Hasselgreen, A., Carlsen, C., & Helness, H. (2004). 64 Hatipoglu, C. (2010). 140, 151, Hatipoglu, C. (2015). 151, 152, 153, Hattie (2012), 248, 258 Hattie, J. (2009). 82 Hattie, J., & Timperley, H. (2007). 82 Hawkins, J.A., & Filipovic, L. (2012), 39 He (2013), 217 He, 2013; 199 Heaton, J. B. (2011). 135, 136, 140, Heng et al. (2012), 217 Heng, Abdullah & Yosaf, 201, 202 Hernández Ocampo, S. P. (2017), 21 Herrera, L., & Macías, D. 150, Herrera, L., & Macías, D. 154, Hickey, R. (2015), 40 Hidri, 4, 8, 20, 23, 25, 135, 136, 137, 154, 163, 177, 200, 201, 217, 220, 239, 260 Hildén, R., & Fröjdendahl, B. (2018), 26 Hilden, R., & Frojdendahl, B. (2018). 151, Hildén, R.,& Fröjdendahl,B. (2018). 140, 151, Hill, K. (2017), 26 Hill, K. (2017b). 137, 150, Hill, K., & McNamara, T. (2012), 24 Hillerich, R. L. (1976). 64 Hillocks Jr, G. (1986). 82 Hismanoglu, 2013 199 Hismanoglu, M. (2013), 217 Horwitz et al. (1986), 218 Horwitz, Horwitz & Cope, 201, 202, 206 Houston, D., & Thompson, J.N. (2017). 177, 178, Howard, R.M. 163 Howerton, A. M. (2016). 138, Huang, J., & He, Z. (2016). 137, Hudaya, D. W. (2017). 138, Hudaya, D. W. (2017). 152, Hughes, 107, 111, 126, 127, 128,

Huhta, A. (2007), 44
Hyde, L. (1983), 244, 258
Hyland, 64, 82, 109, 128
Hyland, K., & Hamp-Lyons, L. (2002), 64
IELTS, 9, 18, 36
Inbar-Lourie, O. (2008), 13, 14, 16, 17, 25, 136, 137, 150, 159, 180, 235, 239
International Language Testing Association (ILTA) (2000), 40
Irons, A. (2007), 82
Janatifar, M., & Marandi, S. S., 163
Jannati, S. (2015), 139, 153
Jenkins & Leung (2014), 225, 239
Jenkins, J. (2006), 46
Jenkins, J. (2014), 35, 44
Jeong, H. (2013), 22, 23, 24, 136
Jin, Y. (2010), 22, 23, 135, 162
Johnson, 109, 110
Johnston et al., 112
Kachru, Y., 39, 40, 41, 46
Kaiser, G., & Willander, T. (2005), 64
Kalajahi, S. A. R., & Abdullah, A. N., 162
Kane (2001), 237, 239
Karagul, B. I., Yuksel, D., & Altay, M. (2017), 139
Kayaoğlu & Sağlamel (2013), 202, 203, 218
Keck (2006), 163
Kennedy, C., & Thorp, D., 164
Khadijeh, B., & Amir, R. (2015), 137
Kim (2006), 47
Kim et al. (2017), 19
Kingen (2000), 200, 218

Kiomrs, R., Abdolmehdi, R., & Naser, R. (2011), 19, 22, 25
Kirkpatrick, A. (2006), 46
Kirkpatrick, A., & Deterding, D. (2011), 47
Klinger, C. J. T. (2016), 138, 139, 140, 150, 152
Koh, K., & DePass, C. (2019), 64
Koo (2009), 199, 218
Koran (2015), 204, 211, 218
Krashen, S. (1982), 82
Krashen, S. (1987), 203, 218
Krekeler, 111
Kremmel, B., & Harding, L. (2017), 17, 24
Kremmel, B., Eberharter, K., & Harding, L. (2017), 17, 21, 25
Kumaravadivelu, B. (2006), 64
Kvasova, O. (2016), 191
Kvasova, O., & Kavytska, T. (2014), 20, 21, 22, 23, 24, 178
Kyriacou, 109, 110
Laborda & Alvarez (2011), 221, 239
Lado, R. (1961), 38
Lam, R. (2015), 23, 138, 140, 151, 152
Landis & Koch (1977), 228, 239
Lantolf, J., & Thorne, S. L. (2007), 82
Lau, A. M. S. (2016), 177
Lee, I. (2014), 82
Leech et al., 126
Leung, C. (2007), 44
Leung, T., & Mohan, B. (2004), 24
Lewis, B. (2019), 258
Lightbown, P. M., & Spada, N. (1999), 82, 110, 126
Lim (2014), 220, 225, 239
Lin (2014), 159
Lin, D., & Su, Y., 161, 162
Lindsay & Knight (2006), 204, 210, 218

Linn, R. L. (2000), 35
Liu (2012), 199, 201, 218
López, A., & Bernal, R., 160
Lu & Liu (2011), 199, 218
Lukin et al., 126, 127
Lynch, B., & Shaw, P. (2005), 44
Lynch, B. K. (2001), 41, 42, 44
Lyster, R. (2011), 82
Lyster, R., Lightbown, P. M., & Spada, N. (1999), 82
Macaro et al. (2007), 223, 239
Mahlberg, 126
Mahmoodzadeh (2012), 201, 202, 218
Mai (2019), 161
Mak, 201, 202
Malone, M. (2013), 18, 23, 27, 64, 159
Mark (2011), 218
Mauranen, A., Llantada, C. P., & Swales, J. M. (2010), 44
Mayor, B., et al., 164
Mazandarani, O., & Troudi, S. (2017), 23, 24, 25
McCarthy & Carter, 109
McCord, M. B. (2012), 82
McCroskey (1978), 218
McCroskey (2016), 218
McCroskey, 202
McGuire et al., 125, 127
McNamara, 4, 8, 24, 37, 179
McNamara & Hill, 4, 180
McNamara & Roever, 4, 8, 40
Mede, E., & Atay, D. (2017), 21, 22, 139, 140, 151, 152, 153
Meletiadou, E., & Tsagari, D. (2016), 189
Mellati, M., & Khademi, M., 163, 169, 171
Mendoza, A. A. L., & Arandia, R. B. (2009), 64
Menken, K. (2008), 33, 38, 43
Mertler, C. A., 162
Messick, 7, 37, 40, 43, 221, 239

Miles, M. B., & Huberman, A. M. (1994), 144
Mills, 114
Mohammadi & Mousalou (2013), 201, 218
Moore & Morton, 111, 112
Moss, P. A., Pullin, D., Gee, J. P., & Haerbel, E. H. (2005), 35, 44
Munby (1978), 222, 239
Muñoz, A. P., Palacio, M., & Escobar, L. (2012), 139, 178
Murphey, T., 241, 245, 246, 247, 248, 249, 254, 259
National Research Council, 210
Nawab, 6
Newfields, T. (2006), 138
Noels, K. (2001), 82
Norton, L. (2009), 178, 182
Nunan, D., 160
O'Loughlin, K. (2006), 136
O'Loughlin, K. (2013), 18, 22, 23, 24, 27
O'Sullivan, B. (2006), 218
O'Sullivan, B. (2011), 200
Olendr, T. M. (2015), 178
Ölmezer-Öztürk, E., & Aydın, B., 163
Onalan, O., & Karagul, A. E. (2018), 139
Oshima, A., & Hogue, A., 163, 165
Ounis (2017), 200, 201, 218
Oz, H. (2014), 139, 152
Özl, S., & Atay, D., 153, 162, 170
Palmer (2013), 243, 259
Palmer, A. (2015), 243, 244, 259
Panahi & Mohammaditabar, 111, 112, 126, 127, 128
Park & French (2013), 201, 218
Pathan et al. (2014), 203, 218
Patton, M. Q. (2002), 141
Paulus, T. (1999), 83

Peacock, M. (2001), 83
Pennycook, A. (1994), 42
Phakiti (2003), 227, 239
Pill, J., & Harding, L. (2013), 15, 16, 23, 24, 41, 64, 180, 187
Pizarro (2009), 221, 238
Plake, B. S., & Impara, J. C., 162
Plake, B. S., & James, C. I. (1993), 19
Poehner, 4, 5
Popham, 6, 23, 33, 50, 159, 170
Qin & Uccelli, 108
Quirk (1990), 41, 47
Quirke & Zagallo, 110, 128
Quirke et al. (2009), 126
Rabab'ah, G. (2016), 203, 219
Raddaoui, R., & Troudi, S., 42
Rahimi, F., Esfandiari, M. R., & Amini, M., 159
Rauf & McCallum, 47
Raven, 107, 111, 126
Restrepo, E., & Jaramillo, D. (2017), 21
Richards, J. C. (2008), 200, 219
Roediger, H., & Finn, B. (2009), 247, 259
Rogier, D. (2014), 137
Roig, M., 163
Sadler, D. R. (1989), 83
Sahinkarakas, S. (2012), 139
Saka, F. O., 139, 153
Salman, F. (2016), 203, 219
Sariyildiz, G. (2018), 140, 152, 153
Sauvignon, S. J. (2005), 83
Saville-Troike, 202
Scarino, A., 15, 16, 17, 23, 25, 64, 135, 171
Schissel, L. J., Leung, C., & Chalhoub-Deville, M. (2019), 25
Seargeant, P. (2012), 40

Semiz, O., & Odabas, K. (2016), 138, 139, 140, 151, 152
Saville-Troike, M. (2012), 219
Shackleton, C., 224, 228, 239
Shadrina, T. (2014), 178
Sheehan & Munro (2017), 140, 152, 153
Shepard, 3, 11
Shi, L., 167, 169
Shi, L., Fazel, I., & Kowkabi, N., 164
Shively, R. L. (2008), 210, 219
Shohamy, 4, 33, 35, 36, 37, 41, 42, 43, 224, 239
Shohamy & Inbar (1991), 224, 229, 239
Shohamy, E., & Or, I. G. (2013), 64
Siegel (2015), 236, 239
Slavin, 109, 128
Smith, J. A., Flowers, P., & Larkin, M. (2009), 143
Spada, 74, 82
Spencer, 110, 126, 128
Spielberg, C. (1983), 201, 219
Spolsky, B., 33, 35, 36, 37, 38, 41, 44
Stabler-Havener, M. L., 64, 150
Stiggins, R. (1991), 13, 159, 161
Sultana, N., 161, 163
Sun, Y. C., 164
Sun, Y. C., & Yang, F. Y., 163, 167, 169
Swaffar, J., Romano, S., & Arens, K. (1998), 83
Taha & Wong (2016), 203, 219
Taleb, S. M. (2017), 203, 219
Talley & Hui-ling (2014), 200, 219
Tanveer, M. (2007), 203, 204, 219
Tarone, E. (2005), 199, 204, 219

Taylor, 6, 7, 13, 14, 16, 17, 26, 53, 57, 58, 59, 60, 180, 225, 239
Taylor & Geranpayeh (2011), 225, 239
Teasdale, A., & Leung, C. (2000), 44, 51
Thomas, J., Allman, C., & Beech, M., 160
Thorndike, E. L. (1904), 37, 38, 51
Tomlinson, B. (2010), 51
Torrance & Pryor, 4, 71
Treasure (2011), 241, 259
Trede, F., & Higgs, J. (2010), 42, 51
Trinity College London (2017), 65
Trudgill, P., & Hannah, J. (2008), 40, 51
Tsagari & Vogt (2017), 138, 139, 140, 151, 153, 220, 240
Tsagari, D. (2012), 65, 138, 139, 150, 152
Tsagari et al., 180, 240
Tsui, A. B., & Ng, M. (2000), 83
Tupas, F. R. T. (2010), 46, 51
Turk, M., 139, 140, 151
Ukrayinska, O., 137, 139, 152
Ur, P. (2012), 199, 205, 210, 219
Valeo & Barkaoui (2017), 21
Vandergrift (2003), 232, 240
Vandergrift & Goh (2012), 222, 224, 236, 240
Vandergrift & Tafaghodtari (2010), 236, 240
Verma, S.
Viengsang, R., 136, 154
Villa Larenas, S. (2017), 21

Vogt & Tsagari, 6, 20, 21, 22, 23, 24, 27, 54, 59, 60, 138, 139, 140, 151, 153, 180, 188, 220
Vygotsky, 109, 203
Vygotsky, L. S. (1986), 219
Wach, A., 135, 139, 152, 153
Wagner (2014), 221, 222, 224, 240
Wagner & Toth (2017), 224, 236, 240
Wall (2013), 4, 220, 221, 240
Wall & Alderson, 4
Wang, J. (2010), 219, 235, 240
Wang, T., 201
Warschauer, M. (2002), 83
Weigle (cited in Ahmad, 2019), 161
Weir (2005), 226, 240
Weir et al. (2013), 33, 35, 37
White, E., 138
Wilson, 86, 89, 90, 110, 111, 127, 128
Wingate & Tribble, 110, 126
Woodrow, L. (2006), 201, 202, 210, 219
Wormeli (2018), 247, 259
Xu, Y., & Brown, G. T. L. (2017), 8, 21, 24
Yahya, M. (2013), 201, 203, 219
Yaikhong & Usaha (2012), 199, 202, 219
Yamada, K., 164
Yan, J. (2010), 65, 140
Yan, X., Zhang, C., & Fan, J. J. (2018), 19, 22, 65, 139, 152, 153

Yang (2000), 235, 240
Yastibas, A. E., & Takkaç, M. (2018), 22, 139
Yi'an (1998), 235, 240
Young, D. J. (1994), 201, 219, 240

Yousif, 201
Yusof, R. (2016), 219
Zeichner & Liston, 109, 110
Zhang (2012), 236, 240
Zheng, Y. (2014), 51
Zhou, M. (2016), 201, 219

Index

academic, 9, 10, 18, 34, 36, 38, 44, 46, 50, 51, 53, 64, 71, 81, 83, 97, 159, 160, 161, 162, 163, 164, 165, 166, 167, 171, 176, 178, 182, 204, 211, 214, 216, 218, 221, 225, 226, 237, 239
academic literacy, 53, 71, 159, 160, 163, 164, 165, 169, 170, 171, 172
academic writing, 44
accent (native and non-native speaker varieties), 210, 224–226
accountability, 6, 7, 54, 56, 177, 178, 238, 255
accuracy, 45, 88, 98, 99, 177, 179, 182
action logging, 245, 259
Aladdin Factor, 242, 258
alternative assessment, 4, 9, 18, 33, 34, 39, 41, 44, 45, 91, 151, 154, 182, 183, 184, 185, 188, 189, 193
anxiety, 180, 200, 201–210, 217, 218, 219
apprehension, 201, 202, 203, 204, 206, 210, 218
assessing speaking, 199–200, 204
assessment, 3–28, 33–35, 44, 46, 47, 51–100, 151, 152, 153, 154, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 187, 188, 189, 190, 191, 192, 193, 199, 200, 217, 218, 220, 221, 224, 225, 236, 237, 238, 239, 240, 241, 247, 254, 258, 259, 260, 261
assessors, 5–8, 10, 17, 58, 152, 153, 164, 169, 172, 177
assurance, 85, 87, 96, 98, 99, 176, 178, 182, 183, 185, 186, 192

basic, 15, 21, 39, 151, 153, 160, 172, 180, 225, 241
call report, 245
candidates, 33, 37, 39, 46, 87, 224, 228, 230, 233
CEFR, 39, 217, 220, 221, 223, 225, 226, 228, 233, 235, 237, 238, 239
clarity, 84, 94
classroom teachers, 72, 203, 204, 205, 208, 209, 210, 211, 220, 221, 235, 236
cognitive, 95, 199, 210, 222, 223, 226, 227, 228, 230, 232, 235, 237, 238, 239, 249, 259
coherence, 93, 162
cohesion, 93
community, 7, 36, 46, 53, 62, 70, 71, 74, 80, 82, 171, 172, 200, 218, 219, 225, 241, 247, 250
competence-based curriculum, 220, 221
components, 7, 15, 25, 26, 53, 55, 56, 58, 87
concept, 7–10, 13–28, 37, 43, 47, 52–64, 73, 81, 92, 99, 151, 154, 160, 179, 192, 253
conceptions, 4, 9, 10, 14, 15, 17, 20, 21, 24, 28, 163
connected speech, 222, 224, 236
construct, 4, 10, 14, 15, 17, 18, 37, 51, 56, 64, 74, 80, 87, 96, 154, 159, 160, 161, 162, 163, 179, 181, 186, 187, 188, 193, 201, 217, 221, 222, 223, 224, 225, 226, 227, 228, 235, 236, 237, 238, 239
content validity, 37
context, 4, 6–27, 34–35, 38–41, 52, 56, 57, 59, 61, 65, 69, 70–73, 75–78, 81, 86, 89, 91, 92, 95–98, 152, 153, 160, 161, 163, 165, 171, 178, 179, 180, 181, 184, 185, 193, 200, 203, 204, 221, 222, 223, 224, 225, 227, 228, 229, 235, 237, 238, 239, 247, 260, 261

correlation, 9, 19, 167, 168, 170, 201, 202, 207, 209
course-based, 20, 39, 46, 151, 152, 154, 161, 164, 165, 166, 167, 168, 169, 170, 180, 181, 187, 188, 204
criterion-referenced, 3, 11, 179
critical attitude, 241
Critical Language Testing, 34, 41
culture, 14, 15, 28, 40, 41, 44, 69, 81, 202, 229, 258
curriculum, 5, 6, 39, 42, 43, 45, 47, 55, 56, 70, 88, 176, 182, 186, 187, 199, 217, 221, 225, 235, 236, 237
descriptive statistics, 167, 168, 206
dimensions of assessment, 16, 199
discourse analysis, 24
dynamic assessment, 239
EAP, 9, 33, 34, 37, 41, 44, 47
education, 3, 14, 19, 22, 27, 34, 35, 42, 44, 52–55, 57, 59–65, 70, 81, 82, 88, 153, 154, 161, 176, 177, 178, 180, 182, 183, 190, 191, 199, 210, 217, 218, 220, 221, 225, 237, 239, 247, 258, 259
educational reforms, 221, 235, 237
effectiveness, 89, 92, 184, 190, 191
EFL, 9, 10, 19–24, 64, 69, 83–91, 93, 95–97, 151, 152, 153, 154, 161, 162, 163, 165, 166, 171, 200, 201, 204, 206, 207, 209, 217, 218, 219, 239, 240, 259
ELT practitioners, 260
English, 9, 18–22, 26, 27, 33–46, 51, 61, 63–65, 69–71, 81–84, 88, 96, 161, 165, 176, 199, 200, 203, 204, 205, 206, 207, 209, 217, 218, 219, 221, 224, 225, 237, 238, 239, 240, 249, 252, 256, 259, 260
English as a Foreign Language, 19, 41, 69, 81, 160, 161, 171, 200, 240, 259
English language testing, 9, 18, 35, 161, 221, 222, 235

exam-oriented, 151, 153
external, 8, 28, 176, 179, 186
fair, 4, 10, 42, 166, 170, 178, 179, 180, 182, 183, 184, 185, 188, 218, 235, 259
feedback, 69, 70–83, 152, 154, 171, 177, 178, 182, 183, 184, 185, 188, 189, 192, 211, 220, 248, 249, 250, 251, 252, 253
Foreign Language Anxiety, 217, 218
foreign language teaching, 199, 200, 201, 203–211, 218, 221, 222, 224, 235, 236
formative assessment, 5, 7, 71, 76, 82, 83, 151, 154, 177, 178, 179, 220, 236, 247, 254
frameworks, 7, 14, 15, 24, 25, 52, 54, 55, 62, 160, 200, 227
functional literacy, 15, 16, 57
functions, 7, 38, 54, 84, 97, 99, 199, 200, 236
genre, 45, 224
grammar, 22, 40, 41, 80, 82, 93, 96, 151, 154, 166, 167, 170, 204
high-stakes tests, 178, 185, 186, 220, 236
historical, 14–16, 35, 47, 54, 55, 57, 160
holistic assessment, 162
hypothesis, 181, 203, 204, 222
IELTS, 9, 18, 35, 36, 38, 39, 44, 161
illiteracy, 15, 16, 57
impact of testing, 14, 55, 160, 221
implications, 10, 63–65, 81, 97, 153, 159, 160, 161, 165, 169, 171, 193, 201, 236
in-service courses, 6, 236
inaccurate paraphrase, 163
institutional, 59, 151, 161, 171, 172, 176
instruction, 4, 5, 9, 10, 22, 43, 54, 72, 74, 79, 82, 88, 89, 99, 199, 204, 205, 206, 208, 225
interfaces, 8, 10, 199
internal, 76, 81
interpretative, 14, 171, 200
journal, 9, 44, 56, 64, 81, 82, 83

knowledge, 3–22, 26, 28, 34–44, 51–54, 56–65, 69, 74, 76, 79, 95, 98, 151, 152, 153, 160, 161, 162, 163, 171, 190, 193, 201, 220, 221, 222, 227, 229, 233, 235
L2, 82, 83, 200–211, 222, 226
LAL training, 151, 152, 153, 154, 160, 171, 236
language, 70, 88, 151, 152, 153, 154, 159, 161, 164, 167, 176, 180, 188, 199, 201, 202, 203, 204, 206, 210, 222, 224, 225, 236
language assessment, 51–100, 151, 154, 159, 160, 161, 162, 163, 171, 172
language assessment literacy, 51–100, 159, 160, 220–235, 237
language education, 52, 59, 62, 63, 161, 199, 210, 220
language learning, 52, 62, 81–83, 86, 88, 100, 152, 154, 201, 202, 205
language pedagogy, 58, 220, 236
language proficiency, 171, 200
language skills, 61, 151, 152, 154, 159, 160, 163, 222
language teaching, 54, 61, 64, 82–84, 88, 89, 159, 177, 199–211
language testing, 51, 61–65, 85, 151, 152, 153, 159, 161, 162, 177, 179, 180, 221
language trait, 161, 223
large-scale testing, 54, 61, 91, 179, 180, 182
learning, 3–11, 14, 20, 21, 27, 33, 42, 44, 51–53, 56, 60–62, 64, 69–77, 79, 81, 82, 83, 86, 88, 89, 91–94, 97, 98, 100, 151, 152, 154, 160, 161, 171, 176, 177, 178, 180, 184, 188, 192, 200, 201, 202, 204, 206, 210, 217, 218, 219, 220, 221, 230, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 246, 247, 248, 249, 250, 252, 253, 254, 255, 258, 259, 260
learning outcomes, 72–76, 220
levels, 54, 56–59, 70, 74, 88, 151, 154, 161, 162, 163, 164, 165, 171, 172, 176, 180, 200, 202, 206, 209, 210, 222, 228, 235

limitations, 90, 163, 171, 192, 200, 227
listening, 61, 84, 87, 93, 203, 209, 211, 220–237
literacy, 3–27, 51–55, 57–100, 154, 159, 160, 162, 163, 164, 165, 166, 170, 171, 172, 177, 180, 181, 182, 184, 186, 189, 193, 238, 240, 247, 254, 260
local practices, 58, 59, 184
long-term, 84, 172, 191, 211, 222
measurement, 55, 56, 161, 162, 164, 167, 171, 177, 178, 180, 188
measurement scale, 160, 170, 171
metacognitive strategies, 223, 227
method, 64, 69, 86, 92, 94, 160, 161, 171, 178, 181, 183, 192, 205, 227
mid-term, 182, 183, 186
multidimensional literacy, 57
narrative genre, 200
negotiation, 74
nominal literacy, 57
non-parametric correlation analysis, 167
norm-referenced, 179
novice EFL teachers, 151, 152, 153, 154
opinions, 151, 152
oral proficiency, 199, 200, 204, 211
outcomes, 52, 53, 72–76, 82, 88, 97, 161, 171, 188, 220
overrepresentation, 187
paradigm, 51, 180
paraphrasing, 159, 160, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172
participants, 95, 100, 152, 159, 162, 163, 165, 170, 171, 184, 227, 228, 229, 230, 231, 233, 235
patchwriting, 163, 164, 167, 168, 169
pedagogy, 58, 59, 78, 178, 220, 236
perception, 82, 92, 153, 154, 159, 162, 172, 177, 182, 183, 184, 185, 187, 188, 191, 192, 200, 201
performance, 53, 54, 56, 74, 77, 81, 84, 86, 88, 91, 93, 96, 99, 100, 159, 160, 161, 163, 165, 167, 168, 169, 170, 172, 178, 179, 180, 189, 199, 200, 201, 202, 204, 205
positive washback, 220–237
practical knowledge, 153
practical skills, 61, 152

practices, 4–28, 33, 34, 37, 41, 42, 44, 47, 152, 153, 154, 160, 165, 170, 171, 172, 177, 178, 181, 183, 184, 185, 186, 191, 192, 193, 201, 203, 204, 207, 208, 209, 211, 220, 221, 258
practicum, 152, 154, 162
pre-service teachers, 60, 84
principles, 54, 56–60, 64, 81, 82, 84, 94, 154, 159, 160, 180, 181, 182, 185, 192, 236
principles and concepts, 58–59, 160
procedural and conceptual literacy, 57
productive skills, 170, 199, 204, 208
progress, 70, 71, 76, 79, 81, 90, 172, 178, 193, 220
purpose for listening, 222, 226, 229, 236
qualifications, 165
qualitative data, 227, 228
quality, 63, 70, 71, 81, 85, 87, 92, 94, 164, 171, 172, 176, 178, 179, 180, 181, 182, 183, 185, 186, 187, 190, 192, 193, 222
reading, 53, 61, 79, 84, 88, 90, 93, 199
reliability, 56, 97, 164, 176, 177, 178, 179, 180, 181, 182, 183, 186, 189, 191, 192, 193, 228
repetition, 167, 224
research, 151, 152, 154, 159, 160, 162, 164, 165, 168, 169, 172, 178, 181, 182, 183, 185, 186, 190, 191, 193, 199, 200, 201, 203, 222, 227
retrospective verbal protocol, 227, 228, 229
rubrics, 73, 75, 84, 159, 160, 161, 162, 164, 165, 166, 167, 168, 169, 170, 171, 204
sample size, 166, 171, 205, 227
scores, 152, 160, 164, 165, 167, 168, 169, 170, 179, 188, 191, 207, 233, 235
Second Language Acquisition, 82, 199, 221
short-term, 190, 191, 193, 227
skills, 53–56, 58, 59, 61, 73, 74, 76, 77, 79–81, 84, 92, 96, 99, 151, 152, 153, 154, 159, 160, 161, 163, 165, 166, 170, 171, 178, 180, 181, 199, 200, 201, 203, 204, 205, 210, 220, 222, 225, 227, 233, 235, 236

social, 51, 52, 54–57, 59, 71, 75, 160, 167, 171, 172, 199, 200
social context, 56, 71, 200
social interaction, 204
Social-Constructivism, 203
sociocultural values, 58
source text, 163, 169, 172, 224, 226
speaking, 61, 84, 87, 88, 93, 95, 199–216
speaking anxiety, 199–216, 217, 218, 219
staff seminars, 190, 192
stakeholders, 54, 55, 57–62, 64, 91, 159, 177, 181, 182, 184, 185, 191, 193
Standard English, 51, 224
standardized tests, 161, 176, 180, 220
strategic competence, 222
structure, 74, 75, 84, 85, 87, 89, 95, 97, 98, 100, 163, 164, 169, 187, 188, 199
students, 61, 67, 69, 70–82, 84, 85, 88–100, 153, 159, 162, 163, 165, 166, 168, 169, 170, 171, 172, 176, 177, 178, 180, 181, 182, 183, 184, 185, 187, 188, 189, 192, 193, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 220, 221, 225, 236
summative assessment, 71, 151, 160, 165, 176, 177, 178, 181, 182, 183, 184, 185, 188, 189, 190, 192, 193, 204, 205, 210, 220
system, 65, 70, 76, 82, 154, 165, 171, 172, 176, 177, 201, 202, 204, 205, 210, 221, 225
target language use domain, 224, 225
teacher, 51, 55, 58, 59, 60–62, 64, 70, 72, 76, 78, 82–84, 88, 89, 90, 92, 94, 96, 98, 151, 152, 153, 154, 159, 160, 161, 162, 163, 165, 166, 167, 169, 170, 171, 172, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 203, 204, 205, 208, 210, 211, 220, 221, 235, 236
teacher educators, 154
teacher training, 61, 64, 177, 181, 220
teacher training programs, 161, 178
teachers' assessment literacy, 64, 163, 169, 177, 180, 182, 220

teaching, 62, 69, 70, 74, 81, 83, 89, 151, 152, 153, 154, 159, 160, 162, 163, 165, 171, 172, 176, 177, 178, 179, 186, 188, 199, 200, 201, 203, 204, 205, 208, 210, 211, 220, 221, 222, 224, 235, 236
teaching career, 154
teaching experience, 95
teaching practice, 71, 97, 152, 205, 206, 207, 209
test, 51, 54, 56, 58–60, 64, 84, 87, 88, 91, 93, 96, 151, 152, 153, 154, 159, 160, 161, 162, 164, 165, 166, 168, 169, 170, 171, 176, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 192, 193, 210, 220, 221, 222, 224, 225, 227, 228, 229, 232, 233, 235, 236
test construction, 87, 161
test design, 91, 161, 162, 185, 189
test development, 56, 176, 181, 182, 183, 185, 186, 192, 193, 221, 222
test preparation, 220
test results, 54, 159, 179, 182, 188, 192
test scores, 152, 160, 164, 165, 168, 169, 170, 179, 188, 233, 235
test specifications, 226, 227, 236
test takers, 159, 187, 227
test washback, 220–237
testing criteria, 152
testing policy, 153
testing practices, 60
testing procedures, 152
testing skills, 151, 152, 153
testing system, 161
testing techniques, 152, 181
testing tools, 152, 153, 154
text length, 164, 167, 168, 169, 170
text mapping protocol, 226

theoretical knowledge, 54, 153, 171
theories, 57, 62, 82
think alouds, 227, 228
TOEFL, 161, 235
training programs, 60
transparency, 167, 181
trustworthiness, 179, 181, 182, 189, 191, 193
under-representation, 186, 224
undergraduate students, 165, 166, 202, 203, 206
uniformity, 171, 182, 183, 184, 185, 186, 192
university, 79, 151, 162, 163, 176, 177, 181, 183, 184, 193, 200, 203, 205, 225
university entrance tests, 220, 221, 224, 225
valid, 55, 161, 177, 180, 182, 185, 187, 235
validity, 56, 97, 179, 180, 181, 183, 186, 187, 188, 221, 228
validity evidence, 221
vocabulary, 93, 96, 151, 154, 205, 228, 235
washback, 182, 183, 184, 185, 188, 220–237
washback effect, 154, 155, 221
words, 51, 80, 95, 163, 165, 167, 168, 169, 170, 222, 224, 229, 236
writing, 52, 53, 56, 59, 61, 63, 71–75, 77–84, 88, 100, 159, 160, 161, 162, 163, 165, 166, 167, 169, 170, 171, 172, 181, 199, 203, 204
writing examiners, 161
writing exams, 161
Zone of Proximal Development, 204