Advances in Corpus-Based Research on Academic Writing: Effects of Discipline, Register, and Writer Expertise 902720506X, 9789027205063

This volume showcases some of the latest research on academic writing by leading and up-and-coming corpus linguists. The

English Pages 358 [366] Year 2020

Table of contents:
Advances in Corpus-based Research on Academic Writing
Editorial page
Title page
Copyright page
Table of contents
Introduction: Advances in corpus-based research on academic writing
Acknowledgements
References
Part I. Focus on writer expertise and nativeness status
A corpus-based study of academic word use in EFL student writing
1. Introduction
2. Background
2.1 Linguistic variation in university registers
2.2 Academic vocabulary
2.3 The English as a Foreign Language (EFL) context
3. Methodology
3.1 The corpus
3.2 Analytical procedures
4. Findings
4.1 Courses
4.2 Assignments
5. Discussion
6. Conclusion, implications and limitations
References
Appendix A. Sample assignments
Appendix B. Text extracts
Give constructions in Korean EFL learner writing
1. Introduction
2. Literature review
2.1 Constructions of the verb give
2.2 Contrastive corpus studies and L1 influence in corpus studies
2.3 The current study
3. Method
3.1 Corpora
3.2 Data extraction
3.3 Data analysis
4. Findings
4.1 Constructions and frequencies
4.2 In-depth analysis of ditransitive constructions
4.3 L1 influence found in collocations and idiosyncratic usage
5. Conclusions
5.1 Limitations
5.2 Future directions
References
A corpus-based exploration of constructions in written academic English as a lingua franca
1. Introduction
2. Literature review
2.1 A brief overview of written academic ELF research
2.2 Corpus-based phraseology and Construction Grammar
2.3 The current study
3. Data and method
3.1 Corpora used
3.2 Analysis
4. Results
4.1 KFW exploration 1: Of
4.2 KFW exploration 2: In
4.3 Construction candidates with KFWs in and of
4.4 KFW exploration 3: On
5. Discussion and conclusion
5.1 Core constructions of written academic ELF
5.2 Limitations and future directions
References
The influence of sources on first-year composition L1 student writing: A multi-dimensional analysis
1. Introduction
2. Literature review
3. Method
3.1 Participants and setting
3.2 Instruments and materials
3.3 Procedures
3.4 Analysis
4. Results and discussion
4.1 Dimension 1: Source-based concept density vs. Prompt-based freewriting
4.2 Dimension 2: Impersonal extension of source-based concepts
4.3 Dimension 3: Source text deixis
5. General discussion
6. Limitations
7. Conclusion
References
Appendix A. Prompts
Appendix B. Word list variables
Appendix C. Full factor loading table
Students’ use of lexical bundles: Exploring the discipline and writing experience interface
1. Introduction
2. Literature review
2.1 Lexical bundles
2.2 Lexical bundles in academic discourse
2.3 Students’ use of lexical bundles
3. Methods
3.1 Lexical bundles in medical research articles
3.2 Corpora
3.3 Data collection
4. Results
4.1 Use of all bundles
4.2 Use of research-oriented bundles
4.3 Use of text-oriented bundles
5. Discussion
6. Conclusion
References
Part II. Focus on disciplinary variation
Combining rhetorical move analysis with multi-dimensional analysis: Research writing across disciplines
1. Introduction
1.1 Move analysis
1.2 Investigating the linguistic characteristics of moves
1.3 Multi-dimensional analyses of moves
2. The ISURA corpus: Design, compilation, and move annotation
2.1 Design and compilation
2.2 Development of the IMRD/C move/step framework
2.3 Annotating the ISURA corpus for moves and steps
3. Applying MD analysis to the move-annotated ISURA corpus: Methods
3.1 Preparing the move-annotated ISURA corpus for analysis: Text segmentation and tagging
3.2 Obtaining rates of occurrence for linguistic features: Adapting to a specialized domain and short texts
3.3 Conducting the statistical analyses
3.4 Calculating dimension scores and interpreting dimensions
4. Dimensions of variation across rhetorical moves
4.1 Dimension 1: Interpretation and expansion vs. simple reportage
4.2 Dimension 2: Abstraction / Overt empiricism
4.3 Dimension 3: Procedural narration
4.4 Dimension 4: Interpreting results vs. informational density
5. Applying and evaluating the MD analysis of moves
5.1 Evaluation of the dimensions and process
5.2 Implications and applications
Acknowledgements
References
Appendix A. Scree plot for Principal Components Analysis
Lexical bundles across disciplines: A look at consistency and variability
1. Introduction
2. Methods
3. Results and discussion
3.1 Bundles shared across disciplines
3.2 Bundles with a productive variable slot
3.3 Nuanced meanings of variable slot fillers
3.4 Pedagogical applications
4. Conclusion
References
Lexical bundles as reflections of disciplinary norms in Spanish and English literary criticism, history, and psychology research
1. Introduction
2. Review of related literature
2.1 LBs and their connection to argumentation in disciplines
2.2 Lexicogrammatical features and their connection to argumentation
3. Data and methodology
3.1 Corpus
3.2 LB identification
3.3 Statistical analysis
4. Results
4.1 Preliminary study: Replicating MacDonald’s study of sentence subjects
4.2 Main study: Subcorpus-level differences at the formulaic language level
5. Discussion
5.1 “Facts”: The most frequent epistemic bundles in all disciplines
5.2 Further expressions of epistemicity in psychology writing
5.3 Equivalent phenomenal bundles in all disciplines
5.4 History-specific phenomenal bundles
5.5 Psychology-specific phenomenal bundles
6. Conclusion
References
Adjectives as nominal pre-modifiers in chemistry and applied linguistics research articles
1. Introduction
2. Methods
3. Results and discussion
3.1 NP lexical similarities
3.2 Types of attributive adjectives and their NP function
3.3 NP size and NP sequence frequency
4. Conclusion
References
The use of lexical patterns in engineering: A corpus-based investigation of five sub-disciplines
1. Introduction
1.1 Multiword sequences in discipline-specific texts
2. Method
2.1 Construction of engineering corpora
2.2 Identification of target phrase-frames
2.3 Structural and functional analysis of phrase-frames
2.4 Overlapping of phrase-frames
3. Results and discussion
3.1 Distribution and variability of phrase-frames
3.2 Structural categories for phrase-frames
3.3 Functional categories for phrase-frames
3.4 Findings for overlapping PFs
4. Pedagogical applications of findings
5. Conclusion
References
Appendix
Stance in unpublished student writing: An exploratory study of modal verbs in MICUSP’s Physical Science papers
1. Introduction
2. Studies about stance
2.1 Stance in student writing
2.2 Modal verbs as stance markers in academic writing
3. Methods and materials
3.1 Corpus
3.2 Data analysis
4. Results
4.1 Possibility modals
4.2 Prediction modals
4.3 Necessity modals
4.4 Disciplinary variation
4.5 Student level
4.6 Nativeness
4.7 Registers
5. Discussion and pedagogical implications
6. Future directions
7. Conclusion
References
Appendix A.
Modal verbs
Algorithm
Appendix B.
Appendix C.
Part III. Focus on register variation
P-frames and rhetorical moves in applied linguistics conference abstracts
1. Introduction
2. Methodology
2.1 Corpus
2.2 Procedures
3. Results
3.1 The distribution of p-frames across the rhetorical move-steps
3.2 The association of p-frames with individual rhetorical moves/steps
4. Summary and conclusion
References
Appendix A. Phrase-Frames in the 2017 AAAL conference abstracts sorted by their primary function
Stand-alone literature reviews: A new multi-dimensional analysis
1. Introduction
2. Methods
2.1 Corpus
2.2 Statistical analyses
3. Results
3.1 Dimension 1: Human vs. technical/academic focus
3.2 Dimension 2: Questioning/interpreting vs. knowledge-conferring
3.3 Dimension 3: Expression of stance
3.4 Dimension 4: Author/discourse community vs. topic focus
3.5 Dimension 5: Abstract vs. concrete focus
3.6 Dimension 6: Methodological concerns vs. description
4. Conclusion
References
Appendix. Means and Standard Deviations for Dimensions 1 to 6
A multi-dimensional view of collocations in academic writing
1. Introduction
2. Collocational multi-dimensional analysis
3. Collocational multi-dimensional analysis of academic writing
4. Dimensions of collocation in academic writing
5. Conclusion
Acknowledgements
References
Name index
Subject index

Advances in Corpus-based Research on Academic Writing
Effects of discipline, register, and writer expertise

edited by Ute Römer, Viviana Cortes and Eric Friginal

Studies in Corpus Linguistics 95
John Benjamins Publishing Company

Studies in Corpus Linguistics (SCL) issn 1388-0373

SCL focuses on the use of corpora throughout language study, the development of a quantitative approach to linguistics, the design and use of new tools for processing language texts, and the theoretical implications of a data-rich discipline. For an overview of all books published in this series, please see benjamins.com/catalog/scl

General Editor
Ute Römer, Georgia State University

Founding Editor
Elena Tognini-Bonelli, The Tuscan Word Centre/University of Siena

Advisory Board
Laurence Anthony, Waseda University
Antti Arppe, University of Alberta
Michael Barlow, University of Auckland
Monika Bednarek, University of Sydney
Tony Berber Sardinha, Catholic University of São Paulo
Douglas Biber, Northern Arizona University
Marina Bondi, University of Modena and Reggio Emilia
Jonathan Culpeper, Lancaster University
Sylviane Granger, University of Louvain
Stefan Th. Gries, University of California, Santa Barbara
Susan Hunston, University of Birmingham
Michaela Mahlberg, University of Birmingham
Anna Mauranen, University of Helsinki
Andrea Sand, University of Trier
Benedikt Szmrecsanyi, Catholic University of Leuven
Elena Tognini-Bonelli, The Tuscan Word Centre/University of Siena
Yukio Tono, Tokyo University of Foreign Studies
Martin Warren, The Hong Kong Polytechnic University
Stefanie Wulff, University of Florida

Volume 95 Advances in Corpus-based Research on Academic Writing Effects of discipline, register, and writer expertise Edited by Ute Römer, Viviana Cortes and Eric Friginal

Advances in Corpus-based Research on Academic Writing
Effects of discipline, register, and writer expertise

Edited by
Ute Römer, Viviana Cortes and Eric Friginal
Georgia State University

John Benjamins Publishing Company
Amsterdam / Philadelphia

The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Cover design: Françoise Berserik Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.

DOI 10.1075/scl.95
Cataloging-in-Publication Data available from Library of Congress: LCCN 2019055545 (print) / 2019055546 (e-book)
ISBN 978 90 272 0506 3 (Hb)
ISBN 978 90 272 6145 8 (e-book)

© 2020 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company · https://benjamins.com

Table of contents

Introduction: Advances in corpus-based research on academic writing
Ute Römer, Viviana Cortes and Eric Friginal

Part I. Focus on writer expertise and nativeness status

A corpus-based study of academic word use in EFL student writing
Eniko Csomay

Give constructions in Korean EFL learner writing
Yunjung Nam

A corpus-based exploration of constructions in written academic English as a lingua franca
Selahattin Yilmaz and Ute Römer

The influence of sources on First-Year Composition L1 student writing: A multi-dimensional analysis
Stephen M. Doolan

Students’ use of lexical bundles: Exploring the discipline and writing experience interface
Ndeye Bineta Mbodj and Scott A. Crossley

Part II. Focus on disciplinary variation

Combining rhetorical move analysis with multi-dimensional analysis: Research writing across disciplines
Bethany Gray, Elena Cotos and Jordan Smith

Lexical bundles across disciplines: A look at consistency and variability
Randi Reppen and Shannon B. Olson

Lexical bundles as reflections of disciplinary norms in Spanish and English literary criticism, history, and psychology research
William Michael Lake and Viviana Cortes

Adjectives as nominal pre-modifiers in chemistry and applied linguistics research articles
Deise P. Dutra, Jessica M. S. Queiroz, Luciana D. de Macedo, Danilo D. Costa and Elisa Mattos

The use of lexical patterns in engineering: A corpus-based investigation of five sub-disciplines
Tatiana Nekrasova-Beker and Anthony Becker

Stance in unpublished student writing: An exploratory study of modal verbs in MICUSP’s Physical Science papers
Kimberly Becker and Hui-Hsien Feng

Part III. Focus on register variation

P-frames and rhetorical moves in applied linguistics conference abstracts
Jungwan Yoon and J. Elliott Casal

Stand-alone literature reviews: A new multi-dimensional analysis
Heidi R. Wright

A multi-dimensional view of collocations in academic writing
Maria Carolina Zuppardi and Tony Berber Sardinha

Name index

Subject index

Introduction: Advances in corpus-based research on academic writing

Ute Römer, Viviana Cortes and Eric Friginal
Georgia State University

As its title indicates, the present volume showcases recent advances in corpus-based research on academic writing. This topic emerged as a central theme at the 14th conference of the American Association for Corpus Linguistics (AACL), held at Georgia State University in Atlanta, GA in September 2018. The editors of the volume (who also served as organizers of AACL 2018) invited authors who had presented work on this topic at the conference to submit proposals for inclusion in the present volume. Written by a mix of established and up-and-coming researchers, the selected contributions show applications of innovative corpus-based methodologies, replicate earlier studies to validate methodological procedures, investigate academic registers rarely analyzed using corpus-based methods before, and compare the linguistic production of different groups of academic writers against each other. What we consider particularly novel aspects of the collection are the inclusion of research that combines rhetorical moves with multi-dimensional analysis, work that incorporates textual position analyses of frequent items across disciplines and languages, studies that cover both fixed and variable phraseological items (lexical bundles, phrase-frames, constructions), and research that is based on corpora of English as an academic lingua franca. Gray (2015) states that the written language of academia has distinct characteristics that make it different from other types of language, and a vast number of studies from varied areas in the field of applied linguistics that used very different research methodologies have proven this statement right (see for example, Biber 1988; Crompton 1997; Flowerdew 2002; Grabe & Kaplan 1997; Halliday & Martin 1993; Hyland 2000, among many others). 
Over the past two decades, corpus-based methodologies have been favored in the analysis of a wide variety of written registers frequently produced in academia because they yield reliable and generalizable findings that help complete the description of these registers and, in many cases, have direct pedagogical applications.

https://doi.org/10.1075/scl.95.int © 2020 John Benjamins Publishing Company

Recent studies like the ones reported in Biber & Gray (2016), Cao & Hu (2014), Gray (2015), Kopaczyk & Tyrkkö (2018), and Parkinson (2013), to mention only a few examples, used corpus-based research methodologies to identify and analyze linguistic phenomena in written academic registers. These studies range from focusing on metadiscourse markers and disciplinary and research paradigm linguistic variation to analyses of grammatical features of complexity and elaboration. The large numbers of corpus-based studies that have been published in books, edited volumes, and peer-reviewed journals have helped provide a thorough description of the lexico-grammatical features of academic registers (see for example, Gray 2015: 15–18, for a summary of studies of language use in academic research articles). There is, however, still a lot of unexplored territory in this area, a gap that the contributions to this volume attempt to address.

The studies included in this volume are based on a wide range of corpora spanning first (L1) and second language (L2) academic writing at different levels of writing expertise, containing texts from a variety of academic disciplines (and sub-disciplines), and of different academic registers. These three areas of focus (writer expertise and L1 status, disciplinary variation, register variation) provide the organization of the volume and the division of its 14 chapters into three parts.

Part I of the volume focuses on the effects that writer expertise and nativeness status can have on academic writing. It starts with two chapters that focus on second language (L2) learner writing in academic settings, the first one entitled “A corpus-based study of academic word use in EFL student writing” and written by Eniko Csomay.
Based on a corpus of writing assignments collected in a dual-degree program in STEM (science, technology, engineering and mathematics) in Tbilisi, Georgia, Csomay traces progress in students’ writing by quantifying their use of academic vocabulary. The second chapter by Yunjung Nam on “Give constructions in Korean EFL learner writing” draws on data from the Yonsei English Learner Corpus (YELC; Rhee & Jung 2014) and the Michigan Corpus of Upper-level Student Papers (MICUSP; Römer & O’Donnell 2011) as it discusses dominant patterns in the use of constructions of the high-frequency verb give by L1 Korean novice academic writers compared to L1 English novice writers. Nam’s study highlights aspects of English lexicogrammar that L1 Korean novice academic writers struggle with and has important implications for academic writing pedagogy in a Korean EFL setting. Also focusing on lexicogrammatical patterns is Selahattin Yilmaz and Ute Römer’s chapter on “A corpus-based exploration of constructions in written academic English as a lingua franca.” Using a corpus of English as a lingua franca (ELF) writing produced in academic settings and a subset of academic writing from the Corpus of Contemporary American English (COCA; Davies 2008–) as a reference dataset, the authors demonstrate how a combined key function word and phraseological analysis can lead to the identification of constructions that are characteristic of written academic ELF. Next in Part I of the volume is a chapter by Stephen Doolan on “The influence of sources on first-year composition L1 student writing: A multi-dimensional analysis”. With the help of a multi-dimensional analysis and a corpus of student essays, Doolan highlights core aspects of source-based language use by English as a first language novice academic writers. His study allows us to better understand the complex task of referencing and has practical implications for first-year composition teachers and students. The final chapter of Part I of the volume, entitled “Students’ use of lexical bundles: Exploring the discipline and writing experience interface”, is written by Ndeye Bineta Mbodj and Scott Crossley. The authors use a variety of corpora of novice and expert writing in medical and non-medical fields to investigate in what ways and to what extent academic writing expertise and field-specific knowledge contribute to the production of medical lexical bundles. With its concentration on medical English, the Mbodj and Crossley chapter serves as a bridge between the first part and the second part of the collection, which focuses on disciplinary variation.

The chapters in Part II of the volume highlight how academic writing may differ based on the discipline or sub-discipline texts come from. First up, Bethany Gray, Elena Cotos and Jordan Smith’s contribution entitled “Combining rhetorical move analysis with multi-dimensional analysis: Research writing across disciplines” is methodologically particularly interesting. It shows how two analytic techniques that are commonly used in academic writing research can be integrated to provide valuable new insights into the mapping of communicative functions and linguistic patterns in research articles from a wide range of disciplines. Informed by the results of their study, the authors also argue for further required adaptations of multi-dimensional analysis for use with move-annotated data.
Next, Randi Reppen and Shannon Olson, in their chapter entitled “Lexical bundles across disciplines: A look at consistency and variability”, discuss both discipline-specific and cross-disciplinary fixed bundles, as well as bundles with variable slots (also known as phrase-frames), in a large corpus of course readings from nine disciplines (Architecture, Business, Culinary Science, Digital Arts, Fashion Design, Film, Hospitality Industry, Interior Design, and Studio Arts). The authors highlight the value of especially those bundles they identified as “cross-disciplinary” in pedagogical contexts and share practical activities that may promote awareness of lexical bundles for students and teachers. Also focusing on lexical bundles and sharing insights of pedagogical relevance, the chapter by William Lake and Viviana Cortes on “Lexical bundles as reflections of disciplinary norms in Spanish and English literary criticism, history, and psychology research” illuminates aspects of humanities and social sciences writing in two world languages. Access to parallel corpora of research articles in English and Spanish allowed the authors to not only analyze disciplinary but also cross-linguistic variation in lexical bundles which appear in sentence subject position and contain epistemic or phenomenal nouns. The next chapter by Deise Dutra, Jessica Queiroz, Luciana de Macedo, Danilo Costa and Elisa Mattos on “Adjectives as nominal pre-modifiers in chemistry and applied linguistics research articles” also looks at patterns within noun phrases but focuses on how attributive adjectives are used to modify nouns. The authors use two matched corpora of research articles from chemistry and applied linguistics to highlight disciplinary preferences in the frequency of adjective pre-modifiers, the length of noun phrases, and the specific functions they express. The chapter by Tatiana Nekrasova-Beker and Anthony Becker on “The use of lexical patterns in engineering: A corpus-based investigation of five sub-disciplines” shifts the focus from disciplinary to sub-disciplinary variation in the expanding field of engineering. Based on data from five corpora of undergraduate engineering textbooks and focusing on variable multi-word sequences (phrase-frames, such as the * part of the), the authors show overlap in sub-disciplinary writing in terms of frequencies and structural characteristics of common phrase-frames while at the same time highlighting interesting differences in the discourse functions that these sequences express. Three engineering sub-disciplines as well as physics are also in the spotlight in Kimberly Becker and Hui-Hsien Feng’s chapter entitled “Stance in unpublished student writing: An exploratory study of modal verbs in MICUSP’s Physical Science papers”. Their study offers relevant new insights into variation in the use of modal verbs as stance features and has important pedagogical implications for disciplinary writing contexts.

The contributions to Part III of the volume discuss the ways in which register or text type affects academic writing.
First, Jungwan Yoon and Elliott Casal take a phraseological approach to describing how academic writers realize their rhetorical goals in conference abstracts. In their chapter entitled “P-frames and rhetorical moves in applied linguistics conference abstracts”, the authors combine quantitative and qualitative analysis to link phrase-frames to rhetorical moves. Their results indicate not only that phrase-frames are highly prevalent in this academic register but also that they are associated with particular rhetorical functions. Next, Heidi Wright, in her chapter on “Stand-alone literature reviews: A new multi-dimensional analysis”, draws on a 3.4-million-word corpus of literature reviews from three different disciplines published in academic journals in the years 1950, 1980, and 2010 to uncover central linguistic aspects of this text type. A multi-dimensional analysis of 72 linguistic features in her corpus allowed Wright to show that stand-alone literature reviews have significantly changed over time and also differ considerably across disciplines. Last but certainly not least, we have Maria Carolina Zuppardi and Tony Berber Sardinha’s chapter on “A multi-dimensional view of collocations in academic writing”, which focuses on the registers of academic textbooks and research articles. The authors identify interrelated sets of collocations (what they refer to as “dimensions of collocation”) in a corpus of over 10 million words. They hence provide a systematic description of collocational co-occurrence patterns across these two registers which could be beneficially used in EAP pedagogy.

Taken together, the chapters included in the present volume all contribute to helping us expand our knowledge of how academic writing functions and what it looks like in various contexts. The authors of these chapters skillfully demonstrate how corpora and corpus methods can be employed to uncover central aspects of texts of different academic registers, produced by different groups of academic writers, and in different academic disciplines and sub-disciplines. Going beyond merely summarizing their findings, they also discuss what their research means for academic writing practice and pedagogical settings. We invite the reader to delve in and explore new aspects of academic writing, and hope that this volume serves to inspire future research that further illuminates linguistic aspects of texts produced in academic settings.

Acknowledgements

We would like to thank all contributors to this volume for sharing their work in this collection and for complying with our somewhat ambitious timeline. We are also very much indebted to the following colleagues, who served as peer reviewers for this volume (listed in alphabetical order), for their time and valuable constructive feedback which allowed the authors to further strengthen their contributions: Gena Bennet, Tony Berber Sardinha, Cindy Berger, Marina Bondi, John Bunting, Ulla Connor, Averil Coxhead, Eniko Csomay, Philip Durrant, Jesse Egbert, James Garner, Bethany Gray, Jack A. Hardy, Mark Johnson, Amanda Lanier, Tove Larsson, Dilin Liu, Katherine Moran, Akira Murakami, Tatiana Nekrasova-Beker, Anne O’Keeffe, Carmen Perez-Llantada, Brittany Polat, Audrey Roberson, Yu Kyoung Shin, Nicole Tracy-Ventura, Marcia Veirano Pinto, Heidi Vellenga, and Hyung-Jo Yoon. We are also grateful to Siyun Tan for her help with copyediting, to the faculty and students in our department (the Department of Applied Linguistics and ESL at Georgia State University) for their constant support in our academic efforts (including the hosting of large conferences), and to Kees Vaes and the production team at John Benjamins for their support throughout the entire publication process.

References

Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: CUP. https://doi.org/10.1017/CBO9780511621024
Biber, Douglas & Gray, Bethany. 2016. Grammatical Complexity in Academic English. Linguistic Change in Writing. Cambridge: CUP. https://doi.org/10.1017/CBO9780511920776
Cao, Feng & Hu, Guangwei. 2014. Interactive metadiscourse in research articles: A comparative study of paradigmatic and disciplinary influences. Journal of Pragmatics 66: 15–31. https://doi.org/10.1016/j.pragma.2014.02.007
Crompton, Peter. 1997. Hedging in academic writing: Some theoretical problems. English for Specific Purposes 16: 271–287. https://doi.org/10.1016/S0889-4906(97)00007-0
Davies, Mark. 2008–. The Corpus of Contemporary American English: 450 million words, 1990–present. (30 September 2019).
Flowerdew, John. 2002. Academic Discourse. New York NY: Longman.
Grabe, William & Kaplan, Robert. 1997. On the writing of science and the science of writing: Hedging in science text and elsewhere. In Hedging and Discourse. Approaches to the Analysis of a Pragmatic Phenomenon in Academic Texts, Raija Markkanen & Hartmut Schröder (eds), 151–167. Berlin: Walter de Gruyter. https://doi.org/10.1515/9783110807332.151
Gray, Bethany. 2015. Linguistic Variation in Research Articles. When Discipline Tells Only Part of the Story [Studies in Corpus Linguistics 71]. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.71
Halliday, Michael A. K. & Martin, James R. 1993. Writing Science. Literacy and Discursive Power. Pittsburgh PA: University of Pittsburgh Press.
Hyland, Ken. 2000. Disciplinary Discourses. London: Pearson Education.
Kopaczyk, Joanna & Tyrkkö, Jukka (eds). 2018. Applications of Pattern Driven Methods in Corpus Linguistics [Studies in Corpus Linguistics 82]. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.82
Parkinson, Jean. 2013. Representing own and other voices in social science research articles. International Journal of Corpus Linguistics 18: 199–228. https://doi.org/10.1075/ijcl.18.2.02par
Rhee, Seok-Chae & Jung, Chae Kwan. 2014. Compilation of the Yonsei English Learner Corpus (YELC) 2011 and its use for understanding current usage of English by Korean pre-university students. The Journal of the Korea Contents Association 14(11): 1019–1029. https://doi.org/10.5392/JKCA.2014.14.11.1019
Römer, Ute & O’Donnell, Matthew B. 2011. From student hard drive to web corpus (part 1): The design, compilation and genre classification of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora 6(2): 159–177. https://doi.org/10.3366/cor.2011.0011

Part I. Focus on writer expertise and nativeness status

A corpus-based study of academic word use in EFL student writing

Eniko Csomay
San Diego State University

Many corpus-based studies focus on the use of academic vocabulary in journal articles and textbooks while creating vocabulary lists and argue for the teaching of such lists, but few examine how students employ those words in their writing. The present study investigates English as a Foreign Language (EFL) learners’ use of academic vocabulary in their writing assignments as they attend a dual-degree program in STEM fields in Tbilisi, Georgia. Over a thousand student papers from their General Education courses are examined for their use of academic vocabulary, using Gardner and Davies’ (2014) Academic Vocabulary List. Results show variation in students’ academic vocabulary use and quantity while progressing in their studies and as they produce the different text-types in their coursework.
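
As a rough illustration of the kind of measure involved, the share of a paper's running words that appear on an academic vocabulary list could be computed along the following lines. This is a minimal sketch and not the procedure used in the study: the five-word list is a placeholder (the AVL itself is far larger and is not reproduced here), matching is done on lowercased word forms without lemmatization, and the names `tokenize` and `academic_word_percentage` are hypothetical.

```python
import re

# Placeholder entries for illustration only; this is NOT the actual
# Academic Vocabulary List, which is much larger.
SAMPLE_ACADEMIC_WORDS = {"analysis", "data", "method", "research", "significant"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def academic_word_percentage(text, word_list):
    """Return the percentage of tokens in `text` that appear in `word_list`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid division by zero for empty papers
    hits = sum(1 for token in tokens if token in word_list)
    return 100.0 * hits / len(tokens)


# Hypothetical one-sentence "paper" used only to exercise the function.
paper = "The research method produced data that supported the analysis."
print(round(academic_word_percentage(paper, SAMPLE_ACADEMIC_WORDS), 1))
```

A fuller treatment would presumably map each token to its lemma before matching, so that inflected forms such as "methods" are counted, and would compute the percentage separately for each paper so that values can be compared across courses and assignment types.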

1. Introduction

Researchers have successfully described the linguistic characteristics of academic language use from a variety of perspectives (e.g., Biber et al. 2002; Biber 2006; Csomay 2005), including the use of academic vocabulary. While the majority of these studies either examine academic vocabulary in the written texts that students are exposed to or discuss how to teach such vocabulary, relatively few explore how students actually use these kinds of words in their scholarly papers. Most recently, however, scholars have also described the lexical characteristics of non-native student writing to explore developmental patterns, for example, in the use of lexical bundles (Yan & Staples 2017) or in the use of academic words (e.g., Durrant 2016; Csomay & Prades 2018). These studies typically focus on student writing either by native speakers or by English as a Second Language (ESL) learners.1

1. The difference between English as a Second and English as a Foreign Language (ESL and EFL, respectively) learners is the context. ESL learners are surrounded by the target language in their everyday lives, hence receiving input in the context as well as during their studies, while EFL learners receive input in the target language only through their educational context and not through their everyday lives.

https://doi.org/10.1075/scl.95.01cso © 2020 John Benjamins Publishing Company

Eniko Csomay

The present study investigates a specific group of English as a Foreign Language (EFL) learners’ use of academic vocabulary in their writing assignments. These students, attending an American STEM degree program offered in Tbilisi, Georgia (Eurasia), produced a variety of naturally occurring classroom-assignment papers in English in their General Education courses during a period of one academic year. Over a thousand student papers originating from lower- and upper-level composition classes, as well as History, Philosophy, and Political Science, were compiled into a corpus for linguistic examination. Using Gardner and Davies’ (2014) Academic Vocabulary List (AVL), generated from the Corpus of Contemporary American English (COCA), the percentage of academic vocabulary was calculated in the final draft of each student paper. The students were neither exposed to this academic vocabulary list during their studies nor explicitly taught how to use these specific academic words in their writing assignments.

2. Background

2.1 Linguistic variation in university registers

Previous studies successfully describe variation in academic language use from various perspectives as they portray individual lexico-grammatical features (Biber 2006) or research linguistic patterns based on co-occurring grammatical features in discourse (e.g., Biber et al. 2002; Csomay 2006). In addition to comprehensive grammatical descriptions, researchers have also examined lexical aspects of university registers. For example, Fortanet (2004) looked at references and functions of the word ‘we’ in academic lectures, Simpson-Vlach and Ellis (2010) created an academic formulas list based on academic lectures, Hsu (2014) examined the categories of formulaic sequences in English-medium college textbooks, and Biber and Egbert (2019) explored the benefits of incorporating text dispersion into keyword analysis. Going beyond the word as the unit of analysis, other researchers created an academic collocation list (Ackermann & Chen 2013), or looked at lexical bundles (frequently occurring word sequences) in university registers (Biber & Barbieri 2007), and more specifically in university teaching and textbooks (Biber et al. 2004).

Another group of researchers, also using corpus-based methodologies, focused on student writing. For example, Omidian et al. (2017) looked at academic words in secondary school writing, and Coxhead (2011a, 2012) investigated how vocabulary is used in second language writing. As for longer stretches of lexical units, researchers explored lexical bundle use in student writing (e.g., Cortes 2004) as well as the teachability of bundles to various university level student groups (Cortes 2006; Eriksson 2012). While these studies provide comprehensive linguistic analyses or detailed functional analyses of single words, collocations, or extended sequences in one or multiple registers in academia, little research to date has been done on the lexical description of how students use academic vocabulary in their writing in an EFL context.

2.2 Academic vocabulary

A widely accepted view among both researchers and teachers is that vocabulary is more than simply the sum of individual words and that vocabulary should be taught in context (e.g., Coxhead 2000, 2002, 2011b; Coxhead & Nation 2001; Nation 2013). Which words to teach or learn has been discussed by many researchers for over a decade (e.g., Schmitt 2008; Coxhead & Byrd 2007; Myers & Chang 2009; Newton 2013; Lu 2013). In a university setting, many researchers find it important to focus on academic words that are identified based on general academic and discipline-specific texts that the students would likely encounter (Celce-Murcia 2002; Schleppegrell et al. 2004). That is, the general principle behind choosing words is that “teaching materials should reflect actual use (Biber & Reppen 2002; Csomay & Petrovic 2012) and practice activities should simulate real life scenarios (Batstone & Ellis 2009)” (Csomay & Prades 2018: 100).

Equally important is how students can build a large vocabulary most efficiently. Zimmerman (2014: 290) points out that two types of learning are involved when we learn vocabulary: (1) explicit learning, where the students intentionally study the words for their meaning and other aspects such as morphology or phonology; and (2) incidental learning, where the students unintentionally notice the words and pick up their use from the context. The best way to stimulate incidental vocabulary learning is through extensive reading, participating in dialogues, and listening to the media or lectures when in an academic context. According to Nation (2013), research has shown that extensive reading results in the most gain in vocabulary knowledge. Nation (2013) and Schmitt (2008) argue further that “some [word] features are best acquired incidentally, while others benefit from explicit treatment” (Zimmerman 2014: 291).
While the focus of previous corpus-driven academic vocabulary studies in this area has been to create vocabulary lists in the interest of anchoring teaching practices, relatively few studies have looked at student writing. Most recently, however, scholars have examined how students use vocabulary in their exam papers, for example Yan & Staples (2017) for lexical bundles, or in their class assignments, as in Csomay & Prades (2018) and Durrant (2016) for academic vocabulary use.

2.2.1 The AVL

The first popular academic word list, containing 570 word families and used in many teaching materials to date (e.g., the Academic Word Power series), was the Academic Word List (AWL), created by Coxhead (2000, 2011b). More recently, Gardner and Davies (2014) built a new list and called it the Academic Vocabulary List (AVL). This word list has also been used in a few more recent research studies (e.g., Csomay & Prades 2018; Goulart 2018). The AVL was created based on a 120-million-word academic sub-corpus isolated from the Corpus of Contemporary American English (COCA). Using a set of criteria that included sophisticated statistical measures to identify academic words to be placed on the AVL, Gardner and Davies (2014: 313–316) identified the top 3,000 lemmas that occur in all academic domains in COCA’s academic sub-corpus. According to the creators of the list, the AVL provides twice as much coverage of academic texts in English as the AWL (AVL 14% vs. AWL 7.2%), and reflects a more accurate and contemporary picture of academic word use “due to the larger, more recent corpus that it pulls from (120 million versus just 3.5 million for the original AWL) and its more narrow academic focus” (Csomay & Prades 2018: 102).2

Instead of word families, as in the AWL, the AVL contains lemmas. The authors argue that the members of a word family, although they may share the same stem, might represent different meanings as they cross part-of-speech categories. Lemmas, on the other hand, remain within the same part-of-speech category, as only the inflectional morphemes are stripped off the base; hence, the meaning does not change dramatically. Gardner and Davies (2014) claim that adding part-of-speech information to the words on the list provides teachers and researchers with valuable information. Table 1 shows the list of the first 70 most frequently occurring words identified on the AVL (Csomay & Prades 2018: 103).

2. As a point of reference, Gardner and Davies (2016: 64) report on creating three word lists extracted from COCA: a “general words of English”, a “core academic words (crossing over multiple disciplines)”, and a “discipline-specific or technical academic words” list. The core academic word list equals the Academic Vocabulary List (AVL) used in this study.

A corpus-based study of academic word use in EFL student writing



Table 1. Academic Vocabulary List (Gardner and Davies 2014: 317) (selection)

 1. study.n            15. data.n            29. economic.j        43. report.v          57. describe.v
 2. group.n            16. information.n     30. low.j             44. rate.n            58. indicate.v
 3. system.n           17. effect.n          31. relationship.n    45. significant.j     59. image.n
 4. social.j           18. change.n          32. both.r            46. figure.n          60. subject.n
 5. provide.v          19. table.n           33. value.n           47. factor.n          61. science.n
 6. however.r          20. policy.n          34. require.v         48. interest.n        62. material.n
 7. research.n         21. university.n      35. role.n            49. culture.n         63. produce.v
 8. level.n            22. model.n           36. difference.n      50. need.n            64. condition.n
 9. result.n           23. experience.n      37. analysis.n        51. base.v            65. identify.v
10. include.v          24. activity.n        38. practice.n        52. population.n      66. knowledge.n
11. important.j        25. human.j           39. society.n         53. international.j   67. support.n
12. process.n          26. history.n         40. thus.r            54. technology.n      68. performance.n
13. use.n              27. develop.v         41. control.n         55. individual.n      69. project.n
14. development.n      28. suggest.v         42. form.n            56. type.n            70. response.n

n = noun; v = verb; j = adjective; r = adverb.

The goal of this study is to investigate the rate at which academic vocabulary is used in writing assignments by a specific group of non-native EFL learners at the university level, and without explicit instruction of these words.

2.3 The English as a Foreign Language (EFL) context

San Diego State University (SDSU) is one of twenty-three campuses within the California State University system. In 2014, SDSU received a substantial grant to build the infrastructure in the country of Georgia (Eurasia)3 necessary to successfully offer a dual degree program in five STEM disciplines (biochemistry, chemistry, computer science, computer engineering, and civil engineering) in collaboration with three local universities, namely, Tbilisi State University (TSU), Ilia State University (ISU), and Georgia Technological University (GTU). The goals of the program were to prepare the three participating universities to meet international accreditation requirements, and to build capacity for the local industry. As SDSU’s curriculum is adopted, Georgian faculty are trained to apply new pedagogies both in the disciplinary and the General Education (GE) programs and are encouraged to collaborate with SDSU faculty on research in the STEM disciplines. All instruction is conducted in English.

3. Georgia is a country in the Caucasus region of Eurasia with its capital, and largest city, Tbilisi. Georgia is surrounded by the Black Sea to the west, Russia to the north, Turkey and Armenia to the south, and its southeast neighbor is Azerbaijan. On a relatively small territory (69,700 square kilometers or 26,911 square miles), Georgia has mountains, plains, and the sea, and a population of about 3.7 million in 2017. The population is primarily Caucasian.

Participants in this dual-degree program are Georgian high school graduates whose primary language is Georgian4 and who learnt English as a Foreign Language (EFL) in the Georgian primary and secondary school environment. To be accepted to this particular dual-degree program, students need to gain a high score on the Georgian National Exam (called NAEC) in both Math and English, and pass a (paper-based) TOEFL test with an SDSU-determined threshold of 525 points. Instruction started in the academic year of 2015/16, and in Fall 2018, the fourth cohort started their studies. Each of the first four cohorts accepted roughly 80 to 200 students. Once accepted, every student takes a placement test in August to qualify for membership in one of the two lower division writing classes (Linguistics 94 or 100) in the first year.

Given this background, we seek answers to the following research questions:

1. How do Georgian students use academic vocabulary in their writing?
2. How do the different General Education courses and their corresponding writing assignments offered at the dual-degree program affect the use of academic vocabulary in student writing?
3. How does the level of instruction affect the use of academic vocabulary in student writing?
4. How does the assignment type affect the use of academic vocabulary in student writing?

4. The Georgian language belongs to the Kartvelian language family which is unique to the Caucasus region. The native language of Georgia as well as the language of education from kindergarten to higher education is Georgian, although due to the former Soviet occupation, the majority speak Russian as well. In terms of foreign language education, like with other post-Soviet countries, Russian is no longer compulsory in the school system. Instead, students can choose to study English, French, or languages other than Russian as a second language in private and in public schools.




3. Methodology

3.1 The corpus

The corpus of student writing for this study was compiled from regular class assignments in a selection of non-science related lower and upper division General Education courses taught in Tbilisi in the 2017/18 academic year. While all papers were graded throughout the semester by the respective instructors, that data is not available for the present study. However, unlike Csomay and Prades (2018), we are not examining the relationship between the grade received and the degree to which students applied academic words in their work. Table 2 shows the composition of the corpus as it summarizes the courses taken by students, the number of papers submitted, and the number of words as well as the average length for each paper.

Table 2. Total number of papers, total number of words, and average word-length for each class

Courses                                     Number of papers   Number of words   Average length (words)
Linguistics 94 (Critical Thinking)                       179           151,976                   849.03
Linguistics 100 (Critical Thinking)                      552           459,795                   908.78
Linguistics 200 (Critical Thinking)                      438           749,970                 1,713.81
History 100 (Humanities)                                 133           218,152                 1,640.24
Philosophy 335 (Humanities)                               98           111,842                 1,141.24
Political Science 101 (US Institutions)                  298           225,983                   758.33
Linguistics 305 (Upper division writing)                   9            17,382                 1,931.33
Total                                                  1,707         1,935,100                 1,122.36

As Table 2 shows, Linguistics 94, 100, and 200 are in the Communication and Critical Thinking GE area; History 100 and Philosophy 335 are in the Humanities GE area; Political Science 101 is taken for the American Institutions requirement specific to California; and finally, Linguistics 305 is an upper division writing course. While several drafts were produced prior to the final paper, only the final version was included in the present analysis, totaling about 1,700 papers. On average, the papers were about 1,100 words long, leading to about 1.9 million words in the entire corpus. The types of writing assignment for each class, some of which are further described in Appendix A, and the corresponding total word numbers are listed in Table 3.


Table 3. Total number of words for each assignment type in each class

Assignment type                        Ling 94  Ling 100  Ling 200  History 100  Poli Sci 101  Phil 335  Ling 305      Total
argument synthesis                      67,280   150,565         -            -             -         -         -    217,845
compare contrast                        65,468         -         -            -             -         -         -     65,468
explanatory synthesis                   19,228    33,229   134,361            -             -         -         -    186,818
comparative analysis                         -   141,179         -            -             -         -         -    141,179
response paper                               -   134,822         -            -             -         -         -    134,822
IMRD research paper                          -         -   321,111            -             -         -         -    321,111
text-based research paper                    -         -   294,498            -             -         -         -    294,498
discipline specific research paper           -         -         -            -             -         -    17,382     17,382
argument take-home essay 1                   -         -         -            -       122,718         -         -    122,718
argument take-home essay 2                   -         -         -            -       103,265         -         -    103,265
source-based research paper                  -         -         -      218,152             -         -         -    218,152
thesis-based argument essay                  -         -         -            -             -   111,842         -    111,842
Total                                  151,976   459,795   749,970      218,152       225,983   111,842    17,382  1,935,100




Noticeably, as Table 3 shows, two assignment types, namely, argument synthesis and explanatory synthesis, appear in multiple linguistics classes while all other assignments are of different types and have varying pedagogical goals.

3.2 Analytical procedures

Several steps were taken to arrive at the required results, involving a variety of computer programs. Mathematical formulas calculated the percent scores, and statistical tests, by way of SPSS 24.0, provided the results.

3.2.1 Calculations

Multiple programs were used to arrive at the percent scores. First, each text was automatically tagged with Biber’s grammatical tagger. Second, a computer program was written to process each tagged file in juxtaposition with the AVL and to do the calculations. For this step, the program searched for each lemmatized word from the Academic Vocabulary List in each student paper. Then, the percentage of academic vocabulary used in each paper was calculated by dividing the total number of academic vocabulary items (from the AVL) by the total number of words used in the paper and multiplying that by 100. For example, if there were 45 academic words used in a paper that was 1,100 words long, the value would be 4%, or 4.1% to be precise (45/1100 = 0.0409; 0.0409*100 = 4.09). We rounded all percent values to the nearest score. By calculating the percent value, we could compare the papers as, practically, we were norming the scores to an essay length of 100 words. We could have normed the scores to 1,100 words since that is the average essay length, but after careful consideration we decided to work with percent values as that is easier to conceptualize when discussing the results.5

3.2.2 Statistical tests

In general, One-way ANOVA tests are applied when three or more groups are compared to tell us whether the variation in the scores is due to variation within the groups or across the groups entering the equation. In our study, seven courses (groups), as listed in Table 4, were studied for this variation. To see differences between two groups, an Independent T-test was applied.
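As a rough illustration of the two steps described so far, the sketch below computes the percent score from Section 3.2.1 and a one-way ANOVA F statistic of the kind named in Section 3.2.2. It is not the program or the SPSS procedure used in the study: the representation of a paper as (lemma, part-of-speech) pairs, the tiny AVL_SAMPLE set standing in for the full list, and both function names are assumptions made for the example.

```python
from statistics import mean

# Hypothetical stand-in for the AVL: a few (lemma, part-of-speech) pairs.
AVL_SAMPLE = {("study", "n"), ("provide", "v"), ("however", "r"), ("result", "n")}


def avl_percent(tagged_tokens, avl=AVL_SAMPLE):
    """Percent of tokens in one paper whose (lemma, POS) pair is on the list."""
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for token in tagged_tokens if token in avl)
    return hits / len(tagged_tokens) * 100


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# 45 listed items in a 1,100-token paper, as in the worked example in the text.
paper = [("study", "n")] * 45 + [("the", "x")] * 1055
print(round(avl_percent(paper), 2))
```

With seven course groups and 1,707 papers, the same degrees-of-freedom arithmetic gives 6 and 1,700.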
While these tests have been very useful to detect significant differences between and across groups as continuous data is turned into dichotomous results (i.e., significant or not), as Plonsky (2015: 36) argues, effect sizes “provide an estimate of the actual strength of the relationship or of the magnitude of the effect in question … and unlike p values, effect sizes are continuous, standardized, and scale free.” Hence, Cohen’s d measures were calculated as well, which is “a descriptive statistic that expresses the mean difference between (or within) groups” (Plonsky 2015: 31). Unlike a T-test, a Cohen’s d measure is independent of sample size and does not take into account how frequent a feature is in the first place.6 The following formula has been used to calculate Cohen’s d:

$$\text{Cohen's } d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{sd_1^2 + sd_2^2}{2}}}$$

5. In addition, even if normed to 1,100, the percent value would be the same.
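The formula can be tried out in a few lines of Python. This is a minimal sketch rather than the study's own computation: it assumes the pooled form with a square root over the averaged variances, uses sample standard deviations (statistics.stdev), and the function names are hypothetical. The magnitude helper encodes the Plonsky (2015) bands cited in this section, treating any absolute value above 0.40 and up to 0.70 as medium.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Mean difference divided by the root of the averaged variances."""
    # Assumption: sample standard deviations (n - 1 in the denominator).
    pooled_sd = sqrt((stdev(group1) ** 2 + stdev(group2) ** 2) / 2)
    return (mean(group1) - mean(group2)) / pooled_sd


def magnitude(d):
    """Label an effect size using the small/medium/large bands from the text."""
    size = abs(d)  # the sign only reflects which group was entered first
    if size <= 0.40:
        return "small"
    if size <= 0.70:
        return "medium"
    return "large"


# Made-up percent scores for two groups of papers.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, magnitude(d))
```

Swapping the two groups flips the sign of the result but not its magnitude label.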

In terms of magnitude, we adopt Plonsky’s (2015: 38) claim that a small effect size is below or at ±0.40, a medium effect size is between ±0.41 and ±0.70, and a large effect size is from ±0.71 to ±1.00 and beyond. The direction of the Cohen’s d (positive or negative) depends on whether the first value entering the equation is larger than the second one.

4. Findings

4.1 Courses

The findings reported in this section provide answers to research questions 1 and 2. One-way ANOVA results (F(6, 1700) = 52.485, p