Beyond Age Effects in Instructional L2 Learning: Revisiting the Age Factor 9781783097630

Beyond Age Effects in Instructional L2 Learning

SECOND LANGUAGE ACQUISITION Series Editors: Professor David Singleton, University of Pannonia, Hungary and Fellow Emeritus, Trinity College, Dublin, Ireland and Dr Simone E. Pfenninger, University of Salzburg, Austria This series brings together titles dealing with a variety of aspects of language acquisition and processing in situations where a language or languages other than the native language is involved. Second language is thus interpreted in its broadest possible sense. The volumes included in the series all offer in their different ways, on the one hand, exposition and discussion of empirical findings and, on the other, some degree of theoretical reflection. In this latter connection, no particular theoretical stance is privileged in the series; nor is any relevant perspective – sociolinguistic, psycholinguistic, neurolinguistic, etc. – deemed out of place. The intended readership of the series includes final-year undergraduates working on second language acquisition projects, postgraduate students involved in second language acquisition research, and researchers, teachers and policy-makers in general whose interests include a second language acquisition component. Full details of all the books in this series and of all our other publications can be found on http://www.multilingual-matters.com, or by writing to Multilingual Matters, St Nicholas House, 31–34 High Street, Bristol BS1 2AW, UK.

SECOND LANGUAGE ACQUISITION: 113

Beyond Age Effects in Instructional L2 Learning Revisiting the Age Factor

Simone E. Pfenninger and David Singleton

MULTILINGUAL MATTERS Bristol • Blue Ridge Summit

For Judith and Werner Pfenninger

DOI 10.21832/PFENNI7623 Library of Congress Cataloging in Publication Data A catalog record for this book is available from the Library of Congress. Names: Pfenninger, Simone E., author. | Singleton, D.M. (David Michael) author. Title: Beyond Age Effects in Instructional L2 Learning: Revisiting the Age Factor/Simone E. Pfenninger and David Singleton. Description: Bristol; Blue Ridge Summit: Multilingual Matters, [2017] | Series: Second Language Acquisition: 113 | Includes bibliographical references and index. Identifiers: LCCN 2016053116 | ISBN 9781783097623 (hbk: alk. paper) | ISBN 9781783097616 (pbk: alk. paper) | ISBN 9781783097654 (kindle) Subjects: LCSH: Language acquisition–Age factors. | Classroom environment. | Second language acquisition. Classification: LCC P118.65 .P44 2017 | DDC 418.0071–dc23 LC record available at https://lccn.loc.gov/2016053116 British Library Cataloguing in Publication Data A catalogue entry for this book is available from the British Library. ISBN-13: 978-1-78309-762-3 (hbk) ISBN-13: 978-1-78309-761-6 (pbk) Multilingual Matters UK: St Nicholas House, 31-34 High Street, Bristol BS1 2AW, UK. USA: NBN, Blue Ridge Summit, PA, USA. Website: www.multilingual-matters.com Twitter: Multi_Ling_Mat Facebook: https://www.facebook.com/multilingualmatters Blog: www.channelviewpublications.wordpress.com Copyright © 2017 Simone E. Pfenninger and David Singleton. All rights reserved. No part of this work may be reproduced in any form or by any means without permission in writing from the publisher. Front cover image: This photograph by Kilian Schönberger depicts the ‘Crooked Forest’, which is located outside of Nowe Czarnowo, West Pomerania, Poland, and is regarded as one of the most unusual forests in the world. The nearly 400 trees are widely agreed to have been shaped by human hands sometime in the 1930s, but for what purposes is still up for debate. 
Each tree is bent near the base at 90 degrees, but eventually straightened out and returned to vertical just like the other – non-affected – trees. The policy of Multilingual Matters/Channel View Publications is to use papers that are natural, renewable and recyclable products, made from wood grown in sustainable forests. In the manufacturing process of our books, and to further support our policy, preference is given to printers that have FSC and PEFC Chain of Custody certification. The FSC and/or PEFC logos will appear on those books where full certification has been granted to the printer concerned. Typeset by Deanta Global Publishing Services Limited. Printed and bound in the UK by Short Run Press Ltd. Printed and bound in the US by Edwards Brothers Malloy, Inc.

Contents

Acknowledgements

1 Mapping the Terrain
Introduction and Research Goals
Growing Out of the ‘Earlier=Better’ View: Changing Perspectives on the Role of the Age Factor as a Predictor of Success in Foreign Language Learning
Multilingual Switzerland as an Empirical Framework
Pedagogical and Curricular Landscapes in Switzerland
Conclusion

2 The Current Empirical Study
Participants and Research Design: Four Independent Samples
Tasks and Procedure
Methodology: Preparing, Coding and Tagging the Data

3 Age and (Statistical) Analysis
Benefits of Multilevel Modelling for Age-Related Research
Solving the Generalisability Issue
Solving the Randomisation Issue
Solving the Cohort Issue
Solving the Time Issue
Fitting Our Models
Limitations of Multilevel Modelling

4 Age and Rate of Acquisition
Introduction
Hitting the Ground Running: The Rate Advantage of Late Starters
Evidence From Bound Morphology at Time 1
Perceived Growth Rate From the Point of View of the Learners
Interaction Between Age and School/Class Variation and Contextual Effects
The Question of Educational Continuity from Primary to Secondary
Impact of Starting Age on Retention and Recall of Target Language Input
Conclusion

5 Age and Affect
Introduction
Motivational Profiles of Different Age Groups and AO Groups
Age-Within-Context in a Mixed Methods Design
Motivation as a Strong Explanatory Force for Both FL Growth and Achievement
Motivation and Age Across Learners, Classes and Schools
The Age Factor From the Perspective of the Learner
Multiple Foreign Language Learning: Burden or Opportunity?
World Language English vs National Language French: Battle of the Languages
Conclusion

6 Age and Crosslinguistic Influence
Introduction
The Literacy Factor in the Optimal Age Debate
Reported Use of L1 Transfer Strategies
Age and Age-Related Predictors of Crosslinguistic Influence
Influence of (Swiss)German and French on English and the Relevant Interaction
The Issue with Intentionality
Impact of Starting Age on Metalinguistic Awareness and Crosslinguistic Associations
Influence of the Learning Environment
Conclusion

7 Age and the Impact of Differential Input
Introduction
Learning Through an L2 vs Learning the L2 as a Subject: CLIL as ‘the Egg of Columbus’?
A Critical Look at CLIL Research
Motivation as the (Ugly) Stepsister of CLIL
CLIL Trumps Age
In-Class vs Out-of-Class Exposure to the Target Language
Conclusion

8 Educational Implications
The Complexity of the Swiss Educational System
The Obsession with ‘the Earlier the Better’: Research versus Beliefs

9 Conclusion and Future Perspectives
Key Findings
Limitations and Proposals for Future Research

Appendices
References
Index

Acknowledgements

We would like to express our gratitude to the many people who provided us with insightful and generous feedback, talked things over, read, wrote, offered comments, allowed us to quote their remarks and assisted in the editing, proofreading and design of this book. We are first and foremost grateful to (in alphabetical order) Raphael Berthele, Kees de Bot, Jean-Marc Dewaele, Johanna Gündel, Marianne Hundt, Scott Jarvis, Johanna Lendl, David Matley, Urs Maurer, Martin Meyer, Alene Moyer, Judith Pfenninger, Daniel Stotz and Jan Vanhove. To each and to all, we offer our sincere thanks. We are also greatly indebted to the students and teachers for their enthusiastic participation and support. The writing of this book was partly supported by a research grant from the University of Zurich, Grant FK-15-078, to Simone E. Pfenninger. This grant is hereby gratefully acknowledged. Finally, we have been lucky enough to work with an enthusiastic and professional team at Multilingual Matters who have helped us produce this book. Special thanks are due to Laura Longworth, who – as always – has done a splendid job with the processes leading to the book’s publication. Any remaining errors are our own.

1 Mapping the Terrain

Introduction and Research Goals

Almost 20 years ago, Harley (1998: 27) lamented the fact that no explanation had been provided as to why, in school settings, ‘the additional time associated with an early headstart has not been found to provide more substantial long-term proficiency benefits’. Despite the abundance of critical period studies, maturational state studies and ultimate attainment studies that have been carried out in the meantime (see, e.g. García Lecumberri & Gallardo, 2003; Larson-Hall, 2008; Moyer, 2004; Muñoz, 2003, 2006a, 2008, 2011; for reviews see Lambelet & Berthele, 2015; Muñoz & Singleton, 2011; Singleton, 2005; Singleton & Ryan, 2004), there are still unexplored issues regarding the amount and type of input needed for earlier starters in a school context to surpass later starters and to retain their learning advantages in the long term. Given that, since the 1990s, numerous educational authorities in Europe have brought forward the starting age of language instruction in elementary schools, it seems particularly important at this point to analyse further the magnitude of the effects of initial age of learning on the end state in a foreign language (FL) classroom. The introduction of early FLs has given rise to several educational, political and research questions, most of them related to the teaching approach in primary school and the apparent lack of learning success of early classroom learners (ECLs) compared to late classroom learners (LCLs), who in the past started learning the FL at the beginning of secondary school.
In view of their rather unimpressive impact, early FL programmes are currently under scrutiny in Europe (and elsewhere in the world), and the question has arisen as to what extent these investigations address the factors that crucially influence the absolute abilities of classroom learners at the end of mandatory schooling and, importantly, how an earlier starting age can be exploited more effectively – a topic that is, at present, still not well understood. Overall, there has been a tremendous over-reliance on, and blind trust in, the age factor and the amount of time spent learning an FL, at the expense of the conditions of learning. In Ewa Dąbrowska’s (personal communication, 11 June 2016) words: ‘We have become obsessed with age effects’. As Mihaljević Djigunović (2014) points out, an early starting age has become something of a given because education policymakers decide on the introduction of second languages (L2s) at an early age irrespective of what research findings suggest and, often, only because of strong parental pressure – a phenomenon that Enever (2004) calls ‘parentocracy’. Thus, the urgent problems that research on age effects can help solve concern how all parties involved (educators, policymakers, parents, politicians, education researchers) understand the impact of FL instruction addressed to learners at different ages. What is more, as multilingualism is no longer the exception but the rule in Europe (see, e.g. Cenoz & Jessner, 2009), interest in early multiple FL acquisition and multilingual schooling has grown in recent years. In Europe, language use and language rights have long been central concepts of European Union (EU) citizenship, and language skills have become economic assets and the object of a series of targeted agendas. A central aim of EU educational policy is the ‘1 plus 2’ model, which seeks to provide EU citizens with ‘real opportunities to learn to communicate in two languages plus their mother tongue’ (European Commission, 2008: 566; see also European Commission, 2005, 2012). The study of the age factor in multilingualism is a particularly complex issue because the process of acquiring several languages is highly diverse and involves numerous individual differences. To date, research is still far from providing a clear understanding of the role that age of onset (AO) plays over an extended period of time in an input-poor (or ‘minimal input’) environment where several FLs are taught. This calls for systematic and critical examination and discussion of the age factor in multilingual education and acquisition, particularly based on longitudinal studies with a respectable number of participants.
The present study is the first and only longitudinal study in Switzerland that systematically and empirically explores issues regarding the unique profiles of early vs late starters, the significance of school contexts and the amount and type of English input needed for early starters to retain the advantages conferred by their learning head start in the long term. In line with many ministries of education throughout the world before them, the Swiss Conference of Education Directors (Schweizerische Konferenz der Kantonalen Erziehungsdirektoren [EDK]) decided to recommend that the cantons revise their curricular policies by moving the teaching of French and English from the secondary to the primary level in 1989 and 2004, respectively (EDK, 2004). Using cross-sectional data from over 200 Swiss learners of English as a third or fourth language, as well as longitudinal data collected from an additional 200 Swiss learners between 2009 and 2015, we have been able to examine in real time, and in a thorough and detailed manner, the relationships among (1) onset variables; (2) the full array of learners’ experience, contexts, attitudes and orientations, as well as the correlation between learners’ first language (L1) mastery and target language (TL) proficiency; and (3) school achievement at the end of the period of normal schooling in Switzerland. This design also enables us to put the ‘older=better in a classroom’ hypothesis to the test: by revisiting the achievements of earlier vs later learners of English as a foreign language (EFL), we hope to discover whether there is indeed a long-term trend for late starters to catch up with the performance of early starters. Note, however, that the issue at stake here is not to shed light on the ultimate attainment of FL learners, but rather to analyse how what happens in the course of mandatory instructional time can be optimised for long-term benefits to unfold before the end of secondary education, given that we know today that ‘the necessary length of the relevant period of instruction [to reach native-likeness] is not within the bounds of possibility’ (Singleton & Skrzypek, 2014: 6). In order to present solutions and new perspectives, it is vital first to identify the factors that do not work in young learners’ favour and that prevent them from profiting from their extended learning period, and to understand the mechanisms that give late starters their well-documented kick start, i.e. the fast learning rates in the initial stages of learning, which enable late starters to catch up with early starters relatively quickly (see Muñoz, 2008; Muñoz & Singleton, 2011; Singleton & Ryan, 2004). These are questions of considerable theoretical and practical significance, as they lie at the heart of the debates surrounding age as one of the most powerful and most misunderstood variables in research on FL learning and teaching. They are also integral to designing effective FL pedagogy.
Clearly, for educators, teachers and policymakers, as well as for theorists, it is of compelling interest to know more about the end state of the FL instruction process, as such research has important implications for multilingual education in relation to decision-making about (1) early instruction in different languages at primary level and (2) later instruction in and through different languages at secondary school. Furthermore, the comparison of English and French as FLs is expected to yield pedagogically relevant results not only for Switzerland but also for other European countries where French is still an important language. The bulk of the data were collected during a period of transition in which two groups of students coexisted for some time: those schooled under the educational policy in force before the Swiss Conference of Education Directors issued its new set of guidelines for FL instruction throughout Switzerland, and those schooled under the policy implemented afterwards (see EDK, 2004). The ECLs were schooled according to the new model and learned Standard German from first grade onwards, English from third grade onwards and French from fifth grade onwards, while the LCLs were schooled in the old system without English instruction at primary level, learning only Standard German from first grade and French from fifth grade onwards. This constellation of learners provided a unique window into the benefits of early versus late FL learning.

Research goals

In general, multiple FL programmes from an early age are complex and raise a number of interesting and important issues: How effective are they? Are they effective for students in diverse learning situations? What is the developmental relationship among the languages known by the learner? Does the sequencing of FL instruction matter? Do the power and status of one FL have negative consequences for students’ motivation to learn another FL? Does early FL instruction have a negative impact on the development of L1 (literacy) skills? As recent findings have cast doubt on the over-emphasis on biological and strictly cognitive dimensions of early FL learning in a formal, instructional setting (see research review below), the main aim of this project is to focus on the interaction of age with non-maturational factors (hence the title of this book), such as situation of learning (e.g. type of instruction, cohort effects), socio-affective variables (motivation, strategy use), the roles of L1 literacy skills and knowledge of additional languages (notably French), as well as institutional problems such as streaming and the transition from primary to secondary school, while making use of the most advanced techniques and statistical methods in quantitative and qualitative approaches to the age factor. The following general research questions will be addressed:
(1) What is the strength of the association between superior FL performance and starting age, on the one hand, and socio-psychological, contextual and individual factors, on the other, in learners with a long learning experience (up to 11 years)?
(2) Which input measures (length of instruction in years, type and intensity of instruction and informal, extracurricular contact with the TL) are more strongly associated with superior long-term oral and written performance?
(3) Are L1 literacy skills and crosslinguistic influence (e.g. the influence of Swiss/Standard German, or the intervening influence of French as an additional FL) affected by the AO?
It is important to mention that, as recent research has challenged the notion of construing FL ‘success’ as the attainment of native-likeness (Muñoz & Singleton, 2011), the upper limits of competence are not compared here to the competence of native monolinguals (for a recent study that includes a native-speaker dimension, see Tavakoli & Foster, 2010). We thus refrain from using monolingual native-speaker proficiency as the yardstick for evaluating the attainment of early and late FL learners. As we have indicated, the notions of ‘long-term’ and ‘ultimate FL achievement’ in the present context refer to the end state or ‘final product’ of obligatory EFL instruction at the termination of secondary education (see Hammer & Dewaele, 2015), the natural limit of most students’ FL learning process (see Muñoz, 2008: 583). Needless to say, such an interpretation might lead to attainment becoming confounded with rate effects, as the amount of instructional time may not be considered sufficient for the study of long-term benefits (e.g. Birdsong, 2004, 2006; DeKeyser, 2000; Muñoz, 2008). However, the issue at stake here is not to focus on very long-term attainment in FL learning, but to analyse how the length of mandatory instructional time can be optimised for benefits to unfold.
Another major goal of this project was to address the limited availability of convenient and successful methods for analysing age effects. For instance, so far only minimal attention has been paid to the interaction between person and context in quantitative age research. We will provide a non-technical but, we hope, accessible account of statistical models that focus on the learner as a developing system in their own right, rather than as a generalised hypothetical representative of a larger sample. Another criticism is raised by Moyer (2014: 458), who suggests that because AO has a significant relationship to experience, the nature of that relationship must be clarified in future research: ‘research needs to open up to the relative “messiness” of introspective methods’ if we are to understand the social implications of age-related learner outcomes. As Pica (2011) rightly points out, the heavy emphasis on age in making decisions about school policy and practice has overlooked the abundant research on psychosocial factors, such as learner personality and motivation, that have been shown to affect language learning in school contexts:

A range of social, cognitive, and affective factors, especially those that bear on the ability to learn and apply SLA skills and strategies, is relevant to explaining why, for so many, early L2 schooling is not necessarily better, and initiation of formal learning at a somewhat later time might be best.
(Pica, 2011: 260)

Thus, we also aim to draw on methodological advancements in studying AO and subjective well-being in the classroom by combining large-scale quantitative methods that account for both participant and item variability with individual-level qualitative data. Such a mixed methods approach, that is, the meaningful merging of qualitative and quantitative approaches, offers a radically new strand of research methodology that suits the multilevel analysis of complex issues, because it allows investigators to obtain data about both the individual and the broader societal context and brings out the best of the qualitative and quantitative paradigms while also compensating for their weaknesses (Dörnyei, 2009: 242; Dörnyei & Ushioda, 2011: 205). From a quantitative perspective, multilevel analyses (a subgroup of linear mixed effects regression modelling) are performed to investigate to what extent late starters’ long-term achievement in instructional settings matches the supposedly advantaged performance of early starters and to analyse how social, psychological and contextual variables factor into this (see Chapter 3). In order to complement and triangulate the quantitative findings and to capture psychological elements of learning EFL at different ages and at different levels that are internal to the learner, such as memories, beliefs and experiences, language experience essays produced by the participants are used to help identify aspects of early and late EFL instruction that seem salient to particular individuals at the beginning and end of secondary school, and thus to help constrain the multitude of influential factors that play a role in L2 development beyond age effects. Such a holistic approach considers the combined and interactive operation of several different elements/conditions relevant to specific situations, rather than following the more traditional practice of examining the relationship between well-defined variables in relative isolation (Dörnyei et al., 2014: 1). We also believe that this approach can better account for the interaction of AO and other ‘often hidden variables’ (see Muñoz, 2014a: 466) than an approach focusing solely on learners’ long-term outcomes as a function of AO. The methods described in this study may thus provide a much-needed, convenient way of studying the age factor that solves many of the difficult problems facing previous methods. Note, however, that we do not aim to provide a detailed overview of research designs, methodologies and instruments for investigating the age factor. The principal focus will be on the broad conceptual profile of the statistical models in question, rather than on the technical details. Finally, a few words are in order concerning our sample of secondary school students. Secondary education in Switzerland comprises six years (Grades 7 to 12), and, on average, students study around 13 school subjects per academic year.
The school track under investigation here, which we refer to as the ‘academically oriented secondary school’ category, represents the main – but not the only – university entry pathway. It is a selective, publicly funded school category, representing one of three main secondary school tracks (the highest in terms of educational level). In the canton of Zurich, admission is based on students’ average grades and an entrance examination. The number of students taking the matura or maturité exam (i.e. the final graduation exam) has increased in recent years: between 1986 and 2013, the percentage of students awarded this certificate almost doubled, to 20% (http://www.bfs.admin.ch/bfs/portal/de/index/themen/15/01/pan.html). There are four main reasons why it was decided to assess the development of EFL skills in this group of learners:
(1) This particular secondary school track is roughly equivalent to grammar schools, baccalaureate schools and high schools in other countries in terms of length of instruction (six years until graduation), institutional design (e.g. number and kinds of compulsory subjects, assessment of students, final certificate) and purpose (e.g. they do not lead to professional qualifications, but prepare students for tertiary-level education programmes). This is important for comparisons with related previous work in Europe and elsewhere.
(2) Other secondary school types (i.e. performance-based groups at basic and intermediate levels), whose programmes last only three years, are not ideal for testing long-term effects (in the school context-related sense defined above) of an early FL programme. In age research, including school-based age research, identifying predictors of shorter-term and longer-term FL achievement is one of the most basic and most important tasks. Furthermore, it has been suggested that a substantial accumulation of input is needed before the advantages of an early start become manifest (e.g. Larson-Hall, 2008; Muñoz & Singleton, 2011; Singleton, 1995a, 1995b, 2005).
(3) Assessing ‘good and motivated’ learners,¹ who (ideally!) involve themselves in the language-learning process and consider the demands that FL learning imposes, should not be considered a limitation in this kind of study: strong learners can provide key data on the effectiveness of a new FL programme and yield revealing results in the search for influential factors in the process of FL learning (see, e.g. Muñoz, 2014a). The insights thus gained can then also help learners who are not obtaining such good results.
(4) Finally, having all the groups at similar state schools in one and the same canton presented a number of design advantages. The students worked with the same materials, and although some of the participants had different teachers, they all followed a similar methodology and curriculum.
Before we provide more specific information about the methodology of our research, we offer a brief state-of-the-art review of studies that have focused specifically on the age factor and that have inspired this research.

Growing Out of the ‘Earlier=Better’ View: Changing Perspectives on the Role of the Age Factor as a Predictor of Success in Foreign Language Learning

As the research questions above imply, the starting points of this study were several widely held – and sometimes competing – elements of folk wisdom: on the one hand, the idea that you need to be solid in your L1 before studying an FL, and that the latter might somehow ‘contaminate’ the former; and, on the other hand, the belief that ‘a tree must be bent while it is young’ (or ‘the earlier the better’). It is particularly this last belief, the assumption that age is the most important and robust predictor of success in FL learning, that has sparked heated debates and divided minds for centuries.

Interest in age as an explanatory factor of differences in L2 learning has a long history. In the first century AD, Marcus Fabius Quintilianus (c. 35–c. 100 AD), a Roman rhetorician from Calagurris (now Calahorra, Spain), compiled one of the most complete descriptions of the Roman educational system. In the first volume of his Institutio Oratoria (in English, Institutes of Oratory or Training of an Orator), a 12-volume textbook on the education of rhetoricians from childhood to adulthood published around 95 AD (Murphy, 2012), Quintilian commented on the proper age for beginning to learn literacy skills in different languages. He believed that language instruction should be started early, particularly because of children’s superior cognitive capacities (according to him, memory is ‘specially retentive at that age’) and because an earlier start lengthens the learning period. Furthermore, comparing children’s minds to farmland that ‘should not be allowed to lie fallow for a moment’, he mentioned a critical period – or ‘formative years’, as he expressed it – between birth and the age of seven, which must not be ‘wasted’:

Some hold that boys should not be taught to read [in Latin and Greek] till they are seven years old, that being the earliest age at which they can derive profit from instruction and endure the strain of learning. … Those however who hold that a child’s mind should not be allowed to lie fallow for a moment are wiser. Chrysippus, for instance, though he gives the nurses a three years’ reign, still holds the formation of the child’s mind on the best principles to be a part of their duties. … Let us not waste the earliest years: there is all the less excuse for this, since the elements of language training are solely a question of memory, which not only exists even in small children, but is specially retentive at that age.
(Institutio Oratoria I, I, 13–17)

As we will see below, many of Quintilian’s viewpoints and arguments have survived the passing of time and crop up again in European education policy documents. Interest in the age factor as a main predictor of L2 learning outcomes was reignited in the 20th century by Penfield and Roberts (1959), who propounded a brain plasticity hypothesis according to which the optimal period for language acquisition ends when, at the end of childhood, the brain begins to lose its plasticity, that is, its ability to form new connections between brain cells. This line of thinking became firmly established with Lenneberg’s 1967 proposal of a critical period hypothesis (CPH) for first and second language acquisition (SLA), which stated that there is a limited developmental period during which it is possible to acquire a language to normal, native-like levels – although it should be noted that Lenneberg’s original claim was intended for native/L1 development, not for second or multilingual language development.

Mapping the Terrain

Following in Lenneberg’s (1967) footsteps, numerous critical period-friendly studies have laid out evidence for the proposition that people who acquire an L2 later in life generally perform more poorly on tests of various dimensions of language proficiency than native speakers or early acquirers. A fairly recent example is Abrahamsson’s (2012) study, which found that nativelike morphosyntactic and phonetic intuition ceased to occur after the age of 13. In naturalistic studies, AO of acquisition has traditionally been found to be a strong predictor of level of ultimate L2 attainment (for a recent publication, see Murphy & Evangelou, 2016). Most neurobiological studies dealing with this topic propose either (1) that younger learners process L2 information differently from older learners, a difference rooted in distinct brain mechanisms (e.g. Clahsen & Felser, 2006; Hernandez et al., 2007; Morgan-Short et al., 2010; Saur et al., 2009; Ullman, 2005; Wartenburger et al., 2003), or (2) that older learners process it similarly, but under more difficult circumstances (Dick et al., 2001; Herschensohn, 2007; McDonald, 2000, 2006). In the field of multilingualism, studies of self-reported proficiency and emotional expression in multilinguals have claimed that AO generally has a significant effect on self-perceived competence in the L2: the higher the AO, the more likely it is that the L1 will remain the more emotional language (Dewaele, 2009a, 2009b, 2010; Hammer & Dewaele, 2015; Munro & Mann, 2005). However, scientific evidence has mounted steadily against the contention that, to reach very advanced levels of proficiency in an L2, individuals must acquire it before the point at which the critical period is seen as ending. On this last point, it is worth noting that, although puberty is (following Lenneberg) often cited as the offset of the critical period, other suggestions abound (cf. Singleton, 2005).
This undermines the plausibility of the whole notion of a critical period and deprives the concepts of ‘early’ and ‘late’ L2 learning of any kind of stable reference point (cf. Muñoz & Singleton, 2011). If there were clear evidence of an offset point for a window of opportunity for language acquisition, it ought surely to be possible for researchers to agree where it is situated. Concerning reference points, if 12 years is taken to be the critical age, L2 learning at age 4 is ‘early’ learning; if 12 months is taken to be the critical age, L2 learning at the age of 4 is ‘late’ learning. Variability also relates to the scope of critical period effects. Whereas, for example, Lenneberg saw maturational constraints as affecting language in general, for others they are relevant only to accent. Also worth attending to in this connection is the fact that, within the naturalistic context, the ‘younger=better’ tendency is just that – a tendency. It is not the case that everyone who begins an L2 in childhood in an informal setting ends up with a perfect command of the language in question (see Montrul, 2008); nor is it the case that those naturalistic learners who begin the L2 later in life inevitably fail to attain the levels reached by younger beginners (see Kinsella & Singleton, 2014). Dörnyei (2007: 243) points out that ‘[c]ommon figures of the proportion of post-pubertal learners who reach a native-like level range between 5 and 10 per cent of learners in naturalistic learning environments [...], which is definitely large enough to qualify as something more than merely a few fluke cases’. In any case, researchers have increasingly come to see age as a very complex factor, a ‘macrovariable’, and most have called for dimensions other than maturation to be taken into consideration in this context. The current understanding of the age factor in naturalistic learning situations includes the following insights (see also Pfenninger, in press):

• while it has been argued that chronological age and AO may both impact on L2 achievement by confounding with cognitive factors, education and other background variables (Bialystok & Hakuta, 1999), several scholars (e.g. Muñoz, 2008) have argued that a confound between chronological age and AO may partly explain the weaker performance of the youngest learners compared with older learners in school settings, and may thus contribute to the positive relationship between L2 proficiency and older age of learning;
• ‘age is a factor from birth to death’ (Singleton & Lengyel, 1995: 51), i.e. it is hard to find an overall critical period effect with a particular cut-off age; rather, language learning ability continues to decline throughout life;
• those whose exposure to an L2 begins in childhood generally surpass, in the long run, those whose exposure begins in adulthood, even though the latter usually show some initial advantage over the former; in some cases, early starters achieve long-term native-like levels of performance;
• there are differential age effects across domains, seemingly supporting the ‘multiple critical/sensitive period hypothesis’ reported by Granena and Long (2013), Huang (2014) and Werker and Tees (2005); for example, L2 morphosyntax seems to be more vulnerable to processing difficulties than L2 lexico-semantics and therefore more susceptible to age effects;
• it is difficult to determine the significance of AO precisely because it cannot be disentangled from other, co-occurring factors that seem to operate more favourably in respect of younger learners but have nothing to do with the effects of biological ageing per se (e.g. positive attitudes, risk-taking behaviour, openness to the new culture, greater commitment of time and/or energy, a better support system, friendships with speakers of the TL, the urge or need to fit in, exposure to rich and varied input in school).

Research on early FL learning in typical limited-exposure classrooms has painted a somewhat different picture from that which emerges from naturalistic age research, in that very few linguistic and extralinguistic advantages have been found to be associated with beginning the study of an FL earlier in an instructed situation. Learners who started later, and therefore had less time to learn, have been found to be equal or superior to earlier beginners across a range of measures and skills (see, e.g. Al-Thubaiti [2010] for Saudi Arabia; Larson-Hall [2008] for Japan; Myles and Mitchell [2012] for Great Britain; Muñoz [2006a, 2011] for Catalonia [Spain]; Unsworth et al. [2012] for the Netherlands). Even Muñoz’s (2011, 2014b) studies of longer-term outcomes in FL learning contexts revealed no benefits of early learning; in other words, the results in FL learning contexts do not seem to parallel those in the naturalistic context. Reasons that have been proposed for the poorer showing of early starters in a classroom setting include the proposition that the implicit kind of learning at which younger learners excel requires more input than primary school FL instruction provides, and that older learners are immediately able to make more of their advanced literacy skills in the L1 (see Muñoz & Singleton, in press; Pfenninger, 2011, 2014b). DeKeyser (2012) discusses age-by-treatment interaction research in the narrow sense, suggesting that different learning processes are at work at different ages, which may imply the need for different treatment (implicit instruction for younger students vs traditional teaching methodology for older students). Sze (1994) argues that because classroom-based L2 learning is generally more cognitively oriented than naturalistic acquisition, there is more reason to believe that the older instructed learner, whose cognitive ability is more developed, will outperform the younger learner in the L2 classroom.
Lightbown (2003: 8) points out that ‘in instructional settings where the total amount of time is limited, instruction may be more effective when learners have reached an age at which they can make use of a variety of learning strategies, including their L1 literacy skills, to make the most of that time’. It is important to note that the ‘older-is-better’ trend has also been found in partial and full immersion programmes (see, e.g. Genesee, 1987; Harley, 1986). Such negative findings regarding the effects of early instruction in fact go back a long way. For instance, the inauguration of projects to introduce L2 instruction into primary/elementary schools in the 1950s and 1960s was dealt a severe blow by the findings of research in the 1970s that cast doubt on the capacity of early instruction to deliver higher proficiency levels than later instruction (e.g. Burstall et al., 1975; Oller & Nagato, 1974). As early as 1975, Carroll recommended the following procedure with respect to the age issue, based on the findings of an eight-country study of learning French as an FL:

The data of the present study suggests that the primary factor in attainment of proficiency in French (and presumably, any foreign language) is the amount of instructional time provided. The study provides no clear evidence that there is any special advantage in starting the study of a foreign language very early other than the fact that this may provide the student more time to attain a desired performance level at a given age. In fact, the data suggest that students who start the study of a foreign language at relatively older ages make somewhat faster progress than those who start early. The recommendation that emerges is that the start of instruction be placed only so early as to permit students to have the amount of instructional time they need to achieve whatever level of competence is regarded as desirable by a given stage of their education. If necessary, the start of instruction can be delayed more than normally if more intensive instruction is given. (Carroll, 1975: 276–277)

The disillusionment occasioned by such findings seems, however, to have been rather swiftly consigned to the dustbin of history, and more recent and continuing negative results in this connection have also been largely ignored. In fact, at the very moment when the pros and (especially) cons of teaching additional languages to young learners were being debated in research circles, a general reform movement was changing educational systems all around the world in the direction of introducing instruction in such languages ever earlier in the curriculum (see Pfenninger & Singleton, under review). In 1965, the first French ‘immersion’ kindergarten opened its doors in an anglophone elementary school in St. Lambert, near Montreal, Canada. Throughout the 1980s and the early 1990s, education systems in the EU and the EFTA/EEA countries promulgated numerous reforms along these lines (Eurydice, 1995: 1). This is perhaps not too surprising in the light of Spolsky’s (1989) claim that educational systems usually arrive at a decision about optimal learning age on political or economic grounds and only subsequently seek rational justification for their decision.
In what follows, we take a closer look at this hypothesis.

Multilingual Switzerland as an Empirical Framework

In many respects, Switzerland is an ideal place to examine non-native English acquisition and multilingualism (see Pfenninger & Singleton, 2016d). First, because it is a multilingual nation, displaying national quadrilingualism with four national languages – German, French, Italian and Romansh – enriched by medial diglossia in the German-speaking part (Swiss German, Standard German), a relatively high number of FLs are taught at primary and secondary school level (EDK, 2004). In other words, it is a country characterised by institutional and societal multilingualism. In spite of the legal foundations aiming at widespread societal multilingualism, however, the multilingual repertoire of individuals may vary a great deal within each language area (see, e.g. Berthele, 2016; Durham, 2014; Dürmüller, 2001; Lüdi & Werlen, 2005). Second, German-speaking Switzerland represents one of the few minority contexts where early trilingual schooling is common, partly because the L1 (Swiss German) is not the primary literary and written language. Finally, Switzerland is on the verge of transitioning from an EFL country to an English as an L2 country by virtue of the fact that English is used as an intranational lingua franca in a number of domains (see, e.g. Durham, 2007; Dürmüller, 2001), which means that many learners of English as an FL strive for high proficiency in this language. However, despite Switzerland’s commitment to official linguistic diversity, a significant amount of intra-Swiss tension has historically centred on questions of language (Demont-Heinrich, 2005: 70). In the following two sections, we provide general information about the linguistic circumstances in Switzerland, about the most frequently mentioned Swiss standpoints as to when to teach which language, and about the types of early language learning programmes offered.

Swiss German and German

Thuleen (1991: para. 3) points out that, as Switzerland is a multilingual nation, ‘[t]his has in some ways led to the standardization of a High German usage, often in somewhat stilted and formal situations, but in a Standard German that varies slightly from the one used in Germany and Austria’ – although we need to be careful about suggesting causal links here (is Swiss High German a result of Swiss multilingualism or of an unrestrained desire to be accepted as part of pluricentric German-language culture?). Standard German is rarely spoken, and so the situation in place is sometimes referred to as ‘medial diglossia’ (Kolde, 1981) or ‘functional diglossia’ (Brohy, 2005; Rash, 1998; Werlen, 2004). As the Swiss term for Standard High German (Schriftsprache or Schriftdeutsch, literally ‘written language’ or ‘written German’) implies, Swiss Standard German is primarily a literary and written language; its spoken form is used only in selected formal situations (e.g. in school, public speeches, some broadcasting and, increasingly, in communication with immigrants in their first years after arrival). While Swiss German is mutually intelligible with some (predominantly southern) dialects of Germany (such as Swabian or Bavarian), it is hardly understandable to someone who knows only Standard German, as the two varieties differ to a significant extent in lexicon, phonology and syntax (see Demont-Heinrich, 2005). According to Berthele (personal communication, 3 September 2016), who measured the typological difference between Standard and Swiss German, the two varieties differ to a similar extent as Dutch and German. Accordingly, in what follows, the differences between Swiss German and Standard Modern High German will be pointed out in those areas of language where they exist. While many Swiss German children have a good receptive – and sometimes even productive – command of Standard German before they enter school (e.g. because of exposure to the media), all Swiss German children learn Standard German at school at the latest, starting in the early primary grades. Specifically, children receive formal literacy training in L2 German from first grade (age seven) onwards. However, as pointed out above, while Standard German might be the language of literacy, it is not a dominant societal language. Several authors (e.g. Brohy, 2005) have observed an increased use of the Swiss German dialects in schools, media and public discourse. Even more interestingly, recent studies on the linguistic competencies of Swiss adults have shown that over 50% of native Swiss German speakers mention Standard German as their first FL and that they would prefer switching to French or English (Werlen, 2004).

World language English vs national language French

The local learning contexts are fundamentally different for English and French in German-speaking Switzerland. As in many other European countries (see, e.g. Henry, 2014), the status of English in Switzerland has undergone considerable change, and the distinctions between learning English as an FL and learning English as an L2 – and consequently the distinctions between English as a school subject and English as a global lingua franca – have become increasingly difficult to sustain, although, admittedly, its value as a global language for wider communication still outweighs its use as an L2 intranationally (Stepkowska, 2016; but cf. Durham, 2016). Even though English is still not a medium of everyday communication in Swiss society and is learned formally in the classroom, it is increasingly regarded as a component of basic education (Graddol, 2006; Lo Bianco, 2014; Ronan, 2016). As Durham (2007, 2014), among others (e.g. Watts & Andres, 1993), has noted, the ‘expanding circle’ of Kachru’s (1985) depiction of the historical spread of English around the world from its native Anglophone ‘inner circle’ bases has recently been dramatically instantiated in Switzerland, to the extent that the country is on the verge of transitioning into a second language (‘outer circle’) country, a shift with many kinds of implications. On the one hand, in the last decade, English has become part of a basic social literacy and a medium of expression used extensively in day-to-day life, particularly among young people, who spend larger amounts of time in English-mediated environments than previous generations (see Ronan, 2016). On the other hand, English is regularly used in the working environment, particularly in multinational companies that may introduce English as a general company language (e.g. Stotz, 2001).
It is in this linguistic and educational context that, since the 1990s and early 2000s, two kinds of discourse have been competing: on the one hand, a discourse promoting the multilingual nature of Switzerland through its four national languages, which has construed the advance of English as a threat to the harmonious coexistence of those languages (particularly French), and, on the other, a discourse promoting English in addition to the L1 in preference to other national languages (see Watts, 2011: 272). With respect to the first discourse, which Watts (2011: 278) calls the ‘pure language myth’, some voices have warned against the ever-increasing presence of English and its purportedly negative effects on language competence and attitudes towards French (compare the discussions, e.g. in Aebeli, 2001; Dröschel, 2011; Rosenberger, 2009; Stauffer, 2001), whereas others are highly critical of teaching several FLs early. Brohy (2005: 136) observed that ‘many fear that English could ultimately threaten the national languages as a communication base between the indigenous language communities and many raise the question as to its status as a fifth national language’. Durham (2016: 107) even goes as far as to call English a ‘de facto Swiss language’. It is thus not surprising that the decision by the Zurich cantonal council (Kantonsrat) and former Zurich Director of Education, Ernst Buschor, to designate English rather than French as the first FL taught in primary school set off a national furore at the beginning of the 2000s (Demont-Heinrich, 2005: 75). According to Watts (2011: 268), the driving force behind this movement in Zurich lay in commerce and industry; i.e. early English was introduced because it was considered absolutely necessary in the world of business and global commerce – a phenomenon that Watts calls the ‘English as the global language myth’, which is the core belief in the second type of discourse mentioned above. We elaborate on the motives for introducing early FLs in more detail below. Crystal (2003) calls attention to the fact that other languages may be pushed aside by the rising importance of English as a global language:

Perhaps the presence of a global language will make people lazy about learning other languages, or reduce their opportunities to do so.
Perhaps a global language will hasten the disappearance of minority languages, or – the ultimate threat – make all other languages unnecessary. (Crystal, 2003: 15)

Indeed, as early as 2004, Manno observed a lack of motivation to acquire French in German-speaking Switzerland, which reportedly resulted from the difficulty of the subject. While learning English was embraced by many with euphoria and often with an uncritical attitude (although there were critical voices from the Social Democrats and the Greens in the cantonal debates in Zurich around 1998; cf. also Stepkowska, 2016), especially in relation to ‘its alleged ease of learning’, the French language – in particular French as a school subject – was suffering from ‘a very poor reputation in German-speaking Switzerland’ (Manno, 2004: 153; see also Projekt Fremdsprachenevaluation BKZ, 2016). Stepkowska (2016) suggests that the Swiss preference for English stems from concrete communication needs, which are the source of instrumental motivation. In a school setting, Schwarz et al.’s (2006) study showed that, after the national languages, their 280 informants named English as their favourite language (see also Deluigi, 2015). By contrast, Haenni Hoti and Werlen’s (2009) large-scale Swiss study on early FL instruction in primary schools found no effects of English on the motivation for learning French: there were no significant differences in motivation to learn French between the group that had already been learning English in school and the group that had not yet started learning English. This suggests that the phenomenon we observe today is rather more complex than simply the proliferation of English and the alleged linguistic imperialism associated therewith (see Duchêne & Heller, 2012) – a hypothesis which will be further tested in our study.

Pedagogical and Curricular Landscapes in Switzerland

The optimal age debate in Swiss policy documents

Towards the end of the 1990s, the pressure to introduce English as an FL that all students had to learn grew steadily stronger, especially in the Zurich region (Brohy, 2005). It was then that the Ministry of Education asked a group of experts to develop a coherent national concept for the teaching and learning of second and foreign languages in Switzerland. As mentioned above, one dominant kind of discourse driving the introduction of early English was the discourse of global English, fuelled by the perceived needs of the Swiss economy and Swiss business and industry (Watts, 2011: 277). However, as we have already seen, another kind of ideology has also become part of the hegemonic discourse: the assumption that the AO of instruction is one of the most important and robust predictors of success in FL learning in an instructional setting. Along those lines, most language policy documents explicitly state the advantages of early language learning. While some of them define the benefits in a broad sense (e.g. better L1 skills, metalinguistic awareness and favourable attitudes to other languages, people and cultures) and list conditions under which early learning can be beneficial (e.g. trained teachers, small classes, enough time devoted to languages), others are more concrete concerning the cognitive advantages of early starters (see Appendix 1 for the original quotes in German):

(1) ‘On neuro-psychological grounds early learning is especially important and profitable specifically for the learning of languages: early language learning is more efficient, creates favourable preconditions for the learning of further languages and promotes the development of language-learning strategies’. (EDK, 2004, our translation)

(2) ‘An interval of two years between the introduction times of the two foreign languages is ideal from the perspective of neuropsychology’. (Pedagogical School of Higher Education, Zug, 2013, our translation)

(3) ‘Younger learners are capable of acquiring and storing a language unconsciously, provided they are exposed to regular and rich input. Language skills that are stored in this manner will automatically be available to the learner later in life’. (Englisch an der Primarschule. Grundsatz und Rahmenbedingungen. Erwägungen zum Bildungsratsbeschluss und Bildungsratsbeschluss vom 18.3.2003 (Advisory Council of Education (Bildungsrat) of the Canton Zurich), 2003, our translation)

These quotes perfectly exemplify the ‘leaps in logic’ made in misapplying research carried out in one context to another context, as described by Hatch (1979: 138) nearly four decades ago. Clearly, the arguments of the policymakers are based on research that has investigated age and L2 learning in the naturalistic, not the instructional, setting (see also Spada, 2015) – although the authors often merely relied on workshops and consultancy with content and language integrated learning (CLIL) and immersion specialists rather than on SLA research. As Examples (1) and (2) show, many of the explanations offered in Swiss policy documents for the differences between early and late starters concern the neurobiological advantages of younger learners. Marinova-Todd et al. (2000: 14) cautioned that ‘[g]iven the glamour of brain science and the seemingly concrete nature of neurophysiological studies, the conclusions have often been readily accepted by the public’.
However, as noted in Muñoz and Singleton (2011: 25), findings from a number of neurolinguistic studies conducted since the 1990s could not provide decisive evidence concerning the existence of a critical or sensitive period, ‘because they fail to relate differences in brain activation patterns to differences in target language proficiency’. Such studies therefore imply nothing about critical periods and are essentially irrelevant to any claim concerning early vs late FL learning (see Marinova-Todd et al., 2000; Snow, 2002). The current consensus among cognitive scientists is that the brain remains plastic throughout life and is modified by experience at any age (Ramírez-Gómez, 2016). Recently, researchers have offered explanations other than those based on, for instance, the gradually diminishing plasticity of the brain or changes in perceptual and sensory-motor processes. DeKeyser (2013: 55), for instance, states that ‘[t]he neurological correlates could be much more subtle, and could be completely dependent on accumulated learning experience rather than independently dictated by biological development’. Example (3) illustrates what we call the ‘linguistic sponge myth’: children soak up language like a sponge; you just have to talk to them and they learn automatically. Partly because of the overinterpretation of the facts relating to naturalistic SLA found in the age literature, members of the public have been given a false sense of the ultimate proficiency attainable by young L2 learners generally. Children tend to be considered capable in all circumstances of acquiring a new language rapidly and with little effort, whereas adults are believed to be doomed to failure (Marinova-Todd et al., 2000). In fairness, it should be mentioned that the main goal of Swiss policy is for all or most children to reach Level A2.1 of the Common European Framework of Reference for Languages (CEFR) in three skills, and A1.2 in writing, by the end of primary English, which might not sound like an unreasonable promise at first glance. However, the main (explicit or implicit) promise is that earlier starters will do better than later starters in a school context – a claim that deserves further attention in this study. Although the optimism displayed in Example (1) was duly tempered in the EDK report of 2012, the ‘earlier=better’ myth appears to have triggered a discourse concerning the acquisition of language that very quickly became hegemonic; one notes, incidentally, that this line of discourse made its first appearance at least as early as the first century. Pfenninger and Singleton (under review) describe how this myth began to emerge and trace its progress into 21st-century discourses on FL teaching and learning. One of their central points is that the ‘earlier=better’ myth is not a lie; it was not (and is not) recounted with the intention of deceiving anyone, and it does contain important elements of truth.
However, the problem with the ‘catch them young’ discourse is that it is used to justify and legitimise the decision to introduce English language instruction at earlier grade levels, and it is readily accepted by the public, arguably because it provides a plausible and easily comprehensible explanation for one dimension of the relatively complex phenomenon that is language learning. It is part of a long cultural tradition, and it is a subject on which laypersons who would feel unqualified to speak about other pedagogical topics are eager to express their views. After all, common experience tells us that starting to learn anything early in life – the violin, chess, golf – often appears to yield dramatic advantages. It should be said that much of the ill-informed and naïve optimism on the part of parents regarding early FL instruction has disappeared in Switzerland in recent years, in the light of their experience of its effects. According to Watts (2011: 4), though, doubting the factuality of such a deeply ingrained myth can be ‘interpreted as an act of heresy if the story, or even only part of it, is firmly and widely believed by the group’, even though for more than five decades classroom research has contradicted these commonly held beliefs and myths that have influenced the instruction, educational practices and organisational structure of educational programmes. Another major aim of early English in Switzerland seems to be to support the learning of French later in primary school; the following extract highlights the claim that an early FL has a positive impact on languages learned later (see also Haenni Hoti et al., 2011):

(4) ‘In the learning of the second foreign language primary-school children benefit from their learning of the first: there are positive transfer effects from preceding languages’. (Pedagogical School of Higher Education, Zug, 2013, our translation)

Along those lines, the general conclusion reached in the Swiss study by Haenni Hoti and Werlen (2009) was that the new model with early English supports pupils in learning French because English has a positive influence. Early FL instruction has always also had overt political and economic dimensions. For instance, the official multilingualism requirement or goal in Europe and the Council of Europe’s active promotion of ‘plurilingualism’ (Eurobarometer, 2006; European Commission, 1995) have had a clear impact on policy decisions in Switzerland. Although not part of the EU, Switzerland has followed a similar path in its language policy to many European countries and has acknowledged the importance of multilingualism in view of its far-reaching impact on globalisation, competitiveness, and national and cross-national stability, among other things (Hutterli, 2012: 9).
Plurilingualism is here defined as an attribute of the individual rather than of the community, and is deemed to have a significant cumulative influence on the development of a European sense of identity, as it enables participation in democratic, social and political processes within the multilingual context of Europe (Breidbach, 2003). According to the official policy directions, the main goal of early English in Switzerland is to foster multilingualism in future generations and to give students the possibility of becoming solid (not native-like) L2 speakers, in response to globalisation, integration and the worldwide network. In Europe, language use and language rights have long been central concepts of EU citizenship, and language skills have become economic assets and the object of a series of targeted agendas. According to Spada (2015: 74), the English language has become ‘a valuable commodity for countries wanting to position themselves more firmly in the global economy’ (see also Watts, 2011). Dürmüller (2001: 71) notes that ‘the attraction of English lies largely in the economic benefits that come with it and in the fact that good or even fair knowledge of English offers access to the US-dominated Western cultural community’. What is more, knowledge of English correlates with improved life chances in Switzerland: it is needed for many jobs today and is associated with significant earnings gains on the Swiss labour market (see Grin, 2001), although it should be mentioned that Grin also argues that the financial advantage of English is likely to drop over time (see also Lüdi et al., 2013).

The new guidelines also aim to take into account both the political and cultural significance of the four national languages (German, French, Italian, Romansh) at the national level and the steady growth of English as a lingua franca at the international level:

(5) Various reasons argue for keeping French, alongside English, in the primary grades within the framework in which it has operated to date. For national-political reasons alone the idea of assigning a shortened instruction time to the national language as compared to English does not come into consideration. The learning of French, which is a national language and one of the important languages of Europe, is an expression of the esteem in which French-speaking Switzerland is held. (Englisch an der Primarschule. Grundsatz und Rahmenbedingungen. Erwägungen zum Bildungsratsbeschluss und Bildungsratsbeschluss vom 18.3.2003 (Advisory Council of Education (Bildungsrat) of the Canton Zurich), 2003, our translation)

This ties in with the question of whether the Swiss language communities need some minimal exposure to each other’s languages to maintain a collective sense of national identity (Demont-Heinrich, 2005; but cf. Andres et al., 2005). With the introduction of two FLs in primary school (English and French), a compromise was reached in the heated debate about whether a national language (French) or a world language (English) should be prioritised in primary school.
Furthermore, in German-speaking Switzerland there are now two regions with differing FL sequences (French-English vs English-French, see EDK, 2015). It is also claimed that the idea of introducing early English into Swiss primary schools was a response to parental desire rather than an action based on theories of child language development: at the beginning of the 2000s, many parents increasingly expected primary schools to include one or more FLs in the existing curriculum (Driscoll, 1999: 9). Swiss parents have high expectations for their children’s achievement in English. According to former Zürich Director of Education Ernst Buschor, one of the leading figures in the rise of English in primary public education:

… some parents had started to set up private English classes because they were worried that their children were learning it too late. And we didn't want this kind of division [between richer and poorer children] to arise. (Bierling, 2002: para. 3)

This is important insofar as parents are nowadays able to exert influence on the curriculum by opting for schools of their choice, which has increased parental power immensely in recent years (Driscoll, 1999).

The unique challenge of (early) multiple foreign language instruction

The national reforms for FLs have posed several challenges. The Swiss educational system is complex: it includes several models, different types of state schools and different approaches to the teaching of English, and it reflects cantonal differences. In fact, there are 26 different education systems for the 26 cantons and half-cantons (Brohy, 2005); that is, the cantons are sovereign in matters of education, with the federal state acting as coordinator, although because of the constitution and the languages law, the 26 cantonal systems are not so different. In Zurich, English and French were traditionally (i.e. before 2000) introduced in the first or second year of secondary school (age 13–14), but when the educational reforms were implemented in 1989 (French) and 2004 (English), respectively, English was introduced in the second year of primary school, when children are eight years old, while French instruction began in Grade 5 (age 11). Because early French had been implemented before early English, we as researchers were able to compare groups of children who had started their English classes at two different ages within the same FL programmes and school curriculum. Although the external pressure to learn FLs is strong, the model of early language learning in Swiss elementary schools is characterised by (a) a fairly limited amount of instruction time per week; (b) teachers who are generalists rather than specialists; and (c) teacher proficiency in the FL that often falls far short of native-speaker level, although the goal is for newly graduating primary teachers to reach C1 level. While primary school children usually have two to three lessons of English a week, English takes up between three and four hours of the weekly school timetable in secondary school.
Interestingly, the target achievement level for the end of compulsory school (B1) has not been raised as a consequence of the earlier introduction of English (Bildungsdirektion des Kt. Zürich, 2010). Early English classes often focus on spoken English, particularly vocabulary (formulaic language), leaving formal grammatical instruction until secondary level. The syllabus is loosely based on the concept of CLIL, although current curricula also list grammar structures to be learned in primary school (how students are to learn them is left open in the curricula). In the CLIL classroom, the accent is placed on EFL sensitisation, oral fluency, comprehension, cultural awareness, vocabulary and formulaic language, based on the hypothesis that younger children cannot attend to formal, explicit FL instruction to the same extent as older children because prepubertal learning is less reliant on analytic ability (e.g. Ellis, 2002). Strictly speaking, the CLIL programme is not an immersion programme (see below). While activities are undertaken in English, they relate to the learning of the L2, although some teaching materials are what we might call content-based; i.e. subject matter of general educational value is included. In other words, this programme is comparable to the ‘intensive English programmes’ in Canada (see, e.g. Netten & Germain, 2004), albeit with considerably fewer hours of instruction per week. However, the strong focus on meaning in comprehensible input and the communication of authentic messages recall the main goal of immersion programmes, although admittedly in immersion programmes the focus is on attaining subject knowledge and skills. In any event, EFL in Switzerland does not follow a strict protocol. English may be the central focus of the lesson, but the teacher is free to incorporate it into, or combine it with, other subjects or to conduct classroom business in the FL. As teacher training is also a cantonal responsibility, it varies considerably from canton to canton. Initially, many Swiss elementary schools did not have classroom teachers who were trained to teach English, which meant that the teachers had to undergo extensive training. The Zurich Ministry of Education specifies that they must all reach a high level of proficiency; to this end, they must (a) pass a language test (equivalent to the Cambridge Advanced Certificate); (b) take a class in didactics/methodology; and (c) spend some time abroad in an English-speaking country as an assistant primary school teacher.
While there is no option to hire part-time instructors (e.g. native L2 speakers or trained English teachers) locally, if a teacher does not wish to teach an FL, there is the possibility of either cutting back on their teaching load or trading subjects with fellow teachers.

Different types of provision of foreign language teaching

Around the same time as the Swiss Conference of Education Directors recommended to the cantons what is called Model 3/5, where the onset of the first FL is at the latest in Grade 3 and the second FL in Grade 5 of primary school, individual schools, and then cantons, also started to implement so-called late immersion programmes – the Swiss term for CLIL – in academically oriented secondary schools. For this reason, we decided to include a group of such CLIL students in our analyses to be able to investigate instruction effects (see Chapter 2). Note that in this book, the notion of CLIL will be used as a cover term for both CLIL and immersion, following Mehisto et al. (2008; see also Chapters 2 and 7).

Although these programmes were not introduced on a large scale at first and there were (and still are) sizeable regional and cantonal differences, the role of immersion instruction or CLIL has significantly increased since 1995 (Hutterli, 2012: 29). CLIL is ‘a dual-focused educational approach in which an additional language is used for the learning and teaching of both content and language’ (Coyle et al., 2010: 1). The introduction of CLIL in Switzerland reflected a general aspiration in Europe for providing students with enhanced opportunities in school to acquire competence in additional languages (see Marsh, 2002). Since then, the very positive associations of CLIL (e.g. its perceived success and effectiveness) have attracted researchers, administrators, teacher educators and teachers, particularly those in the field of English as an L2/FL (Cenoz et al., 2014: 247). Zurich schools for secondary education are relatively free to design their immersion curricula. Late immersion programmes usually start in lower secondary education (Grades 1–4, age group 12–15), and consist of three content subjects (e.g. mathematics, biology and history) taught through the FL (usually English). The intent is to maximise the quantity of comprehensible input and purposeful use of English, in line with Swain’s (1985) Output Hypothesis and Long’s (1981) Interaction Hypothesis (see Chapter 7 for a discussion of the wide scope of experiences encompassed by immersion type provision). Additionally, English is taught formally as a separate school subject. Thus, learners experience a combination of formal learning and content-based learning, which offers them what seems to be an ideal opportunity to learn an FL in a classroom: a combination of explicit learning, or ‘focus on form’, and implicit learning, or ‘focus on meaning’, to use Long and Robinson’s (1998) terms. 
It is important to note that the Swiss education system does not automatically provide continuity in the immersion programme for students moving from primary to secondary level. Students may voluntarily opt for immersion instruction. As the demand for places in immersion classes usually exceeds the number of places available, a student’s average school grade serves as a criterion in deciding who can join the programme and who cannot. Thus, we had to make sure in this study that the immersion students did not have significantly better grades in English than students in regular EFL programmes before they entered the programme, and we had to factor motivation into our model, as these students might have been significantly more motivated to study English than the students who did not sign up for an immersion programme (see also Bürgi, 2007: 79; Elmiger et al., 2010: 70). Owing to the greater amount and intensity of exposure to the FL, on the one hand, and the opportunities for engaging in authentic and meaningful interaction in real-life contexts, on the other, immersion students have traditionally been found to be highly successful in comparison with students who have received regular FL instruction – that is, instruction that focuses primarily on language and is restricted to separate, limited periods of time, or so-called ‘minimal input’ of no more than four hours of instruction per week (Larson-Hall, 2008: 36). However, thus far, the L2 learning context has rarely been included as an important factor in the discussion of age effects (Muñoz, 2006b: 6), even though the type of instruction that learners receive plays a decisive role in formal instructional settings, as it determines, for instance, the quality and quantity of input the students encounter and the variety and amount of practice opportunities they receive. It appears that age effects may differ not only according to whether the learning context provides learners with unlimited exposure to the TL (as in naturalistic language learning settings), but also according to whether exposure to the language is limited to a great extent (as in FL learning settings) or to some extent (as in school immersion settings) (Llanes & Muñoz, 2013).

Conclusion

This chapter has highlighted the interest in age as an explanatory factor for differences in FL learning in SLA research and FL pedagogy, and the importance of multilingualism in general and in education in particular, and it has provided some background information about the linguistic and curricular landscape in Switzerland. As we have seen in this chapter, despite the consistent findings of large-scale, longitudinal classroom studies, the evidence against an advantage accruing from an earlier start in an instructional setting is far from forming the basis of accepted wisdom. Given its focus, the present work fills an important gap in age research in that it highlights the importance of factors other than starting age that contribute to success (or lack thereof) in FL learning, such as issues concerning rate of acquisition, streaming and continuity of education (Chapter 4); socio-affective and contextual factors (Chapter 5); the roles of L1 literacy skills and previously learned languages (Chapter 6); quality, quantity and intensity of instruction in primary and secondary school (with a special focus on immersion programmes, see Chapter 7); and overt political and economic dimensions that have been used to justify ignoring the arguments of SLA researchers (Chapter 8).

Note

(1) Our studies show that, naturally, we also find a clear discrepancy in this population between low-proficiency and high-proficiency FL learners, as well as between more motivated and less motivated students.

2 The Current Empirical Study

Participants and Research Design: Four Independent Samples

As briefly mentioned in Chapter 1, the data in question were collected over seven academic years during which two curricula coexisted in Switzerland, so that we were able to test early and late starters of English as a foreign language (EFL) who were in the same secondary schools with the same teachers and teaching materials. We recruited three different groups: 100 early learners of EFL and 100 late learners of EFL for the focal group (longitudinal design); 100 immersion students (50 early and 50 late starters) to test for instruction effects; and 100 students who belonged to the fifth generation of early learners in Switzerland. Note that simultaneous and sequential bilinguals (n=436) were excluded from the analyses reported here, as they were treated separately (see Pfenninger, in prep.). In what follows, we describe each group in detail.

Characteristics of the focal group

Two groups of subjects formed the focal group in the longitudinal part of this study: a group of 100 learners of EFL with an age of onset (AO) of 8 (early classroom learners [ECLs]) and a group of 100 learners with an AO of 13 (late classroom learners [LCLs]). Of the 200 participants, 103 were female and 97 male. They were clustered in five state schools in 12 classes, ranging in size from 9 to 22 members. Table 2.1 displays information about the schools and classes, while Table 2.2 shows information about the subjects. One of the five schools was in a suburban area, while the others were in urban school districts. The first test series was administered after six months of EFL in secondary school, that is, after 440 hours of instruction (ECLs) and 50 hours of instruction (LCLs), respectively. The second data collection took place five years (680 hours) later. At no point were early starters mixed with late starters in the same class. This has the advantage that there is no blurring effect from mixing ECLs and LCLs in the same class. In a mixed class, the teacher has only a few options to support and enhance the specific foreign language (FL) performance and proficiency of individuals in such a heterogeneous group. This may lead to boredom, demotivation and eventually stagnation of knowledge on the part of the ECLs and/or overextension, pressure and inferiority complexes on the part of the LCLs. In any case, it is certainly not an optimal learning environment.

Given that they had the same biological age, both groups can be taken to have attained broadly the same state of neurological and cognitive development and the same level of first language (L1) proficiency. Thus, neither learner group can be said to have been characterised by cognitive advantages, which is imperative in a study where test-taking is the main measure. Crucially, however, they differed in terms of their age of English instruction onset (AO=8 vs AO=13) and, therefore, in terms of length of instruction. As subjects whose language learning begins later than 12 years AO are traditionally considered late learners (e.g. Birdsong, 2006: 27), the distinction between 8 years AO and 13 years AO is appropriate on this early/late criterion.

Table 2.1 Nesting structure at Time 2

| Class ID | AO | Class size |
|---|---|---|
| 1 | Early | 10 |
| 8 | Late | 18 |
| 9 | Late | 21 |
| 3 | Early | 19 |
| 4 | Early | 15 |
| 10 | Late | 12 |
| 6 | Early | 16 |
| 7 | Late | 14 |
| 2 | Early | 21 |
| 11 | Late | 18 |
| 5 | Early | 19 |
| 12 | Late | 17 |
| Total (5 schools, 12 classes) | | 200 |

Table 2.2 Focal subjects participating in the study

| Group | Number of subjects | Age at time of testing (mean) | Age of onset | Length of instruction (years) | Length of instruction (hours) |
|---|---|---|---|---|---|
| ECL1 | 100 | 13–14 (13;8) | 8–9 | 5.5 | 440 |
| LCL1 | 100 | 13–14 (13;4) | 13–14 | 0.5 | 50 |
| ECL2 | 100 | 18–19 (18;8) | 8–9 | 10.5 | 1170 |
| LCL2 | 100 | 18–19 (18;9) | 13–14 | 5.5 | 730 |

Note: ECL1=early classroom learners at Time 1; ECL2=early classroom learners at Time 2; LCL1=late classroom learners at Time 1; LCL2=late classroom learners at Time 2.

The design aims to do justice to the real school situation in Switzerland. According to official policies, the hope is that an earlier start and a longer period of instruction will yield more proficient FL learners by the time they graduate from high school (that is, at age 18). The biological age at graduation is the only variable that does not change in the new education system. As Muñoz (2008) and Birdsong (2006) rightly point out, the age of first exposure is difficult to define, as it may occur in the school environment at the beginning of FL instruction or, especially perhaps in the case of English as the FL, on visits to the second language (L2) country, during contact with relatives who are L2 speakers, through input from the media, etc. Therefore, all students completed a five-page questionnaire detailing their experience with English both in and outside school; the hours they had spent studying English in elementary school, secondary school, immersion programmes and/or regular L2 classes; and their attitudes towards the language. Students who had received additional instruction in English or had been systematically exposed to English outside school were excluded from the sample (the initial sample comprised 296 students). Admittedly, there could be some error in this measurement, as students might not have accurately recalled events from the previous 6–11 years. In any case, however, it can be assumed with a reasonable degree of confidence that before the age of eight, Swiss children are not normally exposed to the English language to a significant extent in their free time (e.g. by watching English-language TV). As mentioned in Chapter 1, all of the participants went to the same type of state school, that is, a typical academically oriented, high-level secondary school (university preparatory school) in the canton of Zurich.
This is also perhaps the main limitation of our study: students in Switzerland are screened at the end of primary school in Grade 6, at which point they are placed in different school tracks on the basis of the results. In order to attend an academically oriented high school, they have to pass an entrance test and a six-month probation period. Thus, the participants of this study are representative of many, but certainly not all, students in Switzerland (see discussion in Chapter 1). Another important point needs to be borne in mind. The students came into the secondary school programmes from different primary schools in the canton of Zurich, and even though we can assume that they had all experienced a similar, broadly communicative approach in English class in Grades 2–6, the exact nature of their prior classroom exposure is unknown. We therefore had to rely on the students’ self-reports in the biodata questionnaire in this regard. The learners of the focal group all speak the Zurich standard variety of Swiss German, which is one of the largest in Switzerland and, like almost all the varieties gathered together under the heading of Swiss German, belongs to the High Alemannic grouping (see Chapter 1). At the first measurement in 2009, all of them had received instruction in Standard German for 6.5 years¹ and had had French as a school subject for 2.5 years. In the case of the latter, they had learned it for two years in primary school (two 45-minute classes a week) and for six months in secondary school (three 45-minute classes). This means that for the ECLs, English represents the second FL to be learned at school (that is, in primary school), while for the LCLs, it is the third L2 (or rather the third language [L3]). Finally, the students met the following criteria:

• They were not bilingual or highly proficient speakers of English (as self-reported and/or reported by teachers).
• They were present at both measurement times, although not every student completed each task, which was taken into account in the statistical analysis (see below).
• They had not stayed outside of Switzerland for more than one month.
• They had ‘limited’ access to English outside the school (explained in Chapter 7): they reported living in homes with caregivers who spoke Swiss German, and they had not attended an English-medium school at any point in their prior schooling.
• They had never had to repeat a grade in their schooling.

Characteristics of the CLIL student sample

In order to extract a sample of English immersion students for our investigation of the interaction between starting age and type of instruction, we recruited 100 students who were enrolled in two secondary schools that offer a type of English programme designated as ‘late partial immersion’ (henceforth CLIL, for content and language integrated learning). This programme provides students with significant exposure to English in an educational setting, although less than students receive in a full immersion programme (see Chapter 1). At the second data collection time, the CLIL group had characteristics similar to those of the focal group: they were in Grade 12 English classes in the state system, they were between 17 and 20 years old (mean 18;9), they came from similar socioeconomic backgrounds, they did not take any private classes in English outside school and they attended the same secondary schools from which we took our focal group, which meant that they had the same EFL teachers as the NON-CLIL students. Instead of using correlational analysis, as described above, we decided to divide them into four groups according to AO and learning constellation in primary and secondary school: 50 early starters who attended an immersion (CLIL) programme in primary school and continued CLIL in secondary school (EARLY CLIL); 50 early starters who followed the same primary school programme but then received regular EFL instruction after primary school (EARLY NON-CLIL); 50 late starters who began learning English in an immersion programme in secondary school (LATE CLIL); and 50 late starters who attended a regular EFL programme in secondary school (LATE NON-CLIL).

The partial immersion programmes that the EARLY CLIL and LATE CLIL groups attended in secondary school consisted of three content subjects taught through the FL (English). Additionally, English was taught formally as a separate school subject. As in many other intensive programmes in Europe, the schools participating in this research imposed selection criteria based on previous academic performance (see discussion in Chapter 1). Overall, the EARLY CLIL group spent an average of 1770 hours learning English from Grade 2 to Grade 12, followed by the LATE CLIL group with 1330 hours, the EARLY NON-CLIL group with 1170 hours and the LATE NON-CLIL group with 730 hours. Other recent studies of maturational effects in the classroom have used shorter periods (from 600 to 800 hours) in their longest-term comparisons (e.g. García Mayo & García Lecumberri, 2003; Larson-Hall, 2008; Muñoz, 2006a).

Five years later: Characteristics of the control group

Finally, an additional control group of 102 early starters in Grade 7 was recruited in 2014. They were in five classes in two state secondary schools (starting age of EFL: 8; mean biological age at testing 13;4, range 12–14). The purpose of this control group was twofold: their English skills were investigated at the beginning of secondary school (1) to obtain a realistic picture of the benefits of early FL programmes some years after their introduction (this part of the analysis can thus be considered a follow-up study) and (2) to analyse the importance of grades in the early FL classroom. They differed from the ECLs in the focal group in that they belonged to the fifth cohort of early English learners in the canton of Zurich (as opposed to the ECLs, who were part of the first generation) and their FL performance had been graded in primary school.² Thus, this follow-up part was conducted to evaluate the long-term effects of the new early EFL programme as well as to analyse the impact of grades on the learning outcome; the results of both analyses will be presented in Chapter 4. Goal-setting is known to be a powerful motivator that might enhance intrinsic interest, and grades and tests have been found to function as ‘proximal subgoals and markers of progress that provide immediate incentive, self-inducements, and feedback and that help mobilize and maintain effort’ (Dörnyei, 1994: 276). The perceived likelihood of success and reward is thus thought to constitute an important part of the concept of motivation (Moyer, 2004). Nikolov (1999: 46) puts it thus: ‘achievements represented by good grades, rewards and language knowledge all serve as motivating forces: children feel successful and this feeling generates the need for further success’.
In Nikolov’s study, grades seemed to be very important for her participants (ages 6–14); extrinsic motives constituted one of four main areas of motivation mentioned most frequently in students’ answers to open-ended questions. However, the status of school grades is still fiercely debated, particularly in respect of their potential to exercise pressure on students.

Tasks and Procedure

In order to reliably determine the end state of the learners tested here, we need convergent evidence from multiple elicitation methods that have been shown to reveal age-related differences in previous research (notably Llanes & Muñoz, 2013; Muñoz, 2006a). It is also important to include measures that have been shown to be generally related to IQ scores (e.g. literacy-related skills, see Genesee, 1976) as well as others that have not (e.g. listening comprehension skills, see Ekstrand, 1977). All tasks, described in detail below, were piloted in two classes of intermediate learners (n=54, mean age 15;9) in June and August 2008 and, where necessary, revised in the light of the piloting results in order to elicit the intended kind of responses, to obtain the appropriate difficulty level for the students concerned and to ensure sufficient validity and reliability. To attain the above-mentioned aims, nine longitudinal measures and three post-test measures were included (see Table 2.3).

Table 2.3 Measures

| Measure | Time 1 | Time 2 |
|---|---|---|
| Two standardised listening comprehension tasks (level CAE) | | X |
| Oral tasks (retelling task, spot-the-difference task) | X | X |
| Productive vocabulary test: Productive Vocabulary Size Test by Laufer and Nation (1999) | | X |
| Receptive vocabulary test: Academic sections in Schmitt et al.’s (2001) Versions A and B of Nation’s Vocabulary Levels Test | X | X |
| English argumentative essay on the pros and cons of (reality TV) talent shows | X | X |
| English narrative essay: retelling of a silent movie | X | X |
| German language experience essay | X | X |
| Grammaticality judgment task including 49 items and 15 distractors (reliability coefficient [KR-20] 0.90 for grammatical items and 0.95 for ungrammatical items) | X | X |
| Motivation questionnaire (28 closed-ended items) | X | X |
| Strategies questionnaire (15 closed-ended items) | X | X |
| Three open-ended questions enquiring about (1) participants’ experiences with immersion vs regular instruction; (2) feelings of being overburdened by the multiple foreign language requirement; and (3) the transition from foreign language learning in primary school to secondary school | | X |
| Biodata questionnaire (demographic variables) | X | X |
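Table 2.3 reports KR-20 coefficients for the grammaticality judgment task. As a reminder of what that statistic measures, the Kuder-Richardson Formula 20 estimates the internal consistency of items scored right/wrong (0/1) as KR-20 = k/(k − 1) × (1 − Σ pᵢqᵢ / σ²ₓ), where k is the number of items, pᵢ the proportion of test-takers answering item i correctly, qᵢ = 1 − pᵢ, and σ²ₓ the variance of the total scores. The Python sketch below is illustrative only: the response matrix is invented, the function name `kr20` is hypothetical, it assumes the population variance of total scores (conventions differ between textbooks and software), and it does not reproduce the study's own data or analysis software.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    Hypothetical helper for illustration. `responses` has one row per
    test-taker and one column per item; each entry is 0 (wrong) or 1 (right).
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two test-takers")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")

    # p_i: proportion of test-takers answering item i correctly.
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    # Sum of item variances p_i * q_i, where q_i = 1 - p_i.
    sum_pq = sum(pi * (1 - pi) for pi in p)

    # Population variance of the total (summed) scores (an assumption).
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    # Invented 0/1 responses: 4 test-takers (rows) x 3 items (columns).
    demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(kr20(demo))  # 0.75
```

With these invented responses the function returns 0.75. Reporting separate coefficients for grammatical and ungrammatical items, as Table 2.3 does, would amount to calling the function once on each item subset.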

Some of the tasks (listening comprehension task, productive vocabulary test, questions enquiring about past experiences in secondary school) could only be administered once, at Time 2, because the pilot testing showed that the listening comprehension task and the productive vocabulary test might have been appropriate for the students at Time 2, but they would have shown floor effects at Time 1. The tasks assessed the following skills: productive and receptive vocabulary size (oral and written), grammaticality judgments, listening comprehension, oral and written fluency, syntactic complexity and morphosyntactic accuracy, motivation, learner beliefs and strategies, to be described in more detail in what follows.

English listening comprehension

We chose two listening comprehension tests, based on part of a radio discussion and part of an interview, which had been aligned with Levels B2/C1 of the Common European Framework of Reference for Languages (CEFR). The participants had to demonstrate a wide range of listening skills needed for real-life purposes, such as understanding the gist of an extract, understanding specific information or understanding the speakers’ opinions, attitudes or feelings. The maximum score was 16 points.

English oral proficiency

In the oral retelling task, participants were asked to tell the researcher what happened in a silent film they had previously watched without the researcher present. We chose The Triplets of Belleville (2003) because (a) it lacks dialogue, most of the story being told through song and pantomime; (b) its cartoon-like nature was expected to make the content accessible to high school students; and (c) the plot is relatively simple yet contains significant details (e.g. its underlying black humour), so that retelling it is a challenging task for advanced learners as well as for beginners. This task elicited ample language and obliged the learners to use a certain predetermined set of vocabulary. The participants were not allowed any planning time. In the second oral task, the spot-the-difference task, participants worked in pairs, interacting with each other to find the differences between two pictures that differed in predetermined ways. This task was used to elicit existential and locative constructions as well as yes/no questions and wh-questions. Following Gass and Mackey (2007: 112ff.), among others, participants were told how many differences there were so that they had a goal to work towards. The picture, taken from Gass and Mackey (2007: 113–114), depicts a kitchen scene and involves basic household vocabulary that is appropriate for levels from near-beginner up to and including upper-intermediate.


Beyond Age Effects in Instructional L2 Learning

English receptive and productive vocabulary

Vocabulary size was assessed through the Academic sections in Schmitt et al.’s (2001) Versions A and B of Nation’s Vocabulary Levels Test, which include academic words from the Academic Word List (AWL; Coxhead, 2000) spanning a broad range between the 2,000 level and the 10,000 level (Schmitt et al., 2001: 68). The receptive vocabulary task included 60 items in total. As this test provides no direct information about the ability to use the target words productively (Schmitt et al., 2001: 62), we added the Productive Vocabulary Size Test by Laufer and Nation (1999), which gives some indication of the size of productive vocabulary, as test takers must supply the appropriate missing words in a cloze test with short contexts. Muñoz (2006b: 19) suggests that such tests are cognitively demanding, as they require understanding of a text and readers have to draw on their pragmatic knowledge as well as on grammatical, lexical and contextual knowledge. The productive task included 54 items (18 items each at the 2,000-, 3,000- and 5,000-word levels).

English grammaticality judgments

In the grammaticality judgment task (GJ task), the learners had to make a metalinguistic assessment of the grammaticality of a mixed set of sentences, some grammatical and some ungrammatical. Multiple-choice formats are highly susceptible to chance-level guessing. Thus, to enhance the sensitivity of the grammaticality judgments, participants were asked to correct ungrammatical sentences instead of simply stating whether they believed a given sentence to be grammatical. Gass and Mackey (2007: 94) emphasise the importance of such corrections, because learners’ grammar can be non-native-like in many ways and we cannot be sure what learners consider incorrect about the sentences they label as ‘ungrammatical’. The participants were therefore asked to judge each sentence as grammatical or ungrammatical and then immediately correct it if they had judged it ungrammatical. However, this procedure also carries risks: some participants, especially at a low proficiency level, might be unable to make a judgment or to describe it explicitly, and are thus more likely to pronounce something correct when their ‘true’ intuition says otherwise (Amma, 2004: 2). To address the learners’ possible lack of confidence, we favoured a three-way decision that included an ‘I don’t know’ option. The GJ task has been claimed to be a reliable and valid instrument in critical period studies (e.g. DeKeyser, 2000; García Mayo, 2003; Larson-Hall, 2008; McDonald, 2006). The instrument used here was a version of McDonald’s (2006) test of basic English morphosyntax, as adapted and used by Pfenninger (2011, 2012, 2013a, 2013b, 2014a, 2014b), and included 49 items and 15 distractors designed to test judgments on the linguistic forms and structures listed in Table 2.4.


Table 2.4 Grammaticality items in the GJ task

Structure | Items
Syntactic structures (total 25 items) |
Declaratives (main and sub clauses) with lexical verbs and auxiliaries | 4 items
Adverb placement | 8 items
Negation (see Pfenninger, 2013b) | 5 items
Yes/no questions (see Pfenninger, 2013b) | 4 items
Wh-questions (see Pfenninger, 2013b) | 4 items
Morphological forms (total 24 items) |
Articles (see Pfenninger, 2012) | 6 items
Regular past tense (see Pfenninger, 2011) | 6 items
Regular plural (see Pfenninger, 2011; Pfenninger & Singleton, 2016b) | 6 items
Third person sg. (see Pfenninger, 2011) | 6 items
Total GJ items | 49 test items (+15 distractors = 64)

Sentences were constructed so that they were grammatical or ungrammatical either in all the languages known by the learners or in some languages but not others (for a list of all the items see Appendix in Chapter 6). This design gave us some idea of the source of any crosslinguistic influence that emerged. The main advantages of a GJ task are the following: (a) as free production tasks (e.g. argumentative and narrative essays) involve the risk that learners avoid linguistic forms they are unsure of, it is important to be able to consult more controlled data sources; (b) the GJ task is a response task designed to measure the (subconscious) knowledge of the linguistic rules that are widely considered to constitute the learner’s internal grammar (see, e.g. García Mayo, 2003: 97); (c) it is more direct and economical than spontaneous speaking and writing tasks (Larson-Hall, 2008: 42); (d) the judgments may reflect implicit knowledge (Bialystok, 1981); and (e) the correction of errors reflects explicit, analysed knowledge that represents consciously held insights about language (Bialystok, 1981). In other words, while linguistic knowledge itself is unconscious, metalinguistic knowledge can be highly conscious, which explains why individuals can be very confident in their grammaticality decisions without knowing why. Furthermore, (f) in naturalistic settings, late learners have been found to experience more difficulty with grammaticality judgments than early learners, as, according to a widespread view, ‘memory capacity, decoding ability and processing speed are deficient in late L2 learners’ (McDonald, 2006: 383; see also Larson-Hall, 2008); and, finally, (g) the written GJ task has an advantage over an auditory GJ task in that it avoids the problem of phonological decoding, which is difficult for many L2 learners in an instructed setting (Jiang, 2004: 608). For a discussion of the main drawbacks of the GJ task, the interested reader is referred to Gass and Selinker (2008), Menn and Bernstein Ratner (2000) and Tarone et al. (1994), among many others. The reliability coefficient (KR-20) obtained was 0.90 for grammatical items and 0.95 for ungrammatical items. To prevent the participants from drawing too heavily on their explicit L2 knowledge, the task was timed: the students had a maximum of 11 minutes to make their judgments (approx. 10 seconds per sentence).
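The KR-20 coefficients reported above can be illustrated with a minimal Python sketch of the standard Kuder-Richardson Formula 20, KR-20 = k/(k − 1) × (1 − Σpq/σ²), where k is the number of items, p the proportion of participants scoring 1 on an item, q = 1 − p and σ² the variance of the total scores. The sketch assumes dichotomously scored (0/1) items and a hypothetical response matrix; it is not the script used in this study. Running it separately on the grammatical and on the ungrammatical items would give two coefficients of the kind reported above.

```python
from statistics import pvariance
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored (0/1) items.

    responses: one row per participant, one column per item.
    """
    n_people = len(responses)
    k = len(responses[0])                      # number of items
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)              # population variance of total scores
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people  # proportion scoring 1 on item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / total_var)


# Hypothetical 0/1 scores of four participants on three items
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```

Population variance is used for the totals so that it matches the p × q item variances in the numerator.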

English writing proficiency

In order to gather more naturalistic written data, where the learners would be concerned with the conveyance of meaning rather than the form of the message (Ellis & Barkhuizen, 2005: 23), participants were asked to write an English argumentative essay on the pros and cons of (reality TV) talent shows, a topic that was deemed suitable for adolescents and was found to elicit different semantic and syntactic contexts (see Pfenninger, 2011, 2012, 2013a). The prompt read as follows at the first data collection time: ‘Casting shows – Career opportunity or public humiliation?’, while it read ‘Casting shows – the fine line between exultation and human debasement’ at Time 2 (see Chapter 4 for a detailed description of the procedure). In addition, in the narrative essay, participants were asked to narrate what happened in the movie referred to above. The written retelling task was preceded by the oral retelling task without any prior warning or indication that the task was to be repeated – in a similar way to Bygate’s (2001) study, which compared performance on an unrehearsed oral task with performance on a second oral task which had been preceded by written output of the material. In both cases, the learners had the opportunity to build on their previous attempt at completing the task; i.e. they were likely to be able to take advantage of familiarity with the content and with the processes of formulating the meanings. The participants were given 20 minutes to write each composition and were asked to write a minimum of 200 words.

Learner beliefs and learning experiences/German writing proficiency

In order to give a fuller account of the interaction of AO and other (often hidden) variables such as motivation, attitudes and beliefs, we considered it essential to ask learners about their intended goals and their approaches towards early/late FL learning, so as to put their actual attainment in context. At both measurement times, we explored participants’ own perspectives on the age factor, on their intended goals and on their learning experiences through reflective writing. We provided loose guidelines for the writing, which stated: ‘You should write about your feelings, thoughts, opinion, motivation as well as any experiences with regard to the early or late introduction of multiple foreign languages’. No specific length was set (see Chapter 5 for a more detailed description), but the participants were asked to write these essays in their language of literacy (German). As the participants were experienced multilingual learners, their language learning awareness was considered likely to be relatively high. This composition was also included to compare the end state of EFL with an L2 baseline, so that we could make predictions about end-state proficiency based on L2 grammar. This enabled us to use the language of literacy as a yardstick of FL attainment for learners beginning at different ages and to compute the predictive power of their German writing skills.

Foreign language learning motivation and learning strategies

The investigation of motivation towards learning the language and of FL learning strategies included looking at a combination of seven components of motivation as proposed and at two dimensions of strategy use: future L2 self-states, present L2 self-states, attitudes towards FLs in general, FL learning anxiety, parental encouragement, attitudes towards the learning situation, and cultural interest and media (motivation); and general L2 learning strategies and inferences/reliance vis-à-vis prior knowledge, e.g. associations with the L1 (strategy use). The questionnaire included 42 closed-ended items and 3 open-ended questions. The scales in the motivation questionnaire were validated using the standard procedures of questionnaire design (Dörnyei, 2010), which typically combine content analysis with quantitative item analysis by means of Cronbach’s alpha internal consistency reliability coefficients (see Chapter 5). A five-point Likert scale was used for all categories, to provide enough response options while avoiding confusion with the Swiss grading system, which runs from 1 to 6. Some of these questions were adapted to the Swiss school context, a third of them were negatively worded, and the resulting list was translated into German and randomised (see Appendix). The closed-ended questionnaire was chosen (a) to elicit responses that participants would be unlikely to produce spontaneously in the language experience essays or in answers to open questions; (b) to avoid vague answers or statements that might be uninterpretable afterwards (it is not an easy task for 13-year-olds to reflect on their language learning experiences); (c) to avoid misunderstandings and/or misinterpretations on the part of the learners; (d) to obtain responses even from learners who might lack confidence and might therefore be reluctant to describe their attitude towards English; and (e) to identify differences with respect to AO and biological age.
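Two of the steps just described, reverse-coding the negatively worded items and computing Cronbach’s alpha for a scale, can be sketched in Python as follows. This is a minimal illustration assuming responses are already coded 1–5 and held in memory; the three-item scale and its data are hypothetical, and alpha is computed with the usual formula, k/(k − 1) × (1 − sum of item variances / variance of total scores), not with any particular software package used in the study.

```python
from statistics import variance
from typing import List, Sequence

LIKERT_MAX = 5  # five-point Likert scale, as in the questionnaire described above


def reverse_code(score: int, scale_max: int = LIKERT_MAX) -> int:
    """Reverse-score a negatively worded item on a 1..scale_max scale (5 -> 1, 4 -> 2, ...)."""
    return scale_max + 1 - score


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one scale.

    rows: one row per participant, one column per item, with negatively worded
    items already reverse-coded.
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical three-item scale whose third item is negatively worded
raw = [[5, 4, 1], [4, 4, 2], [2, 1, 4], [3, 3, 3], [1, 2, 5]]
coded: List[List[int]] = [[a, b, reverse_code(c)] for a, b, c in raw]
print(round(cronbach_alpha(coded), 2))  # 0.96
```

Without the reverse-coding step, the third item would correlate negatively with the other two and alpha would drop sharply, which is why negatively worded items are recoded before reliability is computed.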


Language background

Finally, an extensive biodata questionnaire was administered at both measurement times in order to collect biographical data and quantifiable information concerning participants’ German, English and French learning history, including questions about AO of instruction and quantity and type of input they received at the different levels of education, both formally and informally (out-of-school exposure, see Chapter 7). Some questions were added at Time 2, e.g. number of instructional hours throughout secondary school. At the first data collection time, when the participants were under 18 years old, parents’ consent was obtained to authorise the children’s involvement in the research.

Summary of tasks

In general, the tasks used in this study were designed to measure the knowledge that learners had gained through their English lessons. This knowledge is, to a large extent, declarative, e.g. vocabulary items or morphosyntactic rules (see Muñoz, 2006b: 17). The battery also includes tests that measure procedural knowledge acquired through practice in productive and receptive language skills, e.g. tests involving word recognition and grammaticality judgments. Finally, it is widely noted in the literature (e.g. Bygate, 2001; Foster & Skehan, 1996; Robinson et al., 2009; Skehan & Foster, 1999; Wigglesworth, 2001) that different task types affect both the quality and the quantity of linguistic output differently, i.e. a learner’s performance in terms of fluency, accuracy and complexity. Task effects must therefore be considered in any analysis of accuracy, fluency and complexity, as task demands push learners to perform tasks in certain ways, prioritising one or another aspect of language (cf. also Ellis & Barkhuizen, 2005: 143). Furthermore, there is a complex interaction between task type, cognitive load and planning time (Lennon, 1990; Wigglesworth, 2001). For instance, certain task types, such as some narrative tasks, give rise to more fluent and complex but less accurate output, as fewer planning pauses or the use of more complicated syntax may involve the risk of more frequent errors. Personal information exchange tasks, on the other hand, are said to elicit more accurate but not more complex language. This trade-off between fluency, accuracy and complexity may result from differences in the perceived goals of the task (Wigglesworth, 2001). In the present study, some tasks (oral and written retelling, spot-the-difference) were based on familiar information, where the retrieval of personally relevant information that is well known to the participants becomes the basis for completing the task.
By using a variety of tasks, we hoped to be able to tease apart knowledge from task effects (cf. Klein & Martohardjono, 1999). Some tasks required interaction and a discourse style in which participants alternated in holding the floor, whereas others required extended turns, with little need to interact other than to listen and wait for one’s turn. The oral and written retelling tasks were identified as having greater structure, which led to more fluent performance than on tasks not structured in this way (e.g. the argumentative essay). Some tasks required only straightforward outcomes, in which a simple decision had to be made (e.g. the spot-the-difference task), while others required multi-faceted judgments, which necessitated joint engagement with the ideas concerned (e.g. the discussion about talent shows). Four testing sessions of 45 minutes each were conducted with each class during regular class time. Some of the interviews took place outside regular classes, but all the tasks were administered in a controlled setting in school (e.g. in a distraction-free room). Task-order effects were controlled for by reversing the order of tests for half the sample in all groups.
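The counterbalancing of task order can be sketched as follows. The text states only that the order was reversed for half the sample in every group; the random selection of that half within each class, the seed and the class and task names below are illustrative assumptions.

```python
import random
from typing import Dict, List


def counterbalance(groups: Dict[str, List[str]], tasks: List[str],
                   seed: int = 1) -> Dict[str, List[str]]:
    """Give half of each group the reversed task order and the other half the original order.

    The half is chosen at random within each group (an assumption); with an odd
    group size the extra participant receives the original order.
    """
    rng = random.Random(seed)
    orders: Dict[str, List[str]] = {}
    for members in groups.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for i, participant in enumerate(shuffled):
            orders[participant] = list(reversed(tasks)) if i < half else list(tasks)
    return orders


# Hypothetical classes and task labels
tasks = ["GJ task", "argumentative essay", "vocabulary test"]
classes = {"class_1": ["s01", "s02", "s03", "s04"], "class_2": ["s05", "s06", "s07"]}
for participant, order in sorted(counterbalance(classes, tasks).items()):
    print(participant, order)
```

Balancing within each class, rather than across the whole sample, keeps order effects from being confounded with class or school membership.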

Methodology: Preparing, Coding and Tagging the Data

At the broadest level, oral and written competence was measured in terms of fluency, lexical and syntactic complexity and morphosyntactic errors. Following Wolfe-Quintero et al. (1998), fluency in English and German was examined in terms of words per T-unit (W/TU), a T-unit being defined as one main clause plus all of its dependent modifying clauses (Ellis & Barkhuizen, 2005). We should mention that words per T-unit is often also used as a complexity measure (see Bygate et al., 2001: 34). Fluency, or speed, in oral data is often expressed as words per minute, which can be unpruned or pruned. The unpruned measure counts the raw number of words produced, including repetitions, reformulations, replacements, false starts and filled pauses (Gavin, 2014: 71). In the pruned measure, ‘the additional material is removed, and the measure is of meaningful, contributing [words] per minute’ (Skehan, 2014: 20). Thus, after transcribing the speech samples, we counted all the words for each task and categorised them into pruned and unpruned words, before calculating the speech rate per minute on the basis of the pruned words, as listed in the tables. With respect to the interpretation and relevance of the study of (systematic) errors, we might add as a footnote that errors in written production made by FL learners have recently been characterised as evidence of progress in composition writing (see Lasagabaster & Doiz, 2003: 139). Oral and written syntactic complexity was examined in English and German using the clauses per T-unit (CL/TU) complexity ratio. In order to avoid over-estimating average sentence length in the written data, all proper names were replaced with name, all numbers with numb and all geographic names with place, as such items normally vary greatly in length (see Penris & Verspoor, 2017); furthermore, enumerations exceeding three words were cut. In the oral data, the following vocalisations were deemed not to be part of a language’s vocabulary and were consequently removed: names (of people, regions, places, continents and countries), languages and nationalities, many of which are similar in German and English; feedback words (yea, yeah, ok, huh, mm) and voiced pausing (eh, uh/uhm/um(m)); and, finally, fragments of words (Thur, archi, etc.). According to Penris and Verspoor (2017), length measures are accepted and reliable indicators of general syntactic complexity, but they do not indicate which elements have made the sentences more complex; for instance, they cannot tell us whether longer sentences are due to longer noun phrases or to other non-finite constructions. Lexical complexity in the oral and written data was examined using Guiraud’s Index of Lexical Richness (GUI): the number of word types divided by the square root of the number of word tokens. Accuracy (ERR/TU) was examined by counting (a) the number of misspellings (excluding ‘mechanical errors’ such as punctuation errors) and (b) the number of morphosyntactic errors per T-unit. According to James (1998), misspellings (or spelling errors) occur when the rules that determine how a given phoneme is to be represented in writing are broken. Such faulty phoneme-to-grapheme conversion occurs when L2 learners apply their L1 rules to the L2 or when they misapply the phonological rules of the L2 (James & Klein, 1994). In this study, we focused on misspellings that are due to lexical transfer (e.g. braun for brown; see Chapter 6 for a description of the analysis of crosslinguistic influence). Morphosyntactic errors included omission (e.g. he love singing), overuse (e.g. she cans again drives), substitution (e.g. many people singing), (ir)regularisation (e.g. he taked), misformation (e.g. he get’s), random misorderings³ (e.g. singer bad), systematic misorderings (e.g.
When people good things make) and ‘other’ (e.g. agreement errors, as in I doesn’t like this) (see McDonald & Roussel, 2010; Pfenninger, 2011, 2014b). The focus in this study is on absolute morphosyntactic abilities because (a) morphosyntax is one of the more reliable measures of FL proficiency, particularly with respect to predicting writing scores (Schoonen et al., 2011); (b) measures of syntax, morphology and literacy-related skills assess a cognitive dimension of language proficiency, in contrast to basic interpersonal communicative skills (Cummins, 1981: 133); (c) L2 morphosyntax seems to be more vulnerable to processing difficulties than L2 lexico-semantics (independent of the L1) and is therefore more susceptible to age effects (for a review, see Granena & Long, 2013); and (d) the receptive mode, which is often late starters’ primary use of English, might make them aware of English vocabulary, but probably less so of English grammar, which is less salient in language input (Schoonen et al., 2011: 70). Similarly, code-switching and word-internal code-mixing can also shed light on the (possibly different) ways in which early vs late starters learn (e.g. Celaya & Navés, 2009; Celaya et al., 2001; Pérez-Vidal et al., 2000), even though research on the relationship between the introduction of FL teaching at different ages and code-switching/mixing is still scarce (but see, e.g. Agustín Llach, 2011; Celaya, 2006) and tends to be limited to comparisons of different age groups rather than different AO groups.

For the holistic evaluation of the English and German essays (narrative and argumentative essays, learner experience essay), we partly followed Jacobs et al.’s (1981) scale, which, according to Lasagabaster and Doiz (2003: 140), requires two evaluators and considers the communicative effect of the speaker’s linguistic production on the receptor and therefore comes close to the main objective of the process of language acquisition, namely interpersonal communication. Our evaluation system consists of two criteria which measure different aspects of written production (Lasagabaster & Doiz, 2003: 142–143): (1) Content (30 points): this category considers the development and comprehension of the topic as well as the adequacy of the content of the text. (2) Organisation (20 points): several factors are considered here, namely the organisation of ideas, the structure and cohesion of paragraphs and the clarity of exposition of the main and secondary ideas. The scores for the two criteria were added, giving a maximum score of 50. The final score was the average of the total points assigned by each of two independent evaluators (i.e. the two authors). The inter-rater correlation (Pearson correlation coefficient) was 0.82 for the content subscore, 0.89 for the organisation subscore and 0.90 for the total score. Note that we decided to include only two holistic measures, as previous authors have questioned the reliability and informativeness of holistic rating of compositions (for a discussion, see Torras et al., 2006: 157ff).
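The fluency, complexity and accuracy indices defined earlier in this section (W/TU, CL/TU, GUI, ERR/TU and pruned vs unpruned words per minute) can be computed along the lines of the following Python sketch, once a transcript has been segmented into T-units and clauses and errors have been hand-coded. It is a simplified illustration under stated assumptions: the `TUnit` record is hypothetical, pruning removes only the feedback words and voiced pauses listed above plus word fragments marked (by assumption) with a trailing hyphen, and W/TU is computed on pruned words. The name/numb/place substitution and the removal of names, languages and nationalities are not modelled.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Feedback words and voiced pauses listed in the text; word fragments are assumed
# (hypothetically) to be marked in the transcript with a trailing hyphen, e.g. "archi-".
FILLERS = {"yea", "yeah", "ok", "huh", "mm", "eh", "uh", "uhm", "um", "umm"}


@dataclass
class TUnit:
    text: str     # transcribed T-unit
    clauses: int  # number of clauses, hand-coded
    errors: int   # misspellings + morphosyntactic errors, hand-coded


def tokenise(text: str) -> List[str]:
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip('.,;:!?"()')
        if tok:
            tokens.append(tok)
    return tokens


def prune(tokens: List[str]) -> List[str]:
    """Remove fillers and marked word fragments."""
    return [t for t in tokens if t not in FILLERS and not t.endswith("-")]


def measures(units: List[TUnit], minutes: float) -> Dict[str, float]:
    raw = [t for u in units for t in tokenise(u.text)]
    words = prune(raw)
    n_tu = len(units)
    return {
        "W/TU": len(words) / n_tu,
        "CL/TU": sum(u.clauses for u in units) / n_tu,
        "GUI": len(set(words)) / math.sqrt(len(words)),  # types / sqrt(tokens)
        "ERR/TU": sum(u.errors for u in units) / n_tu,
        "unpruned_wpm": len(raw) / minutes,
        "pruned_wpm": len(words) / minutes,
    }


# Hypothetical two-T-unit retelling produced in 30 seconds
sample = [
    TUnit("The old lady uhm finds the bike.", clauses=1, errors=0),
    TUnit("And she follow the ship because she loves him.", clauses=2, errors=1),
]
print(measures(sample, minutes=0.5))
```

Keeping the hand-coded clause and error counts separate from the automatic word counts mirrors the division of labour described above: only the token-based indices can be computed mechanically.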
In the GJ task, 0 points were awarded if an error was either not spotted or correctly identified but not properly corrected, as, e.g., in the correction of *When has he a break? (item 44/41) to When he has a break? (07_LL6_M_GJT). Invariant do-support (e.g. invariant do instead of does) was counted as 0.5 error, as in Do the teacher drive a fancy car? (07_EL77_F_GJT) as a correction of *Drives the teacher a fancy car? (item 12/2).
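The scoring rules just described for ungrammatical GJ items can be expressed as a small lookup table. In this sketch the outcome codes are hypothetical, the full point for a properly corrected item is implied by the text rather than stated, and the treatment of ‘I don’t know’ responses, which is not specified here, is deliberately left out, so an unknown code raises an error instead of being guessed.

```python
from typing import Dict, Iterable

# Hypothetical outcome codes for one ungrammatical GJ item, scored by the rules above.
ITEM_SCORES: Dict[str, float] = {
    "corrected": 1.0,      # error spotted and properly corrected (implied full point)
    "invariant_do": 0.5,   # e.g. "Do the teacher drive ...?": counted as half an error
    "not_spotted": 0.0,    # ungrammatical sentence accepted as grammatical
    "miscorrected": 0.0,   # error spotted but not properly corrected
}


def score_ungrammatical_items(outcomes: Iterable[str]) -> float:
    """Sum item scores; an unknown outcome code raises KeyError rather than being guessed."""
    return sum(ITEM_SCORES[code] for code in outcomes)


# Hypothetical outcomes for five items from one participant
print(score_ungrammatical_items(
    ["corrected", "invariant_do", "not_spotted", "miscorrected", "corrected"]))
```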

Notes

(1) In Switzerland, all school subjects (and thus all participants in this study) are taught in Standard German.
(2) It was not until 2007/2008 that foreign languages were officially graded in Swiss elementary schools.
(3) For a discussion of random vs systematic misorderings, see Ellis and Barkhuizen (2005).

3 Age and (Statistical) Analysis

Benefits of Multilevel Modelling for Age-Related Research

The purpose of this chapter is twofold: first, to make a case for alternatives to conventional quantitative research methodologies in age-related research; and second, to outline what we see as the main benefits of multilevel modelling (MLM) for our study. In recent years, the general linear model (a family of statistical models that assume, among other features, a normal distribution, e.g. t-tests, analysis of variance [ANOVA] or multiple regression models, see Cohen [1968] and Plonsky [2013]; not to be confused with generalised linear models!) has been critically evaluated, and numerous limitations of empirical efforts in second language acquisition (SLA) have been documented (e.g. Norris & Ortega, 2000; Oswald & Plonsky, 2010; Plonsky, 2011; Plonsky & Gass, 2011; Plonsky & Oswald, 2015). In her meta-analysis of 524 empirical articles, Lazaraton (2005: 214) pointed out three main problems with the paradigm shift from qualitative to quantitative approaches that has been taking place in applied linguistics in general over the last few decades: (1) the general linear model is used (she fears) ‘in violation of at least some of the assumptions of the procedure’ (see also de Bot, 2011: 125), such as the inclusion of correlated data in linear models; (2) a great deal of the research becomes obscure to all but the most statistically literate; and (3) using high-powered parametric procedures may tempt researchers to overgeneralise results to other contexts or to other language users when, in fact, many research designs do not use random selection from a population or random assignment to groups, but rather employ selected intact groups with a very limited demographic profile.
Lazaraton (2005: 209) concludes that the whole discipline of applied linguistics seems to be struggling with a redefinition of its research goals, methods and paradigms, as can be seen in the growing reversion to qualitative methods, increasingly pointed questions about the importance of research in this field and the continuing exploration of alternatives. Since Lazaraton’s article, applied linguistics has seen a steady increase in sophisticated statistical tests as well as a broadening of the range of tests used. These recent advances in statistical techniques have not been confined to age-related research. For instance, multilevel models (a subclass of linear mixed-effects regression models) have slowly found their way into a variety of subfields of SLA. Research on the age factor has recently, but tentatively, begun to adopt these kinds of analyses (see Baayen et al., 2008; Jaeger, 2008; Pfenninger, 2016; Pfenninger & Singleton, 2016a, 2016b). The models in question were developed for precisely such situations as the one we encounter in this study, where (1) both participants and items constitute samples from larger populations (the population of all possible participants and that of all possible discourses, respectively); (2) measurements within and between sampled students are nested hierarchically within 12 classes within 5 schools, and it is possible, indeed likely, that students from the same class or school perform more similarly to each other than do students from different classes or schools (see Hedges & Hedberg, 2007); (3) the participants are tested on a series of tasks and the same tasks are tested on a series of participants; and, finally, (4) as in most longitudinal research, a prominent feature is missing responses to individual items within a test and missing overall test scores (see Schoonen et al., 2011: 47ff). In such a research design, it is clearly not desirable to use ANOVA models (i.e. the general linear model), as they neglect both participant and item variability, they force the researcher to discard all results for any participant with even a single missing measurement, and they rest on the independence assumption, i.e. they treat multiple responses from the same participant as independent of each other (Seltman, 2009: 357). This chapter discusses the benefits that MLM can offer to any SLA research that involves the sampling of populations, within educational establishments or naturalistic settings, and that has a particular focus on chronological age and the age of onset (AO) of acquisition.
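Point (2) above, that students in the same class tend to perform alike, can be quantified with the intraclass correlation (ICC). The statistic is not mentioned in the text and is added here only to illustrate why the independence assumption fails: the sketch computes ICC(1) from a one-way ANOVA for a balanced design, together with Kish’s design effect, 1 + (m − 1) × ICC, the factor by which clustering inflates the sampling variance of a mean based on clusters of size m. The scores are hypothetical, and a multilevel model estimates such variance components directly rather than through this shortcut.

```python
from statistics import mean
from typing import Dict, List


def icc1(groups: Dict[str, List[float]]) -> float:
    """ICC(1) from a one-way ANOVA, assuming equal group sizes (balanced design)."""
    sizes = {len(v) for v in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    k = sizes.pop()        # students per class
    g = len(groups)        # number of classes
    grand = mean(x for v in groups.values() for x in v)
    ss_between = sum(k * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def design_effect(icc: float, cluster_size: int) -> float:
    """Kish design effect: variance inflation caused by sampling intact clusters."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical test scores for three intact classes of four students each
scores = {
    "class_1": [30, 32, 34, 31],
    "class_2": [36, 38, 40, 37],
    "class_3": [25, 27, 29, 26],
}
rho = icc1(scores)
print(f"ICC = {rho:.2f}, design effect = {design_effect(rho, 4):.2f}")
```

Even a modest ICC yields a design effect well above 1 when classes are large, which is one way of seeing why treating students in intact classes as independent observations overstates the precision of ANOVA-based estimates.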

Solving the Generalisability Issue

Generalisability, a crucial concept in quantitative research, is understood in a number of different ways in the literature. Most often, the term denotes statistical, sampling-based generalisability, which refers (1) to the inferential link between an observation and an interpretation (internal validity) or (2) to an inference from the observation of a single case to a larger group (external validity) (e.g. Bachmann, 2006). As age researchers, we want to assess whether the age effect generalises beyond the participants sampled, i.e. whether results generalise both to the wider population of learners and to the wider population of linguistic materials (see Cunnings & Finlayson, 2015). It is well known that a theory may never be scientifically generalised to a setting where it has not yet been empirically tested and confirmed. An increase in sample size leads to an increase in the generalisability of one sample to other samples that the same sampling procedure would produce; it does NOT, however, lead to an increase in the generalisability of any sample estimate to its corresponding population characteristic (Lee & Baskerville, 2003). Nevertheless, according to Duff (2006: 68) ‘it is commonly accepted that quantitative research, with appropriate sampling (random selection, large numbers, etc.), research design (e.g. counter-balancing of treatments, ideally with a control group, pre-post measures, and careful testing and coding procedures), and inferential statistics where appropriate, has the potential to yield generalizable results’. This section focuses on external validity, i.e. generalisability in the sense of the applicability of findings at the sample level to the population from which the sample was drawn. In the subsequent section, we will then take a closer look at internal validity. Compared to scholars working in the qualitative tradition, scholars in the quantitative tradition usually define their scope more broadly and seek to make generalisations about large numbers of cases.¹ It is precisely this feature of quantitative studies that has led researchers to critique the findings obtained through quantitative methods. SLA research over the past decade has shown an increased propensity to consider carefully the constraints on the statistical generalisability of results, particularly with respect to design features (e.g. group size, randomisation, control groups) and data analyses.
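Lee and Baskerville’s point that a larger sample improves sample-to-sample consistency but not sample-to-population generalisability can be illustrated with a small simulation. All numbers are hypothetical: the population mean is 50, but the sampling procedure only ever reaches a subgroup whose mean is 60 (for example, intact classes of a limited demographic profile). As n grows, the sample means cluster ever more tightly, yet around the wrong value.

```python
import random
from statistics import mean, pstdev
from typing import Tuple

POPULATION_MEAN = 50.0  # hypothetical: half the learners average 60, half average 40
REACHED_MEAN = 60.0     # the sampling procedure only ever reaches the first half
SD = 10.0


def biased_sampling(n: int, reps: int = 200, seed: int = 7) -> Tuple[float, float]:
    """Draw `reps` samples of size n from the reachable subgroup only.

    Returns the average of the sample means and their spread (standard deviation).
    """
    rng = random.Random(seed)
    sample_means = [mean(rng.gauss(REACHED_MEAN, SD) for _ in range(n)) for _ in range(reps)]
    return mean(sample_means), pstdev(sample_means)


for n in (10, 100, 1000):
    centre, spread = biased_sampling(n)
    print(f"n={n:4d}  mean of sample means={centre:5.1f}  spread={spread:4.2f}  "
          f"bias={centre - POPULATION_MEAN:+.1f}")
```

The spread shrinks roughly with the square root of n, while the bias of about +10 points stays put: more participants buy precision, not representativeness.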

Sampling in age-related research

Sample size is a design feature that has been discussed critically in the SLA literature in general, as small samples have a debilitating effect on statistical power. As early as 1980, Long cautioned that

[b]ecause of the use of sampling procedures and large numbers of subjects, it is considered justifiable to generalize findings obtained in such settings to the population of the original, more complex ones from which those studied were drawn. Yet, by definition, what has been studied is different from and simpler than the real world. And just how many random samples of classrooms are needed before one can generalize safely? When dealing with human behavior, what level of difference in observed phenomena may be considered to be due to chance as opposed to the effect of the variables manipulated? (Long, 1980: 28)

Despite the difficulties of working with large samples in formal, instructional settings (e.g. getting human subjects committees to approve the research, teachers to cooperate and students to participate), age-related research can pride itself on numerous classroom studies with relatively large sample sizes (e.g. the BAF project with over 2000 participants [Muñoz, 2006b]; the ELLiE project with 1200 children [e.g. Mihaljević Djigunović, 2012]; the FLIPP project with almost 200 participants [e.g. Unsworth et al., 2014]; the Zagreb Project with approx. 3000 children [e.g. Vilke & Vrhovac, 1993, 1995]), although even the largest database is incomplete in that not every instance of language use has been recorded and made computer-searchable. With regard to the nature of the sampled groups, earlier age-related studies (e.g. Burstall et al., 1975) described situations in which late-starting and early-starting students were at some point mixed in the same classroom, a practice that can have a levelling-down effect on the early starters. In this area, however, major improvements have since taken place (see, e.g. Muñoz, 2014b). Because the typical quantitative age-related study compares participants who began acquiring the target language at different ages, the future of age-related research faces a challenge: since the 1990s, throughout Europe and indeed across the world, there has been an accelerating trend towards introducing additional languages into primary-level curricula. This means that in many European countries it will soon no longer be possible to recruit late-starting control groups, which are a vital component of the research design of studies like ours.
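Why small samples weaken statistical power can be shown with a quick calculation. The sketch below assumes, purely for illustration, that the effect of interest is a correlation of r = 0.30 between some age-related variable and an outcome, and it uses the standard Fisher z normal approximation rather than an exact power routine; none of the numbers come from the studies listed above.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 with n participants,
    via the Fisher z transformation (the negligible opposite tail is ignored)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    noncentrality = abs(math.atanh(r)) * math.sqrt(n - 3)
    return _Z.cdf(noncentrality - z_crit)


def required_n(r, power=0.80, alpha=0.05):
    """Smallest n whose approximate power reaches the requested level."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    z_power = _Z.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / abs(math.atanh(r))) ** 2 + 3)


for n in (20, 50, 85, 200):
    print(f"n={n:4d}  power to detect r=0.30: {correlation_power(0.30, n):.2f}")
print("n needed for 80% power at r=0.30:", required_n(0.30))
```

Under this approximation a class-sized sample of 20 detects a true correlation of 0.30 only about a quarter of the time, and roughly 85 participants are needed for the conventional 80% power.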

Research designs in age-related research

Even when several complementary data sources are used and the validity of the instruments has been ensured, it is often suggested that any conclusion must remain provisional until and unless the study is replicated or followed up with different populations and in different contexts. Plonsky and Oswald (2015) advocate such a replication-based approach as a way of moving theory forward, perhaps less rapidly but with greater accuracy. Although dynamic systems research has recently questioned whether the results of a classroom study can be replicated in the first place (see Larsen-Freeman, 2016), age-related research has benefited immensely from researchers returning to previous key work in the area to carry out approximate or conceptual replications. Replication research can contribute to generalisability in this area by retesting and re-examining hypotheses about age-related outcomes and by examining whether previous findings hold across different learner populations who learn the second language (L2) under different circumstances, across different target languages and across linguistic structures that may differ in their susceptibility to age. DeKeyser (2000) replicated Johnson and Newport’s (1989) landmark study, which many researchers have considered to provide decisive support for a sensitive period in SLA. On the basis of their results, Johnson and Newport claimed that there was no difference in proficiency between native speakers and those who arrived before the age of 7, but that there was a maturational decline in ability from 7 to about 15 years of age: they found a strong negative linear correlation (r=–0.87) between age of arrival and performance for the young arrivals, but not for the older ones (r=–0.16). DeKeyser used a different population and improved the materials by making some changes to the items and to the length of the grammaticality judgement test.
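The contrast Johnson and Newport drew (a strong correlation for early arrivals, a weak one for later arrivals) is, strictly speaking, a claim that two correlations differ. One standard way to check such a claim is a Fisher z test for correlations from two independent groups. The sketch below illustrates that test; it is not presented as the analysis either study reported, and the group sizes are placeholders that should be replaced with the actual ones.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher z test of H0: rho1 == rho2 for correlations from two independent
    groups. Returns (z statistic, two-sided p value)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher-transformed correlations
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Correlations as reported in the text; the group sizes are HYPOTHETICAL placeholders.
R_EARLY, R_LATE = -0.87, -0.16
N_EARLY, N_LATE = 23, 23

z, p = compare_independent_correlations(R_EARLY, N_EARLY, R_LATE, N_LATE)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Testing the difference directly matters because one correlation being significant and the other not does not by itself show that the two groups differ; with the placeholder sizes above, the gap between r=–0.87 and r=–0.16 would be clearly significant.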
(Due to these modifications, Polio [2012: 65] classifies DeKeyser’s study as an ‘approximate/instrumental replication’ rather than a full replication.) DeKeyser obtained remarkably similar results; in addition, he was able to suggest an explanation for why certain learners and certain structures appear to be exceptions to the critical period effect. DeKeyser’s (2000) replication thus demonstrated that replication can be a way for researchers to, in a sense, ‘methodologically clean up other studies by eliminating small problems in a study’s design’ (Polio, 2012: 65). Birdsong and Molis’s (2001) study was a strict replication of Johnson and Newport’s, but in this case the subjects were native speakers of Spanish (n=61). While the strong linear relationship between age of acquisition (AoA) and accuracy (r=−0.77, p