Intelligent Tutoring Systems: 16th International Conference, ITS 2020, Athens, Greece, June 8–12, 2020, Proceedings [1st ed.] 9783030496623, 9783030496630

This volume constitutes the proceedings of the 16th International Conference on Intelligent Tutoring Systems, ITS 2020,


English Pages XIX, 443 [462] Year 2020


Table of contents :
Front Matter ....Pages i-xix
Multi-sensual Augmented Reality in Interactive Accessible Math Tutoring System for Flipped Classroom (Dariusz Mikułowski, Jolanta Brzostek-Pawłowska)....Pages 1-10
Adaptive Learning to Support Reading Skills Development for All (Lionel Alvarez, Thierry Geoffre)....Pages 11-16
General ITS Software Architecture and Framework (Nikolaj Troels Graf von Malotky, Alke Martens)....Pages 17-22
Let the End User in Peace: UX and Usability Aspects Related to the Design of Tutoring Systems (Juliano Sales, Katerina Tzafilkou, Adamantios Koumpis, Thomas Gees, Heinrich Zimmermann, Nicolaos Protogeros et al.)....Pages 23-26
Developing a Multimodal Affect Assessment for Aviation Training (Tianshu Li, Imène Jraidi, Alejandra Ruiz Segura, Leo Holton, Susanne Lajoie)....Pages 27-37
Scaling Mentoring Support with Distributed Artificial Intelligence (Ralf Klamma, Peter de Lange, Alexander Tobias Neumann, Benedikt Hensen, Milos Kravcik, Xia Wang et al.)....Pages 38-44
Exploring Navigation Styles in a FutureLearn MOOC (Lei Shi, Alexandra I. Cristea, Armando M. Toda, Wilk Oliveira)....Pages 45-55
Changes of Affective States in Intelligent Tutoring System to Improve Feedbacks Through Low-Cost and Open Electroencephalogram and Facial Expression (Wellton Costa de Oliveira, Ernani Gottardo, Andrey Ricardo Pimentel)....Pages 56-62
Computer-Aided Grouping of Students with Reading Disabilities for Effective Response-to-Intervention (Chia-Ling Tsai, Yong-Guei Lin, Ming-Chi Liu, Wei-Yang Lin)....Pages 63-67
SHAPed Automated Essay Scoring: Explaining Writing Features’ Contributions to English Writing Organization (David Boulanger, Vivekanandan Kumar)....Pages 68-78
Probabilistic Approaches to Detect Blocking States in Intelligent Tutoring System (Jean-Philippe Corbeil, Michel Gagnon, Philippe R. Richard)....Pages 79-88
Avoiding Bias in Students’ Intrinsic Motivation Detection (Pedro Bispo Santos, Caroline Verena Bhowmik, Iryna Gurevych)....Pages 89-94
Innovative Robot for Educational Robotics and STEM (Avraam Chatzopoulos, Michail Papoutsidakis, Michail Kalogiannakis, Sarantos Psycharis)....Pages 95-104
Supporting Students by Integrating an Open Learner Model in a Peer Assessment Platform (Gabriel Badea, Elvira Popescu)....Pages 105-114
Explaining Traffic Situations – Architecture of a Virtual Driving Instructor (Martin K. H. Sandberg, Johannes Rehm, Matej Mnoucek, Irina Reshodko, Odd Erik Gundersen)....Pages 115-124
MOOCOLAB - A Customized Collaboration Framework in Massive Open Online Courses (Ana Carla A. Holanda, Patrícia Azevedo Tedesco, Elaine Harada T. Oliveira, Tancicleide C. S. Gomes)....Pages 125-131
Mixed Compensation Multidimensional Item Response Theory (Béatrice Moissinac, Aditya Vempaty)....Pages 132-141
Data-Driven Analysis of Engagement in Gamified Learning Environments: A Methodology for Real-Time Measurement of MOOCs (Khulood Alharbi, Laila Alrajhi, Alexandra I. Cristea, Ig Ibert Bittencourt, Seiji Isotani, Annie James)....Pages 142-151
Intelligent Predictive Analytics for Identifying Students at Risk of Failure in Moodle Courses (Theodoros Anagnostopoulos, Christos Kytagias, Theodoros Xanthopoulos, Ioannis Georgakopoulos, Ioannis Salmon, Yannis Psaromiligkos)....Pages 152-162
Prediction of Users’ Professional Profile in MOOCs Only by Utilising Learners’ Written Texts (Tahani Aljohani, Filipe Dwan Pereira, Alexandra I. Cristea, Elaine Oliveira)....Pages 163-173
Cohesion Network Analysis: Predicting Course Grades and Generating Sociograms for a Romanian Moodle Course (Maria-Dorinela Dascalu, Mihai Dascalu, Stefan Ruseti, Mihai Carabas, Stefan Trausan-Matu, Danielle S. McNamara)....Pages 174-183
A Study on the Factors Influencing the Participation of Face-to-Face Discussion and Online Synchronous Discussion in Class (Lixin Zhao, Xiaoxia Shen, Wu-Yuin Hwang, Timothy K. Shih)....Pages 184-195
Applying Genetic Algorithms for Recommending Adequate Competitors in Mobile Game-Based Learning Environments (Akrivi Krouska, Christos Troussas, Cleo Sgouropoulou)....Pages 196-204
Dynamic Detection of Learning Modalities Using Fuzzy Logic in Students’ Interaction Activities (Christos Troussas, Akrivi Krouska, Cleo Sgouropoulou)....Pages 205-213
Adaptive Music Therapy for Alzheimer’s Disease Using Virtual Reality (Alexie Byrns, Hamdi Ben Abdessalem, Marc Cuesta, Marie-Andrée Bruneau, Sylvie Belleville, Claude Frasson)....Pages 214-219
Improving Cognitive and Emotional State Using 3D Virtual Reality Orientation Game (Manish Kumar Jha, Marwa Boukadida, Hamdi Ben Abdessalem, Alexie Byrns, Marc Cuesta, Marie-Andrée Bruneau et al.)....Pages 220-225
A Multidimensional Deep Learner Model of Urgent Instructor Intervention Need in MOOC Forum Posts (Laila Alrajhi, Khulood Alharbi, Alexandra I. Cristea)....Pages 226-236
Should We Consider Efficiency and Constancy for Adaptation in Intelligent Tutoring Systems? (Pedro Manuel Moreno-Marcos, Dánae Martínez de la Torre, Gabriel González Castro, Pedro J. Muñoz-Merino, Carlos Delgado Kloos)....Pages 237-247
An Interactive Recommender System Based on Reinforcement Learning for Improving Emotional Competences in Educational Groups (Eleni Fotopoulou, Anastasios Zafeiropoulos, Michalis Feidakis, Dimitrios Metafas, Symeon Papavassiliou)....Pages 248-258
Can We Use Gamification to Predict Students’ Performance? A Case Study Supported by an Online Judge (Filipe D. Pereira, Armando Toda, Elaine H. T. Oliveira, Alexandra I. Cristea, Seiji Isotani, Dion Laranjeira et al.)....Pages 259-269
AFFLOG: A Logic Based Affective Tutoring System (Achilles Dougalis, Dimitris Plexousakis)....Pages 270-274
Towards a Framework for Learning Systems in Smart Universities (Konstantinos Chytas, Anastasios Tsolakidis, Christos Skourlas)....Pages 275-279
Interweaving Activities, Feedback and Learner Model in a Learner Centered Learning Environment (Agoritsa Gogoulou)....Pages 280-283
Enriching Synchronous Collaboration in Online Courses with Configurable Conversational Agents (Stergios Tegos, Georgios Psathas, Thrasyvoulos Tsiatsos, Christos Katsanos, Anastasios Karakostas, Costas Tsibanis et al.)....Pages 284-294
Where the Competency-Based Assessment Meets the Semantic Learning Analytics (Khaled Halimi, Hassina Seridi-Bouchelaghem)....Pages 295-305
Towards CSCL Scripting by Example (Andreas Papasalouros, George Chatzimichalis)....Pages 306-315
Educators’ Validation on a Reflective Writing Framework (RWF) for Assessing Reflective Writing in Computer Science Education (Huda Alrashidi, Mike Joy, Thomas Daniel Ullmann, Nouf Almujally)....Pages 316-322
Validating the Reflective Writing Framework (RWF) for Assessing Reflective Writing in Computer Science Education Through Manual Annotation (Huda Alrashidi, Mike Joy, Thomas Daniel Ullmann, Nouf Almujally)....Pages 323-326
Recommender System for Quality Educational Resources (Wafa Bel Hadj Ammar, Mariem Chaabouni, Henda Ben Ghezala)....Pages 327-334
Intelligent Tutoring Systems for Psychomotor Training – A Systematic Literature Review (Laurentiu-Marian Neagu, Eric Rigaud, Sébastien Travadel, Mihai Dascalu, Razvan-Victor Rughinis)....Pages 335-341
Employing Social Network Analysis to Enhance Community Learning (Kyparisia Papanikolaou, Maria Tzelepi, Maria Moundridou, Ioannis Petroulis)....Pages 342-352
Is MOOC Learning Different for Dropouts? A Visually-Driven, Multi-granularity Explanatory ML Approach (Ahmed Alamri, Zhongtian Sun, Alexandra I. Cristea, Gautham Senthilnathan, Lei Shi, Craig Stewart)....Pages 353-363
Self-construction and Interactive Simulations to Support the Learning of Drawing Graphs and Reasoning in Mathematics (Sonia Palha, Anders Bouwer, Bert Bredeweg, Siard Keulen)....Pages 364-370
WebApriori: A Web Application for Association Rules Mining (Konstantinos Malliaridis, Stefanos Ougiaroglou, Dimitris A. Dervos)....Pages 371-377
Dialogue Act Pairs for Automated Analysis of Typed-Chat Group Problem-Solving (Duy Bui, Jung Hee Kim, Michael Glass)....Pages 378-381
Long Term Retention of Programming Concepts Learned Using a Software Tutor (Amruth N. Kumar)....Pages 382-387
Towards a Template-Driven Approach to Documenting Teaching Practices (Nouf Almujally, Mike Joy)....Pages 388-396
A Knowledge Sharing System Architecture for Higher Education Institutions (Nouf Almujally, Mike Joy)....Pages 397-402
Reducing Cognitive Load for Anatomy Students with a Multimodal ITS Platform (Reva Freedman, Ben Kluga, Dean Labarbera, Zachary Hueneke, Virginia Naples)....Pages 403-406
Learning Analytics in Big Data Era. Exploration, Validation and Predictive Models Development (Ioannis C. Drivas, Georgios A. Giannakopoulos, Damianos P. Sakas)....Pages 407-410
Learning Analytics Dashboard for Motivation and Performance (Damien S. Fleur, Wouter van den Bos, Bert Bredeweg)....Pages 411-419
Educational Driving Through Intelligent Traffic Simulation (Bogdan Vajdea, Aurelia Ciupe, Bogdan Orza, Serban Meza)....Pages 420-426
Quality Assurance in Higher Education: The Role of Students (George Meletiou, Cleo Sgouropoulou, Christos Skourlas)....Pages 427-431
Evolutionary Learner Profile Optimization Using Rare and Negative Association Rules for Micro Open Learning (Geng Sun, Jiayin Lin, Jun Shen, Tingru Cui, Dongming Xu, Huaming Chen)....Pages 432-440
Correction to: Changes of Affective States in Intelligent Tutoring System to Improve Feedbacks Through Low-Cost and Open Electroencephalogram and Facial Expression (Wellton Costa de Oliveira, Ernani Gottardo, Andrey Ricardo Pimentel)....Pages C1-C1
Back Matter ....Pages 441-443


LNCS 12149

Vivekanandan Kumar Christos Troussas (Eds.)

Intelligent Tutoring Systems 16th International Conference, ITS 2020 Athens, Greece, June 8–12, 2020 Proceedings

Lecture Notes in Computer Science

Founding Editors
Gerhard Goos, Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis, Cornell University, Ithaca, NY, USA

Editorial Board Members
Elisa Bertino, Purdue University, West Lafayette, IN, USA
Wen Gao, Peking University, Beijing, China
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Gerhard Woeginger, RWTH Aachen, Aachen, Germany
Moti Yung, Columbia University, New York, NY, USA


More information about this series at http://www.springer.com/series/7408


Editors
Vivekanandan Kumar, Athabasca University, Athabasca, AB, Canada
Christos Troussas, University of West Attica, Egaleo, Greece

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-49662-3
ISBN 978-3-030-49663-0 (eBook)
https://doi.org/10.1007/978-3-030-49663-0

LNCS Sublibrary: SL2 – Programming and Software Engineering

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The 16th International Conference on Intelligent Tutoring Systems (ITS 2020) was held in Athens, Greece, during June 8–12, 2020. ITS 2020 took place on the scheduled dates as an online conference. The hosting institution of the ITS 2020 conference was the University of West Attica, Greece.

The theme of ITS 2020 was “Artificial Intelligence and Beyond: From Alpha to Omega” with an objective to present academic and research achievements of computer and cognitive sciences, artificial intelligence (AI), and deep learning vis-a-vis the advances of Intelligent Tutoring Systems. ITS 2020 focused on the use of novel and sophisticated technologies blending with multifaceted research approaches for promoting and ameliorating tutoring systems. It provided a setting to discuss recent and high-quality developments in the broader area of AI in Education. In addition, it offered a good opportunity for participants to present and discuss topics in their respective research areas, fostering an online network of researchers.

The call for scientific papers focused on a plethora of topics of interest in the area of ITS and beyond, including the following:

• Intelligent Tutoring
• AI in Education
• Educational Data Mining
• Machine Learning in Intelligent Tutoring Systems
• Deep Learning and Intelligent Tutoring Systems
• Informal Learning Environments and Learning as a Side Effect of Interactions
• Collaborative and Group Learning, Communities of Practice, and Social Networks
• Simulation-Based Learning and Serious Games
• Immersive and Virtual Reality Environments
• Dialogue and Discourse During Learning Interactions
• Ubiquitous, Mobile, and Cloud Learning Environments
• Empirical Studies of Learning with Technologies
• Understanding Human Learning on the Web
• Adaptive Support for Learning, Models of Learners, and Diagnosis and Feedback
• Intelligent Health Applications
• Modeling of Motivation, Metacognition, and Affect Aspects of Learning
• Recommender Systems for Learning
• Virtual Pedagogical Agents and Learning Companions
• Ontological Modeling, Semantic Web Technologies, and Standards for Learning
• Multi-Agent and Service Oriented Architectures for Learning and Tutoring Environments
• Educational Exploitation of Data Mining and Machine Learning Techniques
• Instructional Design Principles or Design Patterns for Educational Environments
• Authoring Tools and Development Methodologies for Advanced Learning Technologies

• Domain-Specific Learning Technologies, e.g. Language, Mathematics, Reading, Science, Medicine, Military, and Industry
• Non-Conventional Interactions Between AI and Human Learning
• Privacy and Security in e-Learning Environments
• Affective Computing and Intelligent Tutoring Systems
• Brain-Computer Interface Applications in Intelligent Tutoring Systems
• Analytics and Causal Modeling in Intelligent Tutoring

The call for papers solicited work presenting substantive new research results in using advanced computer technologies and interdisciplinary research for enabling, supporting, and enhancing human learning. A Posters Track was also organized, which provided an interactive forum for authors to present research prototypes to conference participants, as well as work in progress.

The international Program Committee consisted of 93 leading members of the Intelligent Tutoring Systems community (26 senior and 57 regular), as well as highly promising younger researchers. The conference general chair was Cleo Sgouropoulou from the University of West Attica, Greece, whereas the Program Committee chairs were Vive Kumar from Athabasca University, Canada, and Christos Troussas from the University of West Attica, Greece.

Scientific papers were reviewed by three to five reviewers (one or more being senior) through a double-blind process. Only 27% of submitted papers were accepted as full papers, about 23% were accepted as short papers, and just 13% were accepted as posters. These percentages indicate that ITS 2020 is a top-flight, rather selective, high-quality conference. During the review process, the reviewers’ evaluations were generally respected, especially those made by the senior reviewers.

A separate Doctoral Consortium (DC) provided a forum in which PhD students could present and discuss their work during its early stages, meet peers with related interests, and work with more senior members of the field (mentors). The DC chairs were Jason Harley from McGill University, Canada, and Christos Troussas from the University of West Attica, Greece. The accepted submissions of the DC were presented as posters. The management of the review process and the preparation of the proceedings were handled through EasyChair.

On the basis of the ITS philosophy, the selected full papers described some very significant research, the short papers investigated some very interesting novel ideas, while the posters presented research in progress that deserves close attention. A variety of new techniques were introduced or revisited, including multimodal affective computing, explainable AI, mixed-compensation multidimensional item response, ensemble deep learning, cohesion network analysis, spiral of silence, conversational agent, semantic web, computer-supported collaborative learning, and social network analysis. The rigor of the reported work was ironclad and yielded several generalizable results. Moreover, it allowed room for the deployment of methods such as observational studies, longitudinal studies, and meta-analysis that may offer new perspectives in future ITS conferences.

A conference is only as good as its authors, how they expand the frontiers, and the rigor with which they compel the rest of the community to explore the beyond. Just as it has been for the past four decades, the papers of ITS 2020 pushed the boundaries of intelligent tutoring further. Several of them reported ground-breaking work in areas including conversational agents, gamification, social profiling, emotive computing, grades prediction, employment status prediction, theory generalization, virtual instruction, and community robotics.

The ITS 2020 program was reinforced by the successful organization of a full-day workshop: “Intelligence Support for Mentoring Processes in Higher Education” by Ralf Klamma, Milos Kravcik, Elvira Popescu, and Viktoria Pammer-Schindler, and a half-day tutorial: “ASSET Learning Programme Model and Accompanying Tools” by Eleni-Aikaterini Leligou and Panagiotis Karkazis. They were both selected and managed by the workshop and tutorial chair, Athanasios Voulodimos, from the University of West Attica, Greece.

Amid the difficult and unprecedented conditions of the COVID-19 repercussions, we would like to express our gratitude to many different contributors: The successful preparation and implementation of the ITS 2020 conference was secured by the original work of all the authors, the devoted contribution of the various conference chairs, the members of the Program Committee, the Steering Committee, and in particular its chair, Prof. Claude Frasson. The organization, coordination, and online operation of the conference were achieved by the organizers and the organization chair, Kitty Panourgia. We would also like to address our special thanks to the conference sponsors, Microsoft Greece and Cisco Greece, for their support. Last but not least, we would like to acknowledge the Institute of Intelligent Systems (IIS) under the auspices of which this conference was held.

Instead of an epilogue to this preface, we would like to stress that one of the basic outcomes of the ITS 2020 conference is the balance between new researchers and established researchers, novel topics and finely aged topics, theoretical extensions and commercial interests, breadthwise growth of topics and depthwise evolution of subgenres. This balance constitutes an absolutely essential dimension to sustain the field of Intelligent Tutoring Systems in the context of new endeavors by its academic and research community.

April 2020

Vive Kumar
Christos Troussas

Organization

Conference Committee

General Chair
Cleo Sgouropoulou, University of West Attica, Greece

Program Chairs
Vivekanandan Kumar, Athabasca University, Canada
Christos Troussas, University of West Attica, Greece

Organization Chair
Kitty Panourgia, Neoanalysis, Greece

Workshops and Tutorial Chair
Athanasios Voulodimos, University of West Attica, Greece

Doctoral Consortium Chairs
Jason Harley, McGill University, Canada
Christos Troussas, University of West Attica, Greece

Posters and Demos Chair
Kyparisia Papanikolaou, School of Pedagogical and Technological Education, Greece

The conference was held under the auspices of the Institute of Intelligent Systems (IIS).

Program Committee

Program Chairs
Vivekanandan Kumar, Athabasca University, Canada
Christos Troussas, University of West Attica, Greece


Senior Program Committee
Kevin Ashley, University of Pittsburgh, USA
Roger Azevedo, University of Central Florida, USA
Bert Bredeweg, University of Amsterdam, The Netherlands
Stefano A. Cerri, LIRMM, University of Montpellier, and CNRS, France
Maiga Chang, Athabasca University, Canada
Michaela Cocea, University of Portsmouth, UK
Michel Desmarais, École Polytechnique de Montréal, Canada
Benedict Du Boulay, University of Sussex, UK
Claude Frasson, University of Montreal, Canada
Gilles Gauthier, Université du Québec à Montréal, Canada
Peter Groumpos, University of Patras, Greece
Nathalie Guin, University of Lyon 1, France
Yugo Hayashi, Ritsumeikan University, Japan
W. Lewis Johnson, Alelo Inc., USA
Charalambos Karagiannidis, University of Thessaly, Greece
Kinshuk, University of North Texas, USA
Siu-Cheung Kong, The University of Hong Kong, China
Vivekanandan Kumar, Athabasca University, Canada
Jean-Marc Labat, Université Paris 6, France
Susanne Lajoie, McGill University, Canada
Riichiro Mizoguchi, Japan Advanced Institute of Science and Technology, Japan
Roger Nkambou, Université du Québec à Montréal, Canada
Demetrios Sampson, University of Piraeus, Greece
Stefan Trausan-Matu, Politehnica University of Bucharest, Romania
Christos Troussas, University of West Attica, Greece
Beverly Park Woolf, University of Massachusetts, USA

Program Committee
Mohammed Abdel Razek, King Abdulaziz University, Saudi Arabia
Fabio Akhras, Renato Archer Center of Information Technology, Brazil
Galia Angelova, Bulgarian Academy of Sciences, Bulgaria
Silvia Margarita Baldiris Navarro, Fundación Universitaria, Spain
Maria Lucia Barron-Estrada, Instituto Tecnológico de Culiacán, Mexico
Maria Bielikova, Slovak University of Technology in Bratislava, Slovakia
Emmanuel Blanchard, IDÛ Interactive Inc., Canada
François Bouchet, Université de Sorbonne and LIP6, France
David Boulanger, Athabasca University, Canada
Nicola Capuano, University of Salerno, Italy
Chih-Kai Chang, National University of Tainan, Taiwan
Maher Chaouachi, McGill University, Canada
Min Chi, North Carolina State University, USA
Chih-Yueh Chou, Yuan Ze University, Taiwan
Evandro Costa, Federal University of Alagoas, Brazil
Diego Demerval, Federal University of Alagoas, Brazil
Cyrille Desmoulins, Université Joseph Fourier, France
Philippe Dessus, LSE Grenoble, France
Rachel Dickler, Rutgers University, USA
Mark Floryan, University of Virginia, USA
Davide Fossati, Carnegie Mellon University, Qatar
Reva Freedman, Northern Illinois University, USA
Benjamin Goldberg, United States Army Research Laboratory, USA
Sabine Graf, Athabasca University, Canada
Xiaoqing Gu, East China Normal University, China
Ben Abdessalem Hamdi, University of Montreal, Canada
Jason Harley, University of Alberta, Canada
Seiji Isotani, University of São Paulo, Brazil
Patricia Jaques, UNISINOS, Brazil
Akihiro Kashihara, The University of Electro-Communications, Japan
Adamantios Koumpis, Institut Digital Enabling (BFH), Switzerland
Amruth Kumar, Ramapo College of New Jersey, USA
Rita Kuo, New Mexico Institute of Mining and Technology, USA
Nguyen-Thinh Le, Humboldt Universität zu Berlin, Germany
Elise Lavoué, University of Lyon, France
Blair Lehman, Educational Testing Service, USA
Carla Limongelli, University of Rome 3, Italy
Fuhua Oscar Lin, Athabasca University, Canada
Chao-Lin Liu, National Chengchi University, Taiwan
Riccardo Mazza, University of Lugano and University of Applied Sciences of Southern Switzerland, Switzerland
Tassos Mikopoulos, University of Ioannina, Greece
Kazuhisa Miwa, Nagoya University, Japan
Kuo-Liang Ou, National Hsin-Chu University of Education, Taiwan
Elvira Popescu, University of Craiova, Romania
Alexandra Poulovassilis, Birkbeck University of London, UK
Valéry Psyché, TÉLUQ University, Canada
Rod Roscoe, Arizona State University, USA
Olga C. Santos, aDeNu Research Group, UNED, Spain
Kaoru Sumi, Future University, Japan
Thepchai Supnithi, National Electronics and Computer Technology Center, Thailand
Michelle Taub, North Carolina State University, USA
Marco Temperini, Sapienza University of Rome, Italy
Ahmed Tlili, Beijing Normal University, China
Radu Vasiu, Politehnica University of Timisoara, Romania
Athanasios Voulodimos, University of West Attica, Greece
Li Wang, Open University of China, China
Dunwei Wen, Athabasca University, Canada

Organization Committee

Organization Chair
Kitty Panourgia, General Coordination/Proceedings/Program

Members
Aggelos Amarandos, Registration
Alexia Kakourou, Coordination on Site
Eliana Vassiliou, Conference Publicity/Website Management
Isaak Tselepis, Website Architect

Steering Committee

Chair
Claude Frasson, University of Montreal, Canada

Members
Stefano A. Cerri, LIRMM, University of Montpellier, and CNRS, France
Isabel Fernandez-Castro, University of the Basque Country, Spain
Gilles Gauthier, Université du Québec à Montréal, Canada
Guy Gouardures, University of Pau, France
Tsukasa Hirashima, University of Hiroshima, Japan
Marc Kaltenbach, Bishop’s University, Canada
Alan Lesgold, University of Pittsburgh, USA
James Lester, North Carolina State University, USA
Alessandro Micarelli, Roma Tre University, Italy
Roger Nkambou, Université du Québec à Montréal, Canada
Giorgos Papadourakis, Technological Educational Institute Crete, Greece
Fabio Paragua, Federal University of Alagoas, Brazil
Elliot Soloway, University of Michigan, USA
Daniel Suthers, University of Hawaii, USA
Stefan Trausan-Matu, University Politehnica of Bucharest, Romania
Beverly Woolf, University of Massachusetts, USA

Advisory Committee

Members
Luigia Carlucci Aiello, University of Rome, Italy
Maria Grigoriadou, University of Athens, Greece
Judith Kay, University of Sydney, Australia
Demetrios G. Sampson, University of Piraeus, Greece

Learning and Retention Through VR and Serious Games: Aviation Safety as a Paradigmatic Example (Abstract of Keynotes)

Luca Chittaro
Human-Computer Interaction Lab, Department of Mathematics, Computer Science, and Physics, University of Udine, Italy
http://hcilab.uniud.it

Abstract. Virtual reality (VR) and serious games are increasingly used in a variety of domains, including health and safety. This keynote will present research efforts to improve the theoretical grounding as well as the practical effectiveness of serious games and educational VR. Aviation safety will be used as a paradigmatic example, and novel pedagogical techniques will be illustrated through practical demonstrations of real-world applications we deployed. The presentation will discuss their effects on users’ knowledge (learning, transfer, retention) as well as on different psychological constructs such as engagement, presence, self-efficacy, and locus of control.

Keywords. Virtual Reality · Serious games · Learning · Education · Training · Aviation safety

Outline of the Keynote

Virtual reality (VR) experiences and serious games, i.e. digital games that further educational objectives, are increasingly used in a variety of domains, including health and safety. However, compared to entertainment games, their design and evaluation can be more complex because they need to take into account additional, multidisciplinary factors such as knowledge retention and attitude change.

This keynote talk will introduce and discuss recent research and development efforts [1–7] by our lab to improve the theoretical grounding as well as the practical effectiveness of serious games and educational VR. The presentation will use aviation safety as a paradigmatic example, and demonstrate novel pedagogical techniques through practical examples of real-world, deployed applications that have a very large user base. For example, “Prepare for Impact”, the most popular of our publicly released serious games for air traveler education [8], has been installed and played by more than 8 million persons as of April 2020.

The keynote will also highlight the main findings [1–6] about serious games and educational VR obtained in international aviation research projects we carried out under US Federal Aviation Administration (FAA) grants. In these projects, we tackled the notoriously difficult problem of effectively educating air travelers about safety. This problem is relevant because the level of aviation safety knowledge in air travelers is a key factor in surviving aircraft accidents. Airlines currently educate passengers about safety through the routine preflight briefing and printed safety cards. Unfortunately, both methods suffer from a serious lack of effectiveness, as shown by empirical studies, interviews of accident survivors, and reports by government investigative agencies. Two major reasons for the failure of current methods are a lack of engagement, which leads passengers not to pay attention to the safety information, and a lack of comprehension, which leads them to misunderstand the information even when attention is paid [5]. Moreover, information that has been understood correctly is subject to rapid decay.

We investigated the creation of novel digital experiences to educate air travelers about safety with different methods: exploring different game genres and VR contexts; identifying psychological theories that could guide the design of the experiences; conducting user studies of the developed serious games to assess their effects on players’ knowledge and competence (learning, transfer, retention) as well as effects on psychological constructs such as engagement, self-efficacy, and locus of control; publicly deploying multiple, freely available apps. While studies in the VR and serious games literature often evaluate engagement and immediate learning, retention is rarely considered. For this reason, the talk will give particular emphasis to knowledge retention.

References

1. Buttussi, F., Chittaro, L.: Effects of different types of virtual reality display on presence and learning in a safety training scenario. IEEE Trans. Vis. Comput. Graph. 24, 1063–1076 (2018)
2. Buttussi, F., Chittaro, L.: Humor and fear appeals in animated pedagogical agents: an evaluation in aviation safety education. IEEE Trans. Learn. Technol. 13, 63–76 (2020)
3. Chittaro, L., Buttussi, F.: Assessing knowledge retention of an immersive serious game vs. a traditional education method in aviation safety. IEEE Trans. Vis. Comput. Graph. 21, 529–538 (2015)
4. Chittaro, L.: Designing serious games for safety education: “learn to brace” vs. traditional pictorials for aircraft passengers. IEEE Trans. Vis. Comput. Graph. 22, 1527–1539 (2016)
5. Chittaro, L., Corbett, C., McLean, M., Zangrando, N.: Safety knowledge transfer through mobile virtual reality: a study of aviation life preserver donning. Saf. Sci. 102, 159–168 (2018)
6. Chittaro, L., Buttussi, F.: Exploring the use of arcade game elements for attitude change: two studies in the aviation safety domain. Int. J. Hum. Comput. Stud. 127, 112–123 (2019)
7. Chittaro, L., Sioni, R.: Serious games for emergency preparedness: evaluation of an interactive vs. a non-interactive simulation of a terror attack. Comput. Hum. Behav. 50, 508–519 (2015)
8. HCI Lab, University of Udine. Aviation Safety Apps. http://hcilab.uniud.it/aviation/apps.html. Accessed 27 Apr 2020

Contents

Multi-sensual Augmented Reality in Interactive Accessible Math Tutoring System for Flipped Classroom (Dariusz Mikułowski and Jolanta Brzostek-Pawłowska)
Adaptive Learning to Support Reading Skills Development for All: Using a Single-Case Experimental Design to Monitor, Describe, and Assess the Impact of Adaptive Learning on Language Development of a Diversity of K-12 Pupils (Lionel Alvarez and Thierry Geoffre)
General ITS Software Architecture and Framework (Nikolaj Troels Graf von Malotky and Alke Martens)
Let the End User in Peace: UX and Usability Aspects Related to the Design of Tutoring Systems (Juliano Sales, Katerina Tzafilkou, Adamantios Koumpis, Thomas Gees, Heinrich Zimmermann, Nicolaos Protogeros, and Siegfried Handschuh)
Developing a Multimodal Affect Assessment for Aviation Training (Tianshu Li, Imène Jraidi, Alejandra Ruiz Segura, Leo Holton, and Susanne Lajoie)
Scaling Mentoring Support with Distributed Artificial Intelligence (Ralf Klamma, Peter de Lange, Alexander Tobias Neumann, Benedikt Hensen, Milos Kravcik, Xia Wang, and Jakub Kuzilek)
Exploring Navigation Styles in a FutureLearn MOOC (Lei Shi, Alexandra I. Cristea, Armando M. Toda, and Wilk Oliveira)
Changes of Affective States in Intelligent Tutoring System to Improve Feedbacks Through Low-Cost and Open Electroencephalogram and Facial Expression (Wellton Costa de Oliveira, Ernani Gottardo, and Andrey Ricardo Pimentel)
Computer-Aided Grouping of Students with Reading Disabilities for Effective Response-to-Intervention (Chia-Ling Tsai, Yong-Guei Lin, Ming-Chi Liu, and Wei-Yang Lin)
SHAPed Automated Essay Scoring: Explaining Writing Features’ Contributions to English Writing Organization (David Boulanger and Vivekanandan Kumar)

Probabilistic Approaches to Detect Blocking States in Intelligent Tutoring System (Jean-Philippe Corbeil, Michel Gagnon, and Philippe R. Richard)
Avoiding Bias in Students’ Intrinsic Motivation Detection (Pedro Bispo Santos, Caroline Verena Bhowmik, and Iryna Gurevych)
Innovative Robot for Educational Robotics and STEM (Avraam Chatzopoulos, Michail Papoutsidakis, Michail Kalogiannakis, and Sarantos Psycharis)
Supporting Students by Integrating an Open Learner Model in a Peer Assessment Platform (Gabriel Badea and Elvira Popescu)
Explaining Traffic Situations – Architecture of a Virtual Driving Instructor (Martin K. H. Sandberg, Johannes Rehm, Matej Mnoucek, Irina Reshodko, and Odd Erik Gundersen)
MOOCOLAB - A Customized Collaboration Framework in Massive Open Online Courses (Ana Carla A. Holanda, Patrícia Azevedo Tedesco, Elaine Harada T. Oliveira, and Tancicleide C. S. Gomes)
Mixed Compensation Multidimensional Item Response Theory (Béatrice Moissinac and Aditya Vempaty)
Data-Driven Analysis of Engagement in Gamified Learning Environments: A Methodology for Real-Time Measurement of MOOCs (Khulood Alharbi, Laila Alrajhi, Alexandra I. Cristea, Ig Ibert Bittencourt, Seiji Isotani, and Annie James)
Intelligent Predictive Analytics for Identifying Students at Risk of Failure in Moodle Courses (Theodoros Anagnostopoulos, Christos Kytagias, Theodoros Xanthopoulos, Ioannis Georgakopoulos, Ioannis Salmon, and Yannis Psaromiligkos)
Prediction of Users’ Professional Profile in MOOCs Only by Utilising Learners’ Written Texts (Tahani Aljohani, Filipe Dwan Pereira, Alexandra I. Cristea, and Elaine Oliveira)
Cohesion Network Analysis: Predicting Course Grades and Generating Sociograms for a Romanian Moodle Course (Maria-Dorinela Dascalu, Mihai Dascalu, Stefan Ruseti, Mihai Carabas, Stefan Trausan-Matu, and Danielle S. McNamara)

A Study on the Factors Influencing the Participation of Face-to-Face Discussion and Online Synchronous Discussion in Class (Lixin Zhao, Xiaoxia Shen, Wu-Yuin Hwang, and Timothy K. Shih)
Applying Genetic Algorithms for Recommending Adequate Competitors in Mobile Game-Based Learning Environments (Akrivi Krouska, Christos Troussas, and Cleo Sgouropoulou)
Dynamic Detection of Learning Modalities Using Fuzzy Logic in Students’ Interaction Activities (Christos Troussas, Akrivi Krouska, and Cleo Sgouropoulou)
Adaptive Music Therapy for Alzheimer’s Disease Using Virtual Reality (Alexie Byrns, Hamdi Ben Abdessalem, Marc Cuesta, Marie-Andrée Bruneau, Sylvie Belleville, and Claude Frasson)
Improving Cognitive and Emotional State Using 3D Virtual Reality Orientation Game (Manish Kumar Jha, Marwa Boukadida, Hamdi Ben Abdessalem, Alexie Byrns, Marc Cuesta, Marie-Andrée Bruneau, Sylvie Belleville, and Claude Frasson)
A Multidimensional Deep Learner Model of Urgent Instructor Intervention Need in MOOC Forum Posts (Laila Alrajhi, Khulood Alharbi, and Alexandra I. Cristea)
Should We Consider Efficiency and Constancy for Adaptation in Intelligent Tutoring Systems? (Pedro Manuel Moreno-Marcos, Dánae Martínez de la Torre, Gabriel González Castro, Pedro J. Muñoz-Merino, and Carlos Delgado Kloos)
An Interactive Recommender System Based on Reinforcement Learning for Improving Emotional Competences in Educational Groups (Eleni Fotopoulou, Anastasios Zafeiropoulos, Michalis Feidakis, Dimitrios Metafas, and Symeon Papavassiliou)
Can We Use Gamification to Predict Students’ Performance? A Case Study Supported by an Online Judge (Filipe D. Pereira, Armando Toda, Elaine H. T. Oliveira, Alexandra I. Cristea, Seiji Isotani, Dion Laranjeira, Adriano Almeida, and Jonas Mendonça)
AFFLOG: A Logic Based Affective Tutoring System (Achilles Dougalis and Dimitris Plexousakis)
Towards a Framework for Learning Systems in Smart Universities (Konstantinos Chytas, Anastasios Tsolakidis, and Christos Skourlas)

Interweaving Activities, Feedback and Learner Model in a Learner Centered Learning Environment (Agoritsa Gogoulou)
Enriching Synchronous Collaboration in Online Courses with Configurable Conversational Agents (Stergios Tegos, Georgios Psathas, Thrasyvoulos Tsiatsos, Christos Katsanos, Anastasios Karakostas, Costas Tsibanis, and Stavros Demetriadis)
Where the Competency-Based Assessment Meets the Semantic Learning Analytics (Khaled Halimi and Hassina Seridi-Bouchelaghem)
Towards CSCL Scripting by Example (Andreas Papasalouros and George Chatzimichalis)
Educators’ Validation on a Reflective Writing Framework (RWF) for Assessing Reflective Writing in Computer Science Education (Huda Alrashidi, Mike Joy, Thomas Daniel Ullmann, and Nouf Almujally)
Validating the Reflective Writing Framework (RWF) for Assessing Reflective Writing in Computer Science Education Through Manual Annotation (Huda Alrashidi, Mike Joy, Thomas Daniel Ullmann, and Nouf Almujally)
Recommender System for Quality Educational Resources (Wafa Bel Hadj Ammar, Mariem Chaabouni, and Henda Ben Ghezala)
Intelligent Tutoring Systems for Psychomotor Training – A Systematic Literature Review (Laurentiu-Marian Neagu, Eric Rigaud, Sébastien Travadel, Mihai Dascalu, and Razvan-Victor Rughinis)
Employing Social Network Analysis to Enhance Community Learning (Kyparisia Papanikolaou, Maria Tzelepi, Maria Moundridou, and Ioannis Petroulis)
Is MOOC Learning Different for Dropouts? A Visually-Driven, Multi-granularity Explanatory ML Approach (Ahmed Alamri, Zhongtian Sun, Alexandra I. Cristea, Gautham Senthilnathan, Lei Shi, and Craig Stewart)
Self-construction and Interactive Simulations to Support the Learning of Drawing Graphs and Reasoning in Mathematics (Sonia Palha, Anders Bouwer, Bert Bredeweg, and Siard Keulen)

WebApriori: A Web Application for Association Rules Mining (Konstantinos Malliaridis, Stefanos Ougiaroglou, and Dimitris A. Dervos)
Dialogue Act Pairs for Automated Analysis of Typed-Chat Group Problem-Solving (Duy Bui, Jung Hee Kim, and Michael Glass)
Long Term Retention of Programming Concepts Learned Using a Software Tutor (Amruth N. Kumar)
Towards a Template-Driven Approach to Documenting Teaching Practices (Nouf Almujally and Mike Joy)
A Knowledge Sharing System Architecture for Higher Education Institutions (Nouf Almujally and Mike Joy)
Reducing Cognitive Load for Anatomy Students with a Multimodal ITS Platform (Reva Freedman, Ben Kluga, Dean Labarbera, Zachary Hueneke, and Virginia Naples)
Learning Analytics in Big Data Era. Exploration, Validation and Predictive Models Development (Ioannis C. Drivas, Georgios A. Giannakopoulos, and Damianos P. Sakas)
Learning Analytics Dashboard for Motivation and Performance (Damien S. Fleur, Wouter van den Bos, and Bert Bredeweg)
Educational Driving Through Intelligent Traffic Simulation (Bogdan Vajdea, Aurelia Ciupe, Bogdan Orza, and Serban Meza)
Quality Assurance in Higher Education: The Role of Students (George Meletiou, Cleo Sgouropoulou, and Christos Skourlas)
Evolutionary Learner Profile Optimization Using Rare and Negative Association Rules for Micro Open Learning (Geng Sun, Jiayin Lin, Jun Shen, Tingru Cui, Dongming Xu, and Huaming Chen)
Author Index

Multi-sensual Augmented Reality in Interactive Accessible Math Tutoring System for Flipped Classroom

Dariusz Mikułowski1(✉) and Jolanta Brzostek-Pawłowska2

1 University of Natural Sciences and Humanities, Konarskiego 2, 08-110 Siedlce, Poland
[email protected]
2 NASK National Research Institute, Kolska 12, 02-045 Warsaw, Poland

Abstract. The ever more widespread “flipped classroom” learning model is associated with increased independence of learning. Learning math independently is a problem for students with visual impairments, especially blind students. Mathematical content includes spatial objects such as formulas and graphics, which are inaccessible to blind students and barely accessible to low vision students, and this prevents independent learning. The article presents a method that increases students’ independence in recognising mathematical content in textbooks and worksheets. The method consists in introducing into the document elements of Augmented Reality (AR), that is, texts and sounds that extend the information about the mathematical objects encountered in the content beyond the information provided under the WCAG guidelines and the recommendations of the WAI-ARIA standard under development by the W3C consortium. AR elements are accessed through a multi-sensual user interface: hearing, the touch of a braille display, a touch screen and touch gestures. The method was developed in cooperation with students with visual impairment and math teachers. It is currently undergoing valorisation in Poland, the Netherlands and Ireland.

Keywords: Math accessibility · Multi-sensual augmented reality · Self-learning · Flipped classroom model · Blind students

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 1–10, 2020. https://doi.org/10.1007/978-3-030-49663-0_1

1 Introduction

Electronic educational materials are indispensable for the increasingly widespread “flipped classroom” teaching model. They must be both attractive enough to keep the student’s attention and accessible to disabled students. Some groups of students, such as the blind, have limited access to electronic materials containing visualised spatial objects, because blind people perceive their environment mainly through hearing and touch. These problems are more apparent in subjects such as mathematics, physics and chemistry, which are rich in math formulas and drawings. The accessibility barrier is reduced or levelled out by creating materials according to the WCAG (Web Content Accessibility Guidelines) and WAI-ARIA (Accessible Rich Internet Applications) standards [9, 10], under development by the W3C consortium. WCAG and WAI-ARIA specify additional elements of web pages that increase accessibility, as well as a user interface based on keyboard shortcuts for exploring them. We can therefore say that blind users who interact with pages created in this way, using the interface recommended by WCAG and ARIA, are in a virtual reality that for them becomes an audible and tangible reality. However, the WCAG and ARIA recommendations do not include support for finding, recognising and exploring specific objects such as mathematical formulas, function graphs, drawings of geometric figures, mathematical quizzes and tasks of the “join into pairs” type. The problem of access to mathematical content by visually impaired users, especially blind users, is well known. Numerous scientific publications report new solutions that help in recognising mathematical content. Research on the latest AR devices has shown them to be highly useful for increasing student motivation in the learning process [5–7]. An in-depth analysis of published research on AR, with statistics on the use of AR in education and the goals of that use, is presented in [3]. The two most frequently indicated goals were additional explanation of the subject and extension of information. The latest book on the subject, published in 2020 [2], is a multi-author review of the state of AR applications in education; it confirmed the effective impact of multi-sensory AR in early education, and similar conclusions are drawn in [8]. In early as well as advanced mathematics, AR is used to visualise 3D mathematical expressions for sighted students [1]. To the authors’ knowledge, there are no reports of research on multi-sensual AR in the math education of blind and low vision students.
Especially for this group of students, who use alternative interfaces and rely on hearing and touch instead of sight, solutions that provide additional information about each mathematical object through a multi-sensual user interface can be effective. The article describes an information superstructure that the student receives multi-sensually, beyond the information recommended by WCAG and ARIA. It allows interactive recognition of mathematical content. Combined with the accessible technique developed for editing formulas, building quizzes with math content and creating graphics, it gives the blind student the possibility of creative, interactive, independent math learning.

2 Method

In order to learn mathematics effectively, blind students should be as independent as possible, especially in the “flipped classroom” learning model. This means that they should have access to all elements of a mathematical multimedia document, e.g. a manual or worksheet. Visually impaired students must be able to thoroughly analyse, imagine and understand each mathematical object. Otherwise, they are at risk of cognitive impairment. In their daily work with the computer, blind users use only the keyboard and special software called a screen reader. That is why a blind person explores a document differently from a sighted person. For example, instead of clicking with a mouse pointer, they use the Tab key to highlight a button and then press the space bar. The accessibility of different mathematical objects can be achieved by enriching pages with additional information elements that form the Augmented Reality (AR) layers, as opposed to the cognitive reality resulting from the use of the WCAG and ARIA elements. We propose a method of enriching a mathematical document with multi-sensory AR elements, which the user applies in successive stages of increasingly precise recognition of mathematical objects. One particular AR information layer is associated with each recognition stage. Three AR information layers support the user in three successive stages of exploring a mathematical document. There is also an optional fourth layer, used depending on the teacher’s decision and/or the student’s needs. Additional AR information elements in the form of texts and sounds of various types are accessible through the multi-sensual user interface: synthetic speech, touch gestures on the touch screen, or the touch of a haptic Braille display, also called a braille line (see Table 1). This is a device connected to the computer that presents characters as protruding pins forming the letters of the Braille alphabet.

2.1 First AR Layer: General Recognition of Mathematical Objects in the Document

The first, general information layer (additional to the WCAG and ARIA requirements) contains text elements that are conveyed by synthetic speech. It informs the user about encountering objects such as graphics, mathematical formulas, quizzes, questions, answers, pairing fields and links to comments recorded by a teacher (see Table 1). The first layer user interface (UI) is a set of keyboard shortcuts that make it simple to locate initially recognised objects in a document. It is also enriched with a haptic interface for the Braille display. For quickly locating mathematical objects of a particular type in a document (formulas, graphics, quizzes and others), there is a set of shortcut keys, so-called hotkeys.

An Example of Using the First AR Layer

When students navigating a document with the arrow keys put the cursor on a mathematical formula, the screen reader, through a speech synthesiser, tells them that it is a formula. Similarly, they get information on other types of elements such as graphics, pairing fields, and questions and answers in quizzes. Moreover, each of these messages is sent simultaneously to the Braille display in a shortened form to speed up haptic reading.

2.2 Second AR Layer: More Information About Math Objects

After encountering a mathematical object and recognising its type, the student can examine it more thoroughly. This is possible thanks to the AR elements placed in the second layer (see Table 1). This layer contains information elements such as the semantically readable text of a formula, read in Polish or English by the speech synthesiser, graphic titles and descriptions, and the content of pairing fields, questions or answers, which can be a text, a formula or a graphic. The student can simultaneously read this information using the Braille display. The UI of the second layer is a set of keyboard shortcuts used to obtain information about each math object in the document. Additionally, for haptic reading of formulas, this layer is equipped with converters from the MathML web notation [11] to UEB, the English mathematical Braille notation [12], and BNM, the Polish mathematical Braille notation [13].

Table 1. An example of using the 1st and 2nd AR layers: finding formulas and learning the content of a selected formula

1st layer
  UI: 1. Ctrl+m; 2. Ctrl+m
  Augmented Reality: “formula”
  Reality on the screen: formulas 1 to 3, with the selected formula marked (formula images not recoverable)
2nd layer
  Augmented Reality: “x equals fraction numerator minus b plus minus square root of b minus four a c end of the square root denominator two an end of fraction”
  Braille display: the selected formula in UEB notation (Braille image not recoverable)
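A minimal sketch of how the first two layers illustrated in Table 1 could be modelled, written in Python for illustration only. The names `MathObject`, `announce` and `readout`, the shortened Braille labels and the token format are assumptions made for this example and are not taken from the EuroMath implementation; a real system would derive the readout from MathML rather than from a token list.

```python
from dataclasses import dataclass, field
from fractions import Fraction

# Hypothetical layer-1 labels: (text for speech, shortened text for the Braille display).
TYPE_LABELS = {
    "formula": ("formula", "frm"),
    "graphic": ("graphic", "grf"),
    "quiz": ("quiz", "qz"),
}

NUMBERS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}
DENOMINATORS = {2: ("half", "halves"), 3: ("third", "thirds"), 4: ("quarter", "quarters")}
OPERATORS = {"+": "plus", "-": "minus", "=": "equal"}


@dataclass
class MathObject:
    kind: str                                   # "formula", "graphic" or "quiz"
    tokens: list = field(default_factory=list)  # formula content (assumed format)


def announce(obj):
    """Layer 1: report only the type of the object under the cursor."""
    return TYPE_LABELS[obj.kind]


def say_number(value):
    """Spell out a small whole number or fraction given as a Fraction."""
    if value.denominator == 1:
        return NUMBERS[value.numerator]
    singular, plural = DENOMINATORS[value.denominator]
    name = singular if value.numerator == 1 else plural
    return f"{NUMBERS[value.numerator]} {name}"


def readout(obj):
    """Layer 2: build the semantic readout of a formula from its tokens."""
    words = []
    for token in obj.tokens:
        if isinstance(token, str):
            words.append(OPERATORS[token])
        else:
            words.append(say_number(token))
    return " ".join(words)


# The formula 1/3 + 2/3 = 1 as a token list.
formula = MathObject("formula", [Fraction(1, 3), "+", Fraction(2, 3), "=", Fraction(1)])
speech, braille = announce(formula)
print(speech, "|", braille)
print(readout(formula))
```

Keeping the type announcement and the content readout in separate functions mirrors the layering: the first call is cheap and always made, the second only when the student asks for more detail.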


An Example of Using the Second AR Layer

When students encounter an object in a document and are informed that it is a mathematical formula, they can become familiar with its content. The text of the semantic reading of the formula in Polish or English is created automatically and then read with synthetic speech. For example, for the formula 1/3 + 2/3 = 1, the text of the readout is: “one third plus two thirds equal one.” In addition to listening, the student can read the same formula by touch on a Braille display. This is possible thanks to converters from the MathML formula notation in HTML to the Braille BNM (Polish) or UEB (English) notation.

2.3 Third AR Layer: Thorough Exploration of the Math Object

The third AR layer was developed for detailed exploration of formulas, function graphs and geometric drawings. It consists of the following items: texts of automatically generated and/or manually added descriptions of graphic elements; comments on the whole graph and its elements; texts of the formula elements; continuous sounds with constant and variable monotonicity describing graph elements; and short technical sounds called audio-icons. UI access to the third layer is provided by four keyboard shortcuts, numeric keypad keys, several on-screen touch gestures, the haptic touch of a braille display, synthetic speech and bolded lines that make graphic elements stand out (see Fig. 1). The third information layer enables detailed, interactive exploration of complex formulas and mathematical drawings through the following elements:

1. Reading parts of the structures of formulas and descriptions of drawings introduced during their editing (by synthetic speech and the haptic touch of the Braille display),
2. Listening to the monotonic sound generated while the user is touching lines of geometrical figures and function graphs (through touch gestures),
3. Listening to the sound of variable monotonicity describing the function graph (through touch gestures).

Additionally, the user is informed by short beeps about the zeros of a function and by a long beep about leaving the drawing area during exploration with touch gestures.

An Example of Using the Third AR Layer

The students need to analyse a complex formula more thoroughly (see Table 2). For this purpose, they can use the third AR layer, thanks to which the formula structure becomes fully accessible. The students use the keyboard shortcut Ctrl+Shift+1 to enter the formula structure and navigate through it, as through a tree, while hearing the focused parts of the formula read aloud. They use the arrow keys or touch gestures to read the formula elements. In this way, the students can build the formula structure in their imagination. Mathematical drawings are explored thoroughly with the help of sound signals generated when sliding a finger over the touch screen, with the graphic displayed in full screen. The student enters the drawing with one of two keyboard shortcuts, either to explore it interactively or to recognise it passively. By moving a finger over the screen, the student hears various sounds representing the objects contained in the drawing.
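The tree-style navigation of a formula described in this example can be sketched as a cursor over a small tree of readout texts. This is an illustration under stated assumptions: the `Node` and `FormulaCursor` classes and the mapping of arrow keys to moves (down to the first child, up to the parent, left and right between siblings) are hypothetical and are not documented EuroMath behaviour.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so equal-looking siblings stay distinct
class Node:
    speech: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, *kids):
        """Attach child nodes and return self so that trees can be built inline."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


class FormulaCursor:
    """Keeps the focused node and returns the text to speak after each move."""

    def __init__(self, root):
        self.node = root

    def move(self, key):
        node = self.node
        if key == "down" and node.children:
            node = node.children[0]
        elif key == "up" and node.parent is not None:
            node = node.parent
        elif key in ("left", "right") and node.parent is not None:
            siblings = node.parent.children
            index = siblings.index(node) + (1 if key == "right" else -1)
            if 0 <= index < len(siblings):
                node = siblings[index]
        # At an edge of the tree the focus stays where it is.
        self.node = node
        return node.speech


# The formula 1/3 + 2/3 = 1 as a tree of readout texts.
left_side = Node("one third plus two thirds").add(
    Node("one third"), Node("plus"), Node("two thirds")
)
root = Node("one third plus two thirds equal one").add(
    left_side, Node("equal"), Node("one")
)

cursor = FormulaCursor(root)
for key in ["down", "down", "right", "right", "up", "right"]:
    print(key, "->", cursor.move(key))
```

Storing the readout text on every node means each move only has to look up a string, which keeps speech and Braille output in step with the focus.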


Table 2. An example of using the 3rd AR layer: immersive learning about the elements of the formula (reading and modification)

[Table body not fully recoverable from the source. Columns: UI; Reality on the screen; Augmented Reality. The UI column lists Ctrl+Shift+1, Ctrl+Shift+a, Ctrl+Shift+b, touch gestures and a virtual Braille keyboard emulated on the QWERTY keyboard. Over steps 1 to 9, the Augmented Reality column gives the readout of the focused part of the formula, from the whole formula (“x equals fraction numerator minus b plus minus square root of b minus four a c end of square root denominator two an end of fraction”) to single elements such as “x”, “equals”, “minus” and “square root of b squared minus four a c end of the square root”.]


Thanks to the feature that automatically supports the creation of graphics, the parameters given during construction, such as the coordinates of points, the length and width of a polygon or the radius of a circle, are transformed into texts describing the given object. This automatically generated description is read after a tap gesture, made when the sound generated along a line or at a point is heard. This automated drawing creation mechanism is fully accessible to blind students, so they can create or modify mathematical graphics on their own. When students are “immersed” in a graphic by touching its elements, then thanks to the 3rd AR layer they can also modify it on their own. In this way, immersive learning of graphics is implemented. Next, we present a detailed description of the three methods of recognising graphics in the third AR layer.

Method 1. When the student hears a sound while sliding a finger over the screen, a tap gesture makes synthetic speech read out information about the encountered element of the drawing, for example “axis x” or “axis y”. Descriptions of the touched graphic elements, in addition to those generated automatically, are entered by the teacher or by the student while editing the drawing.

Method 2. Listening to a monotonic sound while sliding the finger over the bolded line of a figure or function graph. The displayed line is specially thickened to keep the finger moving within its area.

Method 3. Listening to an automatically generated sound with variable monotonicity that describes the function graph. The pitch informs the student about the Y coordinate. Students, knowing that the graph is always played from the smallest to the largest values on the X-axis, can build in their minds an image of the function graph they are listening to. Additionally, particular sounds inform students about the zeros of the function and about finger movement outside the drawing area.
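The idea behind Method 3 can be sketched as a mapping from sampled function values to pitches, played from the smallest to the largest X value. The linear mapping in hertz, the 220 to 880 Hz range and the function names `pitch_for_y` and `sonify` are assumptions chosen for this illustration; the source does not specify how EuroMath derives its sounds.

```python
def pitch_for_y(y, y_min, y_max, f_low=220.0, f_high=880.0):
    """Map y linearly onto the range [f_low, f_high] in Hz (assumed mapping)."""
    if y_max == y_min:
        # A constant function gets a single pitch in the middle of the range.
        return (f_low + f_high) / 2
    t = (y - y_min) / (y_max - y_min)
    t = min(1.0, max(0.0, t))
    return f_low + t * (f_high - f_low)


def sonify(func, x_min, x_max, steps=50):
    """Sample func from left to right and return (x, frequency_hz, is_zero) events.

    is_zero marks a sample that is exactly zero or the first sample after a
    sign change, where a short beep could be played.
    """
    xs = [x_min + i * (x_max - x_min) / steps for i in range(steps + 1)]
    ys = [func(x) for x in xs]
    y_min, y_max = min(ys), max(ys)
    events = []
    previous = None
    for x, y in zip(xs, ys):
        crossed = previous is not None and previous * y < 0
        events.append((x, pitch_for_y(y, y_min, y_max), y == 0 or crossed))
        previous = y
    return events


# The graph of f(x) = x on [-1, 1], sampled at five points.
for x, frequency, is_zero in sonify(lambda x: x, -1.0, 1.0, steps=4):
    print(f"x={x:+.1f}  {frequency:.0f} Hz  zero={is_zero}")
```

A rising graph then produces a rising tone, and the zero flag gives the hook for the short beep mentioned in the text. A logarithmic pitch scale may match perceived pitch better than the linear one used here.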
The elements and UI of the part of the third AR layer (for exploration of the function graph) are shown in Fig. 1.

Fig. 1. Third AR layer: elements and UI for detailed exploration of a function graph
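The automatic description of graphics mentioned in this section, where construction parameters are turned into text, might look like the following sketch. The dictionary keys, the wording of the generated sentences and the function name `describe` are hypothetical; they only illustrate the principle of filling a text template from the parameters entered while the drawing is created.

```python
def describe(shape):
    """Turn the construction parameters of a shape into a description text."""
    kind = shape["kind"]
    if kind == "point":
        return f"point {shape['name']} at x {shape['x']}, y {shape['y']}"
    if kind == "circle":
        cx, cy = shape["centre"]
        return f"circle with centre at x {cx}, y {cy} and radius {shape['radius']}"
    if kind == "rectangle":
        return f"rectangle with length {shape['length']} and width {shape['width']}"
    raise ValueError(f"no description template for {kind!r}")


drawing = [
    {"kind": "point", "name": "A", "x": 2, "y": 5},
    {"kind": "circle", "centre": (0, 0), "radius": 3},
    {"kind": "rectangle", "length": 4, "width": 2},
]
for shape in drawing:
    print(describe(shape))
```

Because the text is generated from the same parameters that define the shape, the description stays correct whenever the student or teacher modifies the drawing.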

2.4 Fourth AR Layer: Recorded Teacher Comments

The optional fourth layer contains information recorded by the teacher as hints, tips or opinions, in the form of a static video (sound only) embedded in the content as a link to a resource placed on YouTube. This layer is supported by standard keyboard shortcuts for opening links in a browser. It is up to the teacher whether to post comments and up to the student whether to use this form of hint. Thanks to this combination of AR layers providing the right tools, students can enrich their mathematical knowledge within their own cognitive reality.

3 Results

In the proposed method, students with visual impairment can explore mathematical content independently and interactively thanks to the layered AR formed by the successive information layers. The presented multi-sensual AR was implemented in the EuroMath online environment, in English and Polish, for independent learning of mathematics by blind and visually impaired students and for the creation of mathematical content by teachers and students. The EuroMath environment consists of a web application for creating and exploring interactive multimedia mathematical content and a portal supporting the repository of open educational math resources (OER) in the form of .epub files [4]. The OER consist of math content accessible to students with visual impairments (blind or with low vision) and methodological materials for teachers. Math teachers from Poland, the Netherlands and Ireland have already created over 300 multimedia mathematical documents using the web application and uploaded them to the OER repository. The resources are described in detail with metadata that classify them by mathematical subject, type of school, level of education and visual impairment. Students can search for materials helpful in learning mathematics using the search engine available in the repository. They can also cooperate with a group of students on a discussion forum available on the EuroMath portal.
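As an illustration of the metadata-based search described above, the following sketch filters resources by the four classification criteria named in the text. The `Resource` class, its field names, the `search` function and the sample records are all hypothetical and do not reflect the actual EuroMath repository schema or content.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Resource:
    title: str
    subject: str        # mathematical subject
    school_type: str    # type of school
    level: str          # level of education
    impairment: str     # visual impairment the material is prepared for


def search(resources, **criteria):
    """Return the resources whose metadata match every given criterion."""
    known = {f.name for f in fields(Resource)}
    unknown = set(criteria) - known
    if unknown:
        raise KeyError(f"unknown metadata fields: {sorted(unknown)}")
    return [
        r for r in resources
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]


# Made-up sample records, for illustration only.
repository = [
    Resource("Quadratic equations worksheet", "algebra", "secondary", "year 2", "blind"),
    Resource("Function graphs quiz", "functions", "secondary", "year 2", "low vision"),
    Resource("Fractions exercises", "arithmetic", "primary", "year 5", "blind"),
]

for hit in search(repository, school_type="secondary", impairment="blind"):
    print(hit.title)
```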
EuroMath is an intelligent system that supports students with visual dysfunction in the learning process and teachers in creating educational materials:

• In the EuroMath system, instead of several versions of a given document (for example, a work card) intended for the sighted student, the blind student and the low vision student, the teacher creates one universal version, which conversion algorithms adapt to the needs of each group of students;
• EuroMath is teacher-friendly because it: a) shortens the time needed to create universal materials by automatically generating and supplementing information about the mathematical objects contained in the document (types of objects, the content of formulas and their elements, coordinates of graphic elements, zero points, and function graphs), b) enables the teacher to choose tools for editing and reading formulas and graphics, so that in extreme cases even a blind math teacher (there are some) can create formulas and mathematical graphics available to each student according to their needs;
• EuroMath is useful for blind and visually impaired students, who can independently read and edit documents with formulas, figures, drawings and function graphs. Reading is facilitated by the three described AR layers and the special UI. Interactive immersion in the environment of mathematical objects, which a student with visual impairment can “touch” and change, is made possible by a set of developed algorithms for converting mathematical notation (such as UEB/MathML/UEB, AsciiMath/MathML/UEB, AsciiMath/MathML/AsciiMath), translating MathML notation into semantic readout texts, and translating vector graphic inscriptions into sounds of constant and variable monotonicity, together with the special UI adapted to the needs of the student;
• Mathematical resources in the EuroMath repository allow each student to follow the appropriate educational path, with all material on this path tailored to the needs of the student in terms of methodology and technical availability.

EuroMath was developed as part of a project (2017–2020) under the EU Erasmus+ programme.

4 Conclusion

The presented multi-sensual AR was designed together with blind students and math teachers of the final technical classes of secondary schools. Because the method was developed with the participation of blind and visually impaired students and their math teachers, we can conclude that it has been accepted by potential users. Its evaluation by teachers with their students in Poland, Ireland, and the Netherlands is under way and runs until mid-2020. Due to the narrow target group, we can only conduct qualitative research. EuroMath valorisation was undertaken by mathematics teachers with their students from the only special centre for children with visual impairment in Ireland, from two of the nine existing special centres in Poland, and from five schools run in the Netherlands by an expert centre for blind and low vision children. Since the outbreak of the COVID-19 epidemic and the closing of schools and universities, interest in ICT tools that support learning in the flipped classroom model has increased, as has interest in the EuroMath system as a tool that could be useful in this model. The third special centre in Poland and one of the universities of technology in Ireland will also take part in the valorisation of EuroMath. Currently, EuroMath is being tested by 13 mathematics teachers and their students: four teachers from the Netherlands, four from Ireland (including three academic teachers), and five from Poland, representing 12 educational units altogether. The effectiveness of the AR + UI method, implemented in EuroMath to increase the availability of mathematical content, has been confirmed by teachers from Ireland and Poland. One centre in Poland is already using EuroMath in its ongoing operations. The above-mentioned university of technology has expressed interest in implementing the EuroMath system in its infrastructure. Teachers from the Netherlands also sent a preliminary positive opinion about the method of sharing mathematical content described in this paper. In connection with the closing of schools, blind students from special centres began remote education, in which a visually impaired student is forced to become more independent in the learning process. To what extent our method has increased this independence in practice can only be examined after the COVID-19 epidemic. It is worth noting that the students’ self-reliance is influenced not only by the AR + UI method itself but also by the quality and attractiveness of the materials prepared in the EuroMath application: work cards, exercises and tests available in the repository. To facilitate teachers’ work in this area, help materials in the form of instructional videos have been made available on the YouTube channel dedicated to the EuroMath system.

References

1. Aldon, G., Raffin, C.: Mathematics learning and augmented reality in virtual school. In: Prodromou, T. (ed.) Augmented Reality in Educational Settings, pp. 126–146. Brill Academic Publishers, Leiden (2018)
2. Prodromou, T. (ed.): Augmented Reality in Educational Settings. Brill Academic Publishers, Leiden (2020)
3. Bacca, J., Baldiris, S., Fabregat, R., Graf, S., Kinshuk: Augmented reality trends in education: a systematic review of research and applications. Educ. Technol. Soc. 17(4), 133–149 (2014)
4. Brzostek-Pawłowska, J., Rubin, M., Salamończyk, A.: Enhancement of math content accessibility in EPUB3 educational publications. New Rev. Hypermedia Multimedia 25(1–2), 31–56 (2019)
5. Bujak, K.R., Radu, I., Catrambone, R., MacIntyre, B., Zheng, R., Golubski, G.: A psychological perspective on augmented reality in the mathematics classroom. Comput. Educ. 68, 536–544 (2013)
6. Chang, K.-E., Chang, C.-T., Hou, H.-T., Sung, Y.-T., Chao, H.-L., Lee, C.-M.: Development and behavioral pattern analysis of a mobile guide system with augmented reality for painting appreciation instruction in an art museum. Comput. Educ. 71, 185–197 (2014)
7. Di Serio, Á., Ibáñez, M.B., Kloos, C.D.: Impact of an augmented reality system on students’ motivation for a visual art course. Comput. Educ. 68, 586–596 (2013)
8. Meldrum, A.: Multi-sensory math activities that really work (2018). https://www.theliteracynest.com/2018/11/multisensory-math.html. Accessed 13 Jan 2020
9. Web Content Accessibility Guidelines (WCAG) 2.1. https://www.w3.org/TR/WCAG21/. Accessed 29 Mar 2020
10. WAI-ARIA overview. https://www.w3.org/WAI/standards-guidelines/aria. Accessed 29 Mar 2020
11. Mathematical Markup Language (MathML) version 3.0, 2nd edn. Accessed 29 Mar 2020
12. Unified English Braille (UEB). https://uebmath.aphtech.org/. Accessed 29 Mar 2020
13. Świerczek, J. (ed.): Brajlowska notacja matematyczna fizyczna i chemiczna (BNM). Kraków, Łódź (2011)

Adaptive Learning to Support Reading Skills Development for All
Using a Single-Case Experimental Design to Monitor, Describe, and Assess the Impact of Adaptive Learning on Language Development of a Diversity of K-12 Pupils

Lionel Alvarez1,2 and Thierry Geoffre1

1 University for Teacher Education (UTE), Rue de Morat 36, 1700 Fribourg, Switzerland
2 University of Fribourg, Rue P.-A. de Faucigny 2, 1700 Fribourg, Switzerland

Abstract. The article presents a methodological approach designed to monitor and assess the impact of adaptive learning on teaching and learning French for a diversity of pupils. First, the development process of the web platform is detailed, including its didactic foundation. Second, the single-case experimental design elaborated for the testing phase is presented. Four cohorts of pupils with various profiles, including special needs, will participate, so the potential of adaptive learning to respond to every need will be systematically documented.

Keywords: Adaptive learning · Single-case experimental design · Reading skills

1 Adaptive Learning to Support Reading

Public schools are facing the challenge of offering an inclusive educational system which accommodates all pupils, regardless of their language(s) or their learning abilities. Access to reading and comprehension is therefore a priority, particularly as international evaluations report a steady decline in pupils’ performance at school [17]. Digital learning environments, notably those with adaptive learning, are promising facilitators of this shift of schools towards personalized teaching and learning (T&L). They may make it possible to work within the paradigm of universal design for learning, defined as “a framework that provides guidelines to support children with diverse needs in the classroom” [13: 670]. According to this paradigm, the learning environment has to offer a plurality of curricula: various means of representation, of expression and of engagement [18]. It includes mediations (a variety of resources available to the user) and remediations (personalized resources recommended or pushed by adaptive learning).

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 11–16, 2020. https://doi.org/10.1007/978-3-030-49663-0_2

If adaptive learning could accurately personalize the curriculum – i.e., a differentiation of learning paths, or an individualization of remediations – it would potentially promote better learning progressions for all pupils, including learning to read and understand written words and texts. However, the ability of adaptive learning to respond to students’ diverse needs must be tested before considering a system-wide implementation. Project ALoRS – Adaptive Learning to Read at School – is a rigorous research design to develop and test adaptive technologies.

2 Methods

Project ALoRS is composed of (phase a) the development of a web platform including adaptive learning, and (phase b) its rigorous testing through implementation in public schools. The research design for this testing phase is twofold: (1) an experimental design with a control group and (2) a single-case experimental design with multiple baselines (SCED-MBL) randomized through cohorts. Thus, the learning paths will be described, and skills development will be assessed for each cohort. This short paper focuses on the SCED-MBL, which is rarely used in this kind of study.

2.1 Development of the Web Platform Based on Didactic Foundations (Phase A)

Adaptive learning involves computer-based development of individual learners’ curricula, considering their knowledge mastery within an ontology. This might be achieved by modeling the knowledge and the related didactic challenges. According to didactics, there are four prerequisites to create such personalized curricula [16]: (a) a knowledge organization repository; (b) the knowledge implemented by the students; (c) evaluation criteria; (d) a diagnostic model of knowledge mastery. Teaching and learning should share the same ontology for a fruitful environment. The anthropological theory of didactics [3, 4] makes it possible to describe the relationships between T&L items, types of tasks, techniques and technologies, and therefore allows the development of such an ontology. The web platform uses four related levels of knowledge repositories (T&L items, types of tasks, techniques, errors) [7]. The adaptive learning can then analyze a student’s activity to create his/her personal praxeology (success and error categories; missing prerequisites; …) and manage his/her next learning path with automatic (proposed or mandated) remediations. To carry out this project, a multidisciplinary team has been assembled. It consists of (1) a research unit in language didactics (DidaLang – UTE Fribourg) partnering with five primary teachers, (2) a research center on digital technology for education (CRE/ATE – UTE Fribourg), (3) an Institute of Multilingualism (U-Fribourg – Switzerland), (4) LIRIS (U-Lyon – France) for computer science and development of the platform, and (5) CEAP (UQAM – Québec) for French didactics and the study of the user experience.

2.2 Use of the Independent Variable in the Testing Phase (Phase B)

The web platform has three modes:
Mode A: Predetermined learning path (using standards). The student works on the platform on an assigned path, according to a scenario planned in advance. Mediations are available and the student’s activity is recorded, but the recommendation system is off (no adaptive learning, no adaptive teaching).
Mode B: Personalized remediations, based on student activity (self-referenced), to support teacher decision-making. The student works on the platform on an assigned path, mediations are available, the student’s activity is recorded, and the recommendation system is activated: relevant recommendations for adaptations or possible remediations are sent to the teacher (no adaptive learning, but adaptive teaching).
Mode B’: Automatic personalized remediations, based on student activity (self-referenced), to substitute for teacher decision-making. The student works on the platform on an assigned path, mediations are available, and the student’s activity is recorded and used for the automatic management of the path (complete adaptive learning).

The main users of the web platform will be Swiss students in the 3rd primary grade (pupils aged 8–9) with diverse learning needs. The minimal skill required to be designated as a user is the ability to interact with the digital tool on one’s own. To ensure ecological validity, the research will take place in mainstream and specialized classrooms of public facilities with French as the school language. Four cohorts of pupils have been identified so that the participants represent the diversity of classrooms in public schools:
cohort 1: students without special needs, enrolled in an ordinary school;
cohort 2: students with a dyslexia diagnosis or following speech therapy, enrolled in an ordinary school;
cohort 3: students with French as L2, enrolled in an ordinary school;
cohort 4: students from a special education track.
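The three modes above differ only in who, if anyone, receives the output of the recommendation system. A minimal Python sketch of that distinction follows; the names `Mode` and `route_recommendation` and the dictionary keys are hypothetical and are not taken from the ALoRS platform.

```python
from enum import Enum


class Mode(Enum):
    A = "predetermined path"            # recommendation system off
    B = "recommendations to teacher"    # adaptive teaching only
    B_PRIME = "automatic remediation"   # complete adaptive learning


def route_recommendation(mode: Mode, recommendation: str) -> dict:
    """Describe what happens to one recommendation in the given mode.

    In every mode the activity is recorded and mediations stay available;
    only the destination of the recommendation changes.
    """
    result = {
        "activity_recorded": True,
        "mediations_available": True,
        "sent_to_teacher": None,   # filled in mode B
        "applied_to_path": None,   # filled in mode B'
    }
    if mode is Mode.B:
        result["sent_to_teacher"] = recommendation
    elif mode is Mode.B_PRIME:
        result["applied_to_path"] = recommendation
    return result
```

Keeping the recording and the mediations identical across modes, as in this sketch, mirrors the text: the recommendation system is the only part that varies between the baseline and the two intervention modes.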
To have enough participants in each cohort, at least 15 ordinary classrooms (~300 pupils) and 6 special education classrooms (~50 pupils) will take part in the experiment. Project ALoRS will thus involve ~350 pupils (research design with a control group), and among these, 100 will be randomly selected for a thorough analysis of learning paths (SCED-MBL). To ensure a diversity of profiles, this randomized selection will be organized so that 25 students correspond to each of the four cohorts. For at least 10 weeks, each user will work on the platform with the adaptive learning either activated (modes B and B’) or deactivated (mode A). Each pupil will complete four courses, and pupils will spend three 30-minute sessions each week working on the web platform. As data are collected automatically throughout the implementation phase, it will be possible to document the implementation fidelity of the intervention [22].

2.3 Dependent Variables

Two categories of data are automatically collected by the web platform: (A) Usage modalities – time spent on item or module, used mediations, and (B) Responses – accuracy by item and module, errors. The platform will automatically generate files of collected data after each session.
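To make the two categories of data concrete, the following Python sketch shows one possible shape for a per-item record and for the file generated after a session. It is an illustration under assumptions only: the field names, the CSV format and the functions `summarize` and `session_to_csv` are hypothetical, since the paper does not specify the platform's data format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class ItemRecord:
    pupil_id: str
    module: str
    item: str
    seconds_on_item: float                 # (A) usage: time spent on the item
    mediations_used: int                   # (A) usage: mediations opened
    correct: bool                          # (B) response: accuracy
    error_category: Optional[str] = None   # (B) response: error type, if any


def summarize(records):
    """Aggregate a session into per-module accuracy, time, mediations, errors."""
    summary = {}
    for r in records:
        s = summary.setdefault(
            r.module,
            {"items": 0, "correct": 0, "seconds": 0.0, "mediations": 0, "errors": []},
        )
        s["items"] += 1
        s["correct"] += int(r.correct)
        s["seconds"] += r.seconds_on_item
        s["mediations"] += r.mediations_used
        if r.error_category is not None:
            s["errors"].append(r.error_category)
    for s in summary.values():
        s["accuracy"] = s["correct"] / s["items"]
    return summary


def session_to_csv(records):
    """Serialize the session records as CSV text (one row per item)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ItemRecord)])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buffer.getvalue()
```

Each session then yields one data point per dependent variable (for example accuracy or time on task per module), which is the granularity a single-case design needs.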

2.4 Single-Case Experimental Design

Such a design is considered rigorous [12]: it allows the analysis of a functional relationship between an intervention and several dependent variables [8, 10], and enables the description of learning paths thanks to the continuous and iterative data collection for each pupil. A SCED-MBL is preferred because a reversal design would not be possible (learning and development are expected). Besides, internal validity is considered better with SCED-MBL [21] and randomization tests are possible [6]. For this latter purpose, a double randomization attribution (Koehler-Levin Method) has been integrated into the present design. It will further improve internal validity [14]. First, the order of intervention starts has been randomized between the cohorts. Sequence 1342 has been chosen among 24 possible cohort permutations. Second, the starting point has been selected among three possible starting points for each cohort. Thus, the selected design is 1 of 1944 (3^4 × 24) possible permutations, which prepares the ground for the statistical tests that can be conducted. An ABB’ and an AB’B setup (where A is the baseline, B is mode B activated, and B’ is mode B’ activated) have been chosen to assess the impact of the intervention introduction order. Each phase will have at least five data points (two weeks per phase are planned). See Fig. 1 for details.

Fig. 1. ABB’ and AB’B multiple baselines designs randomized through cohorts

2.5 Analyses of Single-Case Design Data

Because a consensus has emerged in the literature about the need for complementarity between visual and statistical analysis in SCED [9], both will be conducted. First, a visual analysis will include the investigation of level, slope, variability, superposition, immediacy, and consistency [15] for each variable. Then, three types of statistical tests will be considered and adopted based on the visual analysis results:
• Percentage of points exceeding the median or the median trend [19];
• Tau-U, with or without phase A correction [20];
• Randomization tests on the levels, the variability or the slopes [2, 5].
These tests will quantify the functional relations between the intervention and various dependent variables (e.g., number of errors, time on task, number of used mediations). Thus, the data collected will (1) provide a detailed visualization of learning paths for each cohort and each case, (2) allow a systematic visual analysis of these data, and (3) enable statistical tests that quantify how the intervention impacts the dependent variables for each cohort. This research design will make it possible to evaluate the implementation time and dosage needed for an effect of the adaptive learning to become visible. The SCED-MBL, as designed in project ALoRS, follows standards without reservation [11, 12]. Indeed, more than 6 phases are included, and at least 5 points per phase are planned.
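As a worked illustration of the randomization logic, the Python sketch below first recomputes the number of possible designs (24 cohort orders times 3 start points for each of 4 cohorts) and then runs a simplified randomization test on levels. The test is a sketch under assumptions, not the authors' analysis: it only permutes the start points (the cohort order is held fixed), it uses the mean level change as test statistic, and the function names are hypothetical.

```python
from itertools import permutations, product
from statistics import mean

COHORTS = (1, 2, 3, 4)
START_OPTIONS = 3  # candidate intervention start points per cohort


def count_designs():
    """Number of designs: cohort orders (4!) times start-point choices (3**4)."""
    orders = list(permutations(COHORTS))
    starts = list(product(range(START_OPTIONS), repeat=len(COHORTS)))
    return len(orders) * len(starts)


def level_change(series, start):
    """Mean of the intervention-phase points minus mean of the baseline points."""
    return mean(series[start:]) - mean(series[:start])


def randomization_p_value(series_by_case, actual_starts, candidate_starts):
    """One-sided p-value over all admissible start-point assignments.

    The p-value is the share of assignments whose mean level change is at
    least as large as the one observed with the actual start points.
    """
    cases = list(series_by_case)

    def statistic(starts):
        return mean(level_change(series_by_case[c], s) for c, s in zip(cases, starts))

    observed = statistic([actual_starts[c] for c in cases])
    assignments = list(product(*(candidate_starts[c] for c in cases)))
    extreme = sum(statistic(a) >= observed for a in assignments)
    return extreme / len(assignments)
```

With only three candidate start points per case, the smallest attainable p-value for a single case is 1/3; combining cases (and, in the full design, cohort orders) is what makes the 1944 possible designs useful for inference.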

3 Conclusion

For French teaching and learning in primary classes, the ability of an adaptive learning device to respond to students’ diverse needs has yet to be systematically tested. This paper gives a short overview of our project and explains our methodological investigation of how a single-case experimental design may allow the documentation of learning paths, (re)mediations used, and number and types of errors, as well as the rigorous evaluation of adaptive learning based on didactic foundations. Such a project is at the intersection of research and practice, between didactics and implementation, and opens the way to a fruitful documentation of the impacts of personalized (re)mediation automatically pushed to students (to improve skill development) or proposed to teachers (to assist them in reducing the gap in access to language across heterogeneous groups of pupils). This should help to shape the future of digital devices for teaching and learning.

References

1. Anderson, J., Franklin Boyle, C., Reiser, B.J.: Intelligent tutoring systems. Science 228, 456–461 (1985). https://doi.org/10.4324/9781315617572
2. Bulté, I., Onghena, P.: Randomization tests for multiple-baseline designs: an extension of the SCRT-R package. Behav. Res. Methods 41(2), 477–485 (2009). https://doi.org/10.3758/BRM.41.2.477
3. Chaachoua, H.: La praxéologie comme modèle didactique pour la problématique EIAH. Étude de cas: la modélisation des connaissances des élèves. University of Grenoble, Grenoble (2011)
4. Chevallard, Y.: Concepts fondamentaux de la didactique: perspectives apportées par une approche anthropologique. Rech. en Didact. des Mathématiques 12(1), 73–112 (1992)
5. Ferron, J.M., Levin, J.R.: Single-case permutation and randomization statistical tests: present status, promising new developments. In: Kratochwill, T.R., Levin, J.R. (eds.) Single-Case Intervention Research. Methodological and Statistical Advances, pp. 153–183. American Psychological Association, Washington (2014)
6. Ferron, J.M., Sentovich, C.: Statistical power of randomization tests used with multiple-baseline designs. J. Exp. Educ. 70(2), 165–178 (2002)
7. Geoffre, T., Colombier, N.: Français langue de scolarisation: Réflexions sur les référentiels de compétences et l’adaptive learning. In: Actes de la 9e Conférence sur les Environnements Informatiques pour l’Apprentissage Humain, pp. 199–204 (2019)
8. Gresham, F.M., Vanderwood, M.L.: Quantitative research methods and designs in consultation. In: Erchul, W.P., Sheridan, S.M. (eds.) Handbook of Research School Consultation, pp. 63–87. Routledge, Mahwah (2008)
9. Heyvaert, M., Wendt, O., Van den Noortgate, W., Onghena, P.: Randomization and data-analysis items in quality standards for single-case experimental studies. J. Spec. Educ. 49(3), 146–156 (2015)
10. Horner, R.H., Carr, E.G., Halle, J., Mcgee, G., Odom, S.L., Wolery, M.: The use of single-subject research to identify evidence-based practice in special education. Except. Child. 71(2), 165–179 (2005)
11. Institute of Education Sciences: WWC Procedures and Standards Handbook. What Works Clearinghouse (2014). http://ies.ed.gov/ncee/wwc/DocumentSum.aspx?sid=19
12. Institute of Education Sciences: Find What Works. What Works Clearinghouse (2016). http://ies.ed.gov/ncee/wwc/findwhatworks.aspx
13. Kennedy, J., Missiuna, C., Pollock, N., Wu, S., Yost, J., Campbell, W.: A scoping review to explore how universal design for learning is described and implemented by rehabilitation health professionals in school settings. Child Care, Heal. Dev. Heal. 44, 670–688 (2018). https://doi.org/10.1111/cch.12576
14. Kratochwill, T.R., Levin, J.R.: Enhancing the scientific credibility of single-case intervention research: randomization to the rescue. Psychol. Methods 15(2), 124–144 (2010)
15. Kratochwill, T.R., Levin, J.R., Horner, R.H., Swoboda, C.M.: Visual analysis of single-case intervention research: conceptual and methodological issues. In: Kratochwill, T.R., Levin, J.R. (eds.) Single-Case Intervention Research. Methodological and Statistical Advances, pp. 91–125. American Psychological Association, Washington (2014)
16. Mandin, S., Guin, N.: Prise en compte d’une ontologie des savoirs dans la construction d’un profil d’apprenant. Rapport de recherche, CNRS (2014)
17. OECD: Résultats du PISA 2018. OECD Publishing, Paris (2019). https://doi.org/10.1787/ec30bc50-fr
18. Meyer, A., Rose, H.D., Gordon, D.: Universal Design for Learning. Theory and Practice. CAST incorporation, Wakefield (2014)
19. Parker, R.I., Vannest, K.J.: An improved effect size for single-case research: nonoverlap of all pairs. Behav. Ther. 40(4), 357–367 (2009)
20. Parker, R.I., Vannest, K.J., Davis, J.L., Sauber, S.B.: Combining nonoverlap and trends for single-case research: Tau-U. Behav. Ther. 42(2), 284–299 (2011)
21. Rivard, V., Bouchard, S.: Les protocoles à cas unique. Une façon tout aussi intéressante de faire de la recherche. In: Bouchard, S., Cyr, C. (eds.) Recherche psychosociale pour harmoniser recherche et pratique, 2nd edn, pp. 207–243. Presse de l’Université du Québec, Québec (2010)
22. Hagermoser Sanetti, L.M., Kratochwill, T.M.: Treatment Integrity. A Foundation for Evidence-Based Practice in Applied Psychology. American Psychological Association, Washington (2014)

General ITS Software Architecture and Framework

Nikolaj Troels Graf von Malotky and Alke Martens

University of Rostock, 18059 Rostock, Germany

Abstract. ITSs are developing into more and more complex systems. The classic components of the general purpose ITS software architecture are widely used, but their focus is on the databases and the user interface. To reflect the functionality and complexity needed to build an ITS that meets today’s standards, a new software architecture was developed which emphasizes the definition of functionality components and component connections without ignoring the classical components. A 5 layer architecture with a detailed separation into components, but still an abstract definition of an ITS, is presented in UML. Upcoming general ideas of important functionality are included and split up into more abstract packages.

Keywords: ITS · Software architecture · Software framework

1 Introduction

ITSs, as tools for teaching and for collecting research data on teaching strategies, have the problem that the various ITS implementations are difficult to compare and merge, while their domains diversify. Research data can be more easily merged, prototype components more easily reused and research results more easily compared if the internal architecture of the ITSs and their data are commonly defined in the same way. To include as many ITS types as possible, this has to be done in a general way for a software architecture. Given the current state of the art of ITS development, newcomers have to face a plethora of systems, which in most cases cannot be re-used or reimplemented. They should be guided and supported in their development by concepts and frameworks which can be used in various domains. The software architecture helps to explain the requirements and divide them into manageable and exchangeable components. The software framework helps to implement these components by already providing them with base functionality and with connections only to the other components that are allowed. This reduces errors and misuse, so that implemented systems avoid common mistakes in ITS development.

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 17–22, 2020. https://doi.org/10.1007/978-3-030-49663-0_3

2 Development of the Software Architecture

There are some standard software architectures for ITS which define very well the basis of what an ITS is. This basis has been stable for decades [1, 2]. There are different general software architecture approaches which are successful in multiple domains and are already used by publications about ITS software architectures. The best-known ones are the client server architecture and the component architecture. The client server architecture is used in [3]. The component based architecture appears in almost every one of these publications, but is most prominently seen, with different nested components, in [4]. The layered architecture sees some mid-level use in [9]. There is also the message bus architecture in [5]. The less commonly used ones are the peer to peer architecture used in [6], the repository architecture in [7] and the domain driven design architecture in [8]. The classical components were taken as a basis so as not to neglect the previous research. They consist of the user interface component, the domain knowledge component (also called “expert knowledge”), the pedagogical knowledge component (also called the “pedagogical module”) and the student knowledge component (also called “student model”). The tasks of each component were reduced to the core of what it was supposed to do. This means that the user interface is only that, and the remaining components only manage their respective knowledge databases. The last three of these can be grouped into database functionality. There are many implicit functionality requirements for an ITS that can no longer be placed in these four components. This functionality has to be added to the ITS with new components. There are three groups: the user-facing components, the needed functionality components and the database-facing components. The needed functionality components sit between the user interface and the database. A layer architecture is well suited for that, filled with components that separate the concern of each layer into smaller reusable and exchangeable units. One reason the client server architecture is used so often in ITSs is that the user was expected to have a web browser, with the ITS as the server which delivers the client. This is too specific for a general ITS architecture.
The component based architecture in combination with the layer architecture is a good choice for a general architecture. To avoid defining the implementation-specific details of the concrete user-visible views or of the databases, the upper and lower layers had to be split up, which adds two more layers to the architecture. The architecture is therefore split into 5 layers. The outer 2 layers include implementation-specific details, so they cannot be specified by this general architecture. They mark the place where domain- and implementation-specific modelling can be integrated in future iterations: for the user interface in the “Interaction layer” and for the databases in the “Data layer”. The middle layer (“Functionality layer”) defines the functionality of the ITS. The “Temporary layer” defines which groups of views are available. Each group represents the collective temporary data of multiple related views. The “Persistence layer” defines which groups of data exist and need to be saved. It groups the data not for performance but by similarity of content. The components of the “Temporary layer” and the “Persistence layer” represent a new look at the requirements with respect to the classical components. That means there is already a rough structure of the database consisting of three components, but the user interface stands on its own to represent a whole layer. This requires defining components which are all on the same level of abstraction. Therefore the user interface component is split into two different components: one consisting of the teaching scenes (where each scene can include multiple views), named “teaching scenes”, and the other consisting of other scenes, named “additional scenes”. This splitting of the layer can also be reused in the functionality layer, which contains functionality that is expected from an ITS and is necessary to directly fulfill these demands, and functionality that is common in computer programs or not directly used to teach. Since the splitting of the database into the three classical components is often used and accepted in the ITS community, these components had to stay. But it was necessary to have a defined area to save all sorts of data from the software which cannot be fitted into the other 3. A fourth database component was therefore added, named “program knowledge component”. In summary, the components of the layers and their dependencies are shown in Fig. 1.

Fig. 1. Abstract view of the new software architecture as a 5 layered architecture

The learning material is split, as described in [9], to fulfill the requirements of guidance and interactivity. Each kind of material has different requirements for how it is presented, stored and evaluated when used with the student. Informative learning material is the basis; to fulfill its role it has to give an introduction to what should be learned. An interactive ITS has to verify that the presented information has not only been viewed but also understood correctly. Solving different tasks on the same material supports the claim that the student has understood the needed domain knowledge. The splitting of the learning material also needs to be respected in the components of the teaching scenes and in the subcomponents of the knowledge components of the persistence layer. The architecture is based on research in the fields of software architectures and ITS architectures [10]. One basic advantage of the new architecture is that it implements an ITS process (e.g. the general ITS teaching process [11]) in its own component, so that the process is modularly replaceable without affecting the rest of the architecture. The steps of the general process set basic requirements for the functionality which should be represented in the components of the ITS core functionalities. Additionally included ideas are automatic learning content generation, a central knowledge repository, authors of knowledge as users of the system, subjective evaluation of the student, summaries for human analysis, profiles, supervisor authorities and collaborative learning.

3 The Resulting Detailed Software Architecture

The resulting architecture is more sophisticated than other architectures without losing its generality. It provides a visual and formal representation in the form of UML and defines the allowed connections between the components of each abstraction level. Each component may only communicate with components to which it has defined connections. The overview of the components and their dependencies is shown in Fig. 2.

Fig. 2. General purpose ITS software architecture components and dependencies
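The rule that a component may only communicate over defined connections can be enforced in code. The following Python sketch shows one way a framework could do this; it is an illustration under assumptions, not the authors' framework. The layer names come from the text, while the classes `Architecture` and `UndefinedConnection`, the method names and the example connections are hypothetical, since the connections of Fig. 2 are not listed in the text.

```python
from dataclasses import dataclass, field

# Layer names as given in the text, from the user-facing to the database-facing side.
LAYERS = ("Interaction", "Temporary", "Functionality", "Persistence", "Data")


class UndefinedConnection(Exception):
    """Raised when a component addresses a component it has no connection to."""


@dataclass
class Component:
    name: str
    layer: str
    connections: set = field(default_factory=set)  # names this component may call


class Architecture:
    def __init__(self):
        self.components = {}

    def add(self, name, layer):
        """Register a component in one of the five layers."""
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.components[name] = Component(name, layer)
        return self.components[name]

    def connect(self, source, target):
        """Define a directed connection from source to target."""
        if target not in self.components:
            raise KeyError(target)
        self.components[source].connections.add(target)

    def send(self, source, target, message):
        """Deliver a message only if the connection has been defined."""
        if target not in self.components[source].connections:
            raise UndefinedConnection(f"{source} -> {target} is not defined")
        return (source, target, message)
```

Checking connections at the point of communication, as here, is one design option; a framework could equally reject undefined connections at build time, which would catch misuse earlier.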

Only the newly created components of the Functionality layer are described here. The process steering component controls the teaching process; it is explicitly defined and can be exchanged. The larger next step of what to do is handled in the process, but there are still core functionalities that are important for ITSs. Since ITSs should adapt, they first need parameters to which they can adapt. The content of the student knowledge is used to create a statistical summary in the summary component, which can be evaluated by humans. The same analysis can also take place in the profile generation component, which creates a profile containing the traits to which the ITS can adapt. With the profile, the suggestion component is able to suggest learning material that matches the abilities and preferences of the student. The learning material itself can be adapted through the adapt component.

During a learning session, the input from the student (especially in interactive problem-solving tasks) needs to be understood (e.g. categorized, tagged, etc.) by the system. Comprehension is necessary if the student can perform actions whose semantics are not predefined for the system, e.g. through a free text input field, and it happens in the Comprehend component. Once the system understands the content of the student's action, the action can be rated against the goal in context, i.e. whether it shows that the student was able to progress and to answer correctly. The ITS is able to create feedback for the student from the gathered input data and its previous knowledge about the student. If the student needs help, indicated by inactivity or by a lack of more fundamental knowledge, the Help component steps in and gathers material that stays within the focus of the task and is fundamental to it.

The system also has functionality that is optional but helpful for an ITS. The ability to edit the knowledge base of an ITS using only the user interface is important in the long run for creating new learning material. Authoring is a special role with its own functional requirements, which are enclosed in the Author component. The corresponding user interface is enclosed in the Author scene. In some cases the creation of material can also be a responsibility of the system itself: the Teaching enrichment component is able to create new learning material from existing domain knowledge. Since not only students but also supervisors or teachers may interact with the ITS, the ability to examine and control the student actions brings functional requirements that are completely different from those of authors or students. This functionality is included in the Authority component, and the corresponding user interface is the Supervisor scene or the Teacher scene, depending on the authority role. Network capabilities are needed to use features that are not included directly in the ITS, be it remote supervisors or colleagues or a remote ITS itself. The functionality of any network activity and transmission is encapsulated in the Network component. For research, large quantities of data help to give clues about which topics are interesting. An ITS can therefore include a feature to upload collected data to, or download updates of new methods from, a central repository.
To manage multiple users there is the Multiple user component. The functionality that allows multiple students to work together in the same learning session is inside the Collaboration component. The important part of the software architecture is, of course, not only the components themselves but also how they are allowed to communicate with each other.

4 Usage of the Architecture

The new architecture was used to implement a software framework in order to verify its applicability in object-oriented programming languages and on modern platforms. There are also software frameworks that support the creation of an ITS on the level of implementation details [12]. The software framework presented in this paper is as general as the software architecture defined here. The default functionality it adds to build upon is only one implementation option: it is provided through subclassing, and the user is not forced to use any part of the default implementation. This allows the programmer of the ITS to focus on the features to implement and not on how an ITS can be built or which pitfalls are common. Currently, the framework is also used in an ITS that is in development for a project in which healthcare students shall be supported by an intelligent adaptive dialogue system. This project has only just started, but even after six months of project runtime it is apparent that the architecture and the framework are a good support structure for the ITS design process.
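The idea that default functionality is optional and attached through subclassing can be sketched as follows. The class and method names are hypothetical and chosen for illustration; the sketch uses Python, whereas the paper only states that the framework targets object-oriented languages.

```python
from abc import ABC, abstractmethod


class SuggestionComponent(ABC):
    """Hypothetical general contract; it prescribes no behaviour of its own."""

    @abstractmethod
    def suggest(self, profile, materials):
        """Return the learning material to present next."""


class DefaultSuggestion(SuggestionComponent):
    """One optional default: pick the material closest to the student's level."""

    def suggest(self, profile, materials):
        return min(materials, key=lambda m: abs(m["level"] - profile["level"]))


class HardestFirstSuggestion(SuggestionComponent):
    """A custom implementation that uses nothing from the default."""

    def suggest(self, profile, materials):
        return max(materials, key=lambda m: m["level"])


# Illustrative data only.
materials = [
    {"name": "intro", "level": 1},
    {"name": "drill", "level": 3},
    {"name": "exam", "level": 5},
]
profile = {"level": 3}

default_choice = DefaultSuggestion().suggest(profile, materials)["name"]
custom_choice = HardestFirstSuggestion().suggest(profile, materials)["name"]
```

Because both classes derive from the abstract contract rather than from each other, an ITS programmer can ignore the default entirely, which mirrors the design goal stated above.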


References

1. Anderson, J.R., Boyle, D.G., Reiser, B.: Intelligent tutoring systems. Sci. 228, 456–462 (1985)
2. Ahuja, N.J., Sille, R.: A critical review of development of intelligent tutoring systems: retrospect, present and prospect. IJCSI Int. J. Comput. Sci. 10(4), number 2 (2013). ISSN (Print) 1694-0814, ISSN (Online) 1694-0784
3. Alpert, S.R., Singley, M.K., Fairweather, P.G.: Deploying intelligent tutors on the web: an architecture and an example. Int. J. Artif. Intell. Educ. (IJAIED) 10(2), 183–197 (1999). HAL hal-00197339
4. Clancey, W.J.: Methodology for building an intelligent tutoring system. In: Methods and Tactics in Cognitive Science (1984)
5. Zouaq, A., Frasson, C., Rouane, K.: The explanation agent. In: Gauthier, G., Frasson, C., VanLehn, K. (eds.) ITS 2000. LNCS, vol. 1839, pp. 554–563. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45108-0_59
6. Mitsuru, I., Riichiro, M.: FITS a framework for ITS - a computational model of tutoring. J. Interact. Learn. Res. 5(3), 319 (1994)
7. Sunandan, C., Devshri, R., Anupam, B.: Development of knowledge based intelligent tutoring system. TMRF e-Book Adv. Knowl. Based Syst. Model Appl. Res. 1, 74–100 (2010)
8. Crowley, R., Medvedeva, O.: SlideTutor: a model-tracing intelligent tutoring system for teaching microscopic. Artif. Intell. Educ. Shaping Future Learn. Through Intell. Technol. 97, 157 (2003)
9. Song, J.S., Hahn, S.H., Hahn, K.Y., Kim, J.H.: An intelligent tutoring system for introductory C language course. Comput. Educ. 28(2), 93–102 (1997)
10. von Malotky, N.T.G., Martens, A.: Analyzing the usage of the classical ITS software architecture and refining it. In: 15th International Conference on Intelligent Tutoring Systems (ITS) (2019)
11. von Malotky, N.T.G., Nicolay, R., Martens, A.: Centralizing the teaching process in intelligent tutoring system architectures. In: 19th International Conference on Advanced Learning Technologies, ICALT (2017)
12. Lelouche, R., Ly, T.T.: Using a framework in the development of an intelligent tutoring system, information reuse and integration. In: IEEE International Conference on IRI 2003, pp. 291–298 (2003). https://doi.org/10.1109/iri.2003.1251428

Let the End User in Peace: UX and Usability Aspects Related to the Design of Tutoring Systems

Juliano Sales¹, Katerina Tzafilkou², Adamantios Koumpis³(B), Thomas Gees³, Heinrich Zimmermann³, Nicolaos Protogeros², and Siegfried Handschuh¹

¹ University of St. Gallen, Müller-Friedberg-Strasse 8, 9000 St. Gallen, Switzerland [email protected]
² University of Macedonia, Egnatia 156, 546 36 Thessaloniki, Greece [email protected]
³ Institut Digital Enabling, Berner Fachhochschule, Brückenstrasse 73, 3005 Bern, Switzerland [email protected]

Abstract. In this paper we address the research question of whether non-experienced and untrained end users may efficiently design and customise their own interaction experiences as part of a tutoring system. To this end, we present the main end-user development (EUD) approaches, highlighting user engagement, usability and user experience (UX) design principles. We examine aspects related to the triptych of efficiency, efficacy and satisfaction across all user development and learning activities. In this direction, we offer suggestions on how such an EUD approach for an Intelligent Tutoring System (ITS) setting can form part of a maker space for providing sustainable and hands-on learning experiences.

Keywords: End-User Development · Explainable AI (X-AI) · Gender analysis · Usability & user experience

1 Introduction

There seems to be a mismatch between End-User Development (EUD) environments and the expectations from the industry side, and in how these expectations are met (suboptimally, as we believe and aim to show) by the vendors of systems. As identified by Protogeros and Tzafilkou [1], “the main research question is: can non-experienced and untrained end-users efficiently design relational schemes to develop database-driven applications?”. In the case of a tutoring system, and especially one that aspires to offer intelligence as part of its behaviour and features to the end users, the question is still the same: do the end users need to be specially trained for the customisation they wish to experience? Based on this question, an essential concern is to render non-experienced end users capable of building their individually tailored tutoring experiences and of efficiently designing the required service and application components without the need to be trained in the discipline of learning management systems or e-learning environments. Engaging end users, and particularly female users, in efficiently building their own applications is addressed by several works [2, 3] that mainly focus on the system design characteristics. In this respect, we suggest that EUD environments should be highly user-centred, encapsulating all the main User Experience (UX) design principles to ultimately evoke UX. In this way, end users would be highly engaged in designing their own web applications and hence tutoring systems.

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 23–26, 2020. https://doi.org/10.1007/978-3-030-49663-0_4

2 Defining the EUD Space: UX and Usability Aspects

EUD is usually defined as a set of methods, techniques, and tools that allow users of software systems, who are acting as non-professional software developers, at some point to create, modify or extend a software artefact [4]. Table 1 summarizes some popular EUD approaches and examples defined in the literature review.

Table 1. Examples of EUD approaches.

Approach               Example                                          Source
GUI database design    CRIUS                                            [5]
WYSIWYG                Surprise-explain-reward strategy                 [6]
Wizard-logic           Simple-talking                                   [1]
NLP                    Semantic parsing                                 [7]
Experiential learning  EUD tasks for web development and data science   [2, 8]

Even in the case of IKEA end-user customisation, customers have the chance to customise e.g. an IKEA wardrobe twice:

1. once when they buy and build it, i.e. how many drawers to have, colours, design, etc.; and
2. once when they start using it, e.g. by putting socks and underwear together or separating them according to criteria:
   – socks here - underwear there; or
   – winter socks and winter underwear here; or
   – summer socks and summer underwear there; or
   – sexy socks and underwear here; or
   – everyday socks and underwear there; or
   – etc.

So the idea is that all EUD tasks relate either to customisation of the design or to customisation in use. But all these are relatively limited. From the end user’s perspective, it would be important to allow for a seamless integration of the needs arising from everyday use into the design of the product itself. Back to the IKEA wardrobe analogy, this means that the more a user uses a drawer, the bigger it should become, taking space from other drawers that are underused. And the more clothes one compartment of the wardrobe needs to store, the more this compartment expands, possibly changing the overall space that the wardrobe takes within the user’s room.

EUD may need to reflect principles of both usability and UX design, the latter being defined as “all aspects of the end-users’ interaction with the company, its services and its products” [9]. There are three main usability issues that need to concern us when considering the case of EUD in a practical setting, and all three happen to be the governing principles of UX design [10], namely: Effectiveness (does the system do the right thing?), Efficiency (does it do it with minimum effort?) and Satisfaction (does it make the end user feel good?). The main aim is not to reduce the distance between design and use; this distance will always exist, as design relates to the expectations of the user from the system and use reflects the user’s actual wants or capabilities.
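The adaptive wardrobe analogy can be made concrete with a small sketch that redistributes a fixed total space in proportion to observed use. The proportional rule, the guaranteed minimum share and all names are assumptions made for illustration; the text does not prescribe a particular reallocation rule.

```python
def reallocate(usage, total_space, min_share=0.1):
    """Split total_space among drawers in proportion to their usage counts.

    A fraction min_share of the total space is divided equally among all
    drawers, so an underused drawer shrinks but never disappears
    (an assumption of this sketch).
    """
    n = len(usage)
    floor = min_share * total_space / n      # guaranteed space per drawer
    remaining = total_space - floor * n      # space distributed by usage
    total_use = sum(usage.values())
    if total_use == 0:
        # No usage observed yet: fall back to equal drawers.
        return {name: total_space / n for name in usage}
    return {name: floor + remaining * count / total_use
            for name, count in usage.items()}


# Hypothetical usage counts, e.g. how often each drawer was opened.
usage = {"socks": 30, "underwear": 10, "scarves": 0}
sizes = reallocate(usage, total_space=90.0)
```

Here the heavily used drawer grows at the expense of the unused one while the overall size of the wardrobe stays constant; a variant that also lets the total grow would correspond to the wardrobe taking more space in the room.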

3 An Application Scenario

An Intelligent Tutoring System that takes the feelings of its end users into account, exhibiting the type of empathy that the user likes, needs and allows into the interaction with the system, may increase the efficacy of the learning experience and also help meet the set learning goals. FaceNet [11] and SphereFace [12] have left a lasting mark on the way that a face recognition system may be deployed to improve interaction with end users. A similar case is the Blue Frog Robotics Buddy [13], an emotional companion robot with a range of emotions that are expressed naturally throughout the day based on his or her interactions with family members. For example, Buddy will be happy to give a warm welcome when one comes home, and will sometimes be grumpy if people have not paid attention to him/her, or sometimes without any particular reason, just because that morning s/he is not in a good mood. A EUD environment for building such intelligent tutoring systems could encapsulate the same principles.

4 Summary

In short, we see the real challenge for UX-optimised EUD of Intelligent Tutoring Systems in empowering users to shape their individual interaction styles. This means allowing the ITS to capture multi-modal information and to process all three channels, namely face, speech and language, in a way that improves both the development and the learning experience for the users. For instance, end-user personas and emotional design should be taken into consideration during the development process. Effective speech and face recognition would allow a high degree of EUD customisation, engaging end users in developing engaging tutoring systems. Gender HCI [3] forms the basis for female-friendly or gender-neutral EUD design that addresses the low engagement of women in development activities.


References

1. Protogeros, N., Tzafilkou, K.: Simple-talking database development: let the end-user design a relational schema by using simple words. Comput. Hum. Behav. 48, 273–289 (2015)
2. Tzafilkou, K., Chouliara, A., Protogeros, N., Karagiannidis, C., Koumpis, A.: Engaging end-users in creating data-intensive mobile applications: a creative ‘elearning-by-doing’ approach. In: 2015 International Conference on Interactive Mobile Communication Technologies and Learning (IMCL), pp. 274–278 (2015). https://doi.org/10.1109/IMCTL.2015.7359602
3. Beckwith, L.: Gender HCI issues in end-user software engineering. In: Proceedings of IEEE Symposium on Human Centric Computing Languages and Environments, pp. 273–274 (2003). https://doi.org/10.1109/HCC.2003.1260246
4. Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: an emerging paradigm. In: Lieberman, H., Paternò, F., Wulf, V. (eds.) End-User Development. Human-Computer Interaction Series, vol. 9, pp. 1–8. Springer, Dordrecht (2006). https://doi.org/10.1007/1-4020-5386-X_1
5. Qian, L., LeFevre, K., Jagadish, H.V.: CRIUS: user-friendly database design. Proc. VLDB Endow. 4(2), 81–92 (2010)
6. Ko, A.J., et al.: The state of the art in end-user software engineering. ACM Comput. Surv. 43(3), 1–44 (2011)
7. Sales, J.E., Freitas, A., Handschuh, S.: An open vocabulary semantic parser for end-user programming using natural language. In: 2018 IEEE 12th International Conference on Semantic Computing (ICSC), pp. 77–84 (2018). https://doi.org/10.1109/ICSC.2018.00020
8. Serrano, E., Molina, M., Manrique, D., Baumela, L.: Experiential learning in data science: from the dataset repository to the platform of experiences. In: 13th International Conference on Intelligent Environments (2017)
9. Norman, D., Nielsen, J.: The definition of User Experience (UX) (2014). https://www.nngroup.com/articles/definition-user-experience. Accessed 24 Feb 2020
10. Soegaard, M.: What is User Experience (UX) design? (2020). https://www.interactiondesign.org/literature/topics/ux-design. Accessed 20 Jan 2020
11. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 815–823 (2015). https://doi.org/10.1109/CVPR.2015.7298682
12. Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: Sphereface: deep hypersphere embedding for face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 212–220 (2017)
13. Milliez, G.: Buddy: a companion robot for the whole family. In: Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, p. 40. ACM (2018)

Developing a Multimodal Affect Assessment for Aviation Training

Tianshu Li(&), Imène Jraidi, Alejandra Ruiz Segura, Leo Holton, and Susanne Lajoie

McGill University, 3700 McTavish Street, Montreal, QC, Canada [email protected]

Abstract. This paper presents a multimodal affect assessment protocol developed for aviation training, which consists of physiological, behavioral measures of affect and subjective self-report of affective correlates. Data convergence is examined by comparing physiological and behavioral data output with self-report variables. We found significant correlations between arousal inferred from electro-dermal activity (EDA) and self-reported workload, fatigue and effort. We also found that the intensities of emotions inferred from facial expression correlate with self-reported variables. These findings support the validity of EDA and facial expression as measures of affect in aviation training context.

Keywords: Affect · Learning · Aviation · Facial expression · Electro-dermal activity

1 Introduction

Affect can be defined as emotional states, moods and stress responses [1]. It has an impact on learning by influencing cognitive, psychomotor and motivational processes [2]. Its impact is pertinent in aviation training, during which trainees are required to execute proper psychomotor skills and make decisions under complex and stressful situations [3]. Stress and emotions are among the six main causes of fatal accidents in aviation [4]. Ill-regulated affect can interfere with pilots’ judgement, situation awareness and action execution [5]. Flight safety requires pilots to learn to make effective decisions. Trainees should therefore be trained in both flying skills and affective control.

The recent surge in affective computing technologies brings the possibility of incorporating an intelligent affect monitoring system into aviation training [6]. Such a system could allow instructors and trainees to identify maladaptive affects by examining post-analysis data and system prompts. Currently, affect is only assessed subjectively by pilot instructors, who are also tasked with instructing, scaffolding and evaluating trainees’ performance. Subjective reports require additional time to complete and may disrupt the training, which is costly in aviation simulators. Therefore, alternative methodologies are needed to measure affect during aviation training. Some empirical aviation research has measured stress and workload using physiological and neurological measures (e.g. [7]). However, there is a lack of research that examines the behavioral cues of affect and the measurement of emotions in aviation training.

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 27–37, 2020. https://doi.org/10.1007/978-3-030-49663-0_5


T. Li et al.

Considering the multi-componential manifestations of affect and its situation-specificity [1], affective states should be measured using multiple data sources, and inferences about the affective states should be triangulated during the analysis. Therefore, we aim to develop a multimodal protocol for assessing affect in the context of simulated aviation training. This protocol includes measures of physiological, behavioral and experiential affective states in a simulated aviation context.

2 Previous Research

We reviewed relevant literature in three domains: 1) the educational psychology literature on the role of affect in learning, to identify potential research questions and hypotheses; 2) literature in aviation psychology that demonstrates how educational psychology theories on affect transfer to the aviation context; and 3) previous research using biometric assessments of affect in aviation, to select the affect measurements to include in our protocol.

Role of Affect in Learning. Affect can be described using two dimensions: valence and arousal. Valence refers to how negative or positive the affective experience is, for example negative as in frustration or positive as in joy [8]. Arousal is the activation of the sympathetic nervous system. High arousal is coupled with intense activating emotions and physiological changes (e.g. electro-dermal activity) and has been linked to influencing cognition [9]. Arousal also influences psychomotor skills, compromising coordination and reaction when arousal is excessive or insufficient [10]. Affect is situation-specific as it embodies appraisal, which is one’s valuation system and perceived control over the object of focus [2, 10, 11]. Students’ self-concept and valuation system determine their affective reactions in a learning process and hence lead to further differences in cognitive, motivational and behavioral outcomes [11]. The strong link between appraisal and affect could help us interpret physiological or behavioral data. Therefore, we include a self-report measure of appraisal in this study. We refer to two appraisal domains, perceived control and value, as the ‘grounded truth’, or standard for determining one’s affective experience [6].

Education researchers have examined the role of affect in many learning contexts. In math homework sessions, negative emotions negatively predict students’ effort and achievement [12]. In physics problem solving, prolonged confusion led to frustration and task disengagement [19]. Negative emotions such as surprise, confusion and frustration allegedly follow discrepant events, whereas positive emotions such as happiness are induced by solving events [13]. These findings confirm our belief in the importance of integrating affect assessment into aviation training to enhance learning efficiency and improve the training experience.

Affect Research in Aviation. Flight safety requires pilots to make complex decisions under stressful situations. Pilots need to accurately define the situation according to their knowledge and the environmental cues, manage their stress and minimize risks [14]. Previous research has explored the effect of affective valence on cognitive factors. For instance, decision making could be impaired by negative affect [5].


Despite the theoretical and empirical acknowledgement of the importance of affect, its integration into aviation training is limited. Current self-report measures of affect in aviation, such as the retrospective interview [15] and the NASA Task Load Index (NASA TLX) [16], are time-consuming. These measures disrupt students’ training and may overtask instructors. Therefore, our goal is to explore real-time automatic measures of affect. To evaluate data convergence, we integrate the NASA TLX as part of the ‘grounded truth’ assessment; it assesses cognitive and behavioral correlates of affect, such as mental workload and fatigue.

Biometrics Assessment of Affect in Aviation. Our literature review was conducted using a multi-disciplinary database, Scopus [17], to find previous research using biometric measures in the aviation context. Our search yielded 120 valid peer-reviewed papers. We found that physiological measures such as heart rate variability, electro-dermal activity (EDA) and blood pressure are the most frequently used. Among the 120 papers, only 13 examine affect through biometric measures, of which 11 measure stress (e.g. [7]). Aviation research using biometrics to infer affect is still rare, especially regarding discrete emotional states that are relevant in learning. Furthermore, there is a lack of assessment using behavioral measures, such as facial expression, voice and posture analysis.

3 Research Questions and Hypotheses

This study is part of a bigger project, primarily also involving Concordia University, University of Montreal, National Research Council and CAE Inc., to evaluate the use of various biometric measures. This study is the preliminary step in designing a multimodal affect assessment that is feasible for aviation training. In our multimodal protocol, we assessed real-time physiological (EDA) and behavioral (facial expression) measures of affect while participants performed simulated flight maneuvers. Subjective experience of affect was assessed by semi-structured interview and validated questionnaires [16, 18, 19] on correlates of affect, including workload, fatigue, perceived control and value for the tasks. Self-report is used as the ‘grounded truth’ of the participants’ affective experience. For accessibility, we chose a computer-based environment and participants with limited aviation experience, similar to beginner trainees. We aim to assess the convergence among the data sources through correlational analysis. We evaluate the validity of using EDA and FaceReader by asking these research questions:

1. Is there a relationship between EDA features and self-reported appraisal, workload, fatigue and effort? We expect arousal inferred from EDA features to correlate positively with workload, fatigue and effort [20, 21].
2. Is there a relationship between emotions inferred from facial expression analysis and self-reported variables? We expect negative, activating emotions to correlate positively with workload, fatigue and effort and negatively with perceived control and value. We expect opposite trends for positive emotions [2, 5, 22].
3. Are there any individual differences in how biometric measures (EDA and facial expression) relate to self-reported variables? We hypothesize that there will be some variation among individuals in the relationships between biometric measures of affect and self-reported affect correlates, as affect is influenced by one’s self-concept and appraisal of the task [9].

4 Methodology

The experiment environment and tasks were designed in collaboration with subject-matter experts from CAE Inc. We conducted an experiment using flying tasks in X-Plane, a flight simulation software commonly employed for pilot training [23]. The X-Plane aviation tasks were adapted to beginner pilot trainee level by CAE technology consultants to fit the participants’ skill level. Our acquisition protocol includes features from three modalities: (1) physiological arousal is inferred from electro-dermal activity (EDA); (2) the behavioral cues of affect are analyzed through facial expression recording; (3) the experiential manifestation of affect (‘grounded truth’) is assessed through a self-report questionnaire administered on a laptop. This study received IRB approval from McGill University.

4.1 Procedure

The experiment starts with a participant briefing. Afterwards, participants complete a demographic questionnaire and provide consent. They are introduced to the X-Plane interface and the use of the joystick. They get a training session in which they practice the maneuvers involved: maintaining or changing altitude, speed and heading. Once they can return to the baseline parameters, the experimental tasks begin. The session includes ten tasks. Each task begins with a verbal instruction from the experimenter (e.g. ‘turn right at 30-degree banking angle to heading 0, maintain speed and altitude’), then the participant executes maneuvers until reaching the objective. The first task is of minimal difficulty: maintain baseline parameters. Afterwards, participants complete a self-report questionnaire. Then, participants perform eight tasks grouped in pairs by difficulty level. The second task of each pair is the reverse maneuver of the first task. After each pair, participants complete questionnaires and take a break if needed. For the last task, participants must maintain baseline parameters for 60 s and complete a final questionnaire. Details on the questionnaires are described in Sect. 4.2. In each experiment session, six tasks are followed by retrospective questionnaires.

Participants. 14 participants (9 females) were recruited from the undergraduate and graduate student population at McGill University. They come from different ethnic backgrounds: 4 Asians, 5 Whites and 5 from other ethnicities. Their ages range from 19 to 35. They were compensated ten dollars per hour for their participation.

4.2 Measurements

Physiological Arousal: Electro-Dermal Activity and BioPac. Electro-dermal activity is measured by a BioNomadix EDA module [24] at a sampling rate of 1000 Hz.


Normalized SCR features and skin conductance level are extracted through Makowski’s algorithm for EDA processing, implemented in a Python-based toolkit, Neurokit [25]. The skin conductance response (SCR), otherwise referred to as phasic EDA, is the rapid change in the electrical conductivity of the skin. It is a product of the fluctuation of the sympathetic arousal level [24]. The SCR features extracted include the mean SCR per trial (phasic), the total number of SCR peaks (significant increases of skin conductance) per trial, and the mean amplitude of SCR peaks per trial.

Behavioral Cues: Facial Expression and FaceReader 6.0. Facial expression is recorded during the experiment by a Microsoft LifeCam HD-5000 camera at a frame rate of 30 Hz. The recording is processed using the commercial software FaceReader 6.0 [26] at a sampling rate of 15 Hz. From the FaceReader output, we analyzed the intensities of basic emotions: happy, surprised, neutral, sad, angry, scared and disgusted.

Experiential Measurement and ‘Grounded Truth’: Subjective Self-reports. Subjective self-reports include three parts: 1) a demographic questionnaire, 2) an appraisal questionnaire on perceived control and value of the task, and 3) the NASA TLX. The demographic questionnaire, including items on gender, ethnicity, age and past related experience, is completed before the start of the tasks. Participants rate their experience with relevant activities (e.g. video games with joysticks) on a 5-point Likert scale. The appraisal questionnaire and the NASA TLX are completed between tasks. We use them as the ‘grounded truth’ measurement against which EDA and facial expression results are compared to assess the convergence among them.

Perceived Control and Value. Two appraisal constructs are assessed as affect correlates to evaluate multimodal data convergence: perceived control and perceived value. Perceived control is the subjective ‘causal agency on one’s actions’ [9]. Items on perceived control are adapted from the Perceived Control Scale [18]. Perceived value is the ‘perceived valence’ of the activity based on three domains: usefulness, importance and interest [9]. Items on perceived value are adapted from the expectancy-value theory of achievement motivation [19].

NASA Task Load Index (NASA TLX). The NASA TLX is a questionnaire established to assess the subjective experience of workload-related elements such as mental workload and fatigue. It is adapted to the aviation context, where both physical and mental tasks are required. We include relevant items on the following constructs: mental and physical workload, fatigue and effort. Workload is the overall cost incurred in executing a task. It can be experienced physically (e.g. body control or movements) or mentally (e.g. thinking or deciding). Fatigue refers to the subjective feeling of tiredness. Effort is the amount of mental or physical activity executed during a task [16].
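The three SCR features named in this section (mean phasic level, number of SCR peaks and mean peak amplitude per trial) can be illustrated with a deliberately simple sketch. It is not Makowski’s algorithm and does not use Neurokit; the threshold-based peak rule, the function name and the toy signal are assumptions made for illustration only.

```python
def scr_features(phasic, min_amplitude=0.05):
    """Compute three per-trial features from a non-empty phasic EDA signal.

    Simplified rule (an assumption): a sample is an SCR peak if it is a strict
    local maximum and rises at least min_amplitude above the lowest value
    seen since the previous local maximum.
    """
    amplitudes = []
    trough = phasic[0]
    for i in range(1, len(phasic) - 1):
        cur = phasic[i]
        if cur > phasic[i - 1] and cur > phasic[i + 1]:  # strict local maximum
            if cur - trough >= min_amplitude:
                amplitudes.append(cur - trough)
            trough = cur  # restart the search for the next onset
        else:
            trough = min(trough, cur)
    return {
        "mean_phasic": sum(phasic) / len(phasic),
        "peak_count": len(amplitudes),
        "mean_peak_amplitude": sum(amplitudes) / len(amplitudes) if amplitudes else 0.0,
    }


# Toy phasic signal for one trial (arbitrary units): two clear responses and
# one small fluctuation that stays below the amplitude threshold.
trial = [0.0, 0.1, 0.4, 0.2, 0.1, 0.12, 0.1, 0.5, 0.3, 0.1]
features = scr_features(trial)
```

On real recordings the signal would first be filtered and decomposed into tonic and phasic parts, which is the job of a dedicated toolkit; the sketch only shows how the per-trial summary values relate to the phasic signal.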

5 Results

The experimental results are presented in two sections: the first examines learners’ physiological arousal; the second examines the inference from facial expressions.

5.1 Physiological Arousal

Firstly, to answer our first research question on the relationship between physiological arousal and experiential affective correlates, bivariate correlations are conducted between EDA features (SCR peak count, SCR peak amplitude and phasic EDA) and the ‘grounded truth’ self-report variables (mental and physical workload, effort, fatigue, perceived value and perceived control). The correlations are computed across all participants and all X-Plane tasks followed by questionnaires (N = 84, i.e. 14 participants with 6 such tasks each). Significant positive correlations are found between SCR peak count and self-report variables, namely physical workload (r = .355, p < .01), effort (r = .220, p < .05) and fatigue (r = .222, p < .05). This suggests that, at a general level (i.e. across all participants), the number of SCR peaks increased as participants’ self-reported physical workload, effort and fatigue increased.
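The paper does not name the correlation coefficient behind these bivariate correlations; assuming Pearson’s r, the computation over paired observations can be sketched as follows. The function name and the toy values are hypothetical and are not the study’s data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only: SCR peak counts and effort ratings
# for the six questionnaire-followed tasks of one hypothetical participant.
scr_peak_count = [2, 4, 3, 6, 5, 8]
effort_rating = [1, 3, 2, 4, 4, 5]

r = pearson_r(scr_peak_count, effort_rating)
```

Applied to all 84 pairs at once, such a function yields a group-level coefficient; applied separately to the six pairs of each participant, it yields individual-level coefficients whose spread can then be inspected.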

Fig. 1. Bicorrelations between SCR peak count and subjective effort, fatigue and physical load.

Secondly, we address our third research question on individual differences in the physiological manifestation of affect. That is, although these significant correlations confirm a general positive association between SCR peaks and fatigue, effort and physical workload, we would like to examine the variation of these correlations among individuals. We conduct bivariate correlations between SCR peak count, fatigue, effort and physical workload for all valid tasks (N = 6) for each individual (N = 14). The medians and ranges of the correlation coefficients are plotted in Fig. 1. The r coefficients for the correlation between SCR peak count and effort range from −.880 to .882 (M = .13). The correlation between SCR peak count and fatigue ranges from −.707 to .627 (M = .03). With physical workload, the correlation of SCR peak count ranges from −.782 to .848 (M = 0). Consistent with our third hypothesis, the wide ranges of correlation coefficients on the individual level indicate variation in the physiological manifestation of affect among participants.

5.2 Facial Expressions

For the second research question on the relationship between behaviorally expressed emotions (FaceReader) and experiential subjective measurements, bivariate correlations are conducted between facial expression and self-report variables, across all participants and all tasks. Significant correlations (Table 1) are found between several emotions and self-report variables. Anger is positively correlated with mental workload (r = .267, p < .05) and physical workload (r = .268, p < .05). Since anger is a negative activating emotion, it could be an outcome of high mental and physical workload. Anger also positively correlates with effort (r = .316, p < .01) and fatigue (r = .369, p < .01), which confirms our hypothesis that negative activating emotions increase as effort and fatigue increase. We also found statistically significant positive correlations between anger and the appraisal variables: perceived value (r = .222, p < .05) and perceived control (r = .283, p < .05), which might be a result of the increase in effort input indicated in the previous correlation. In addition, surprise negatively correlates with fatigue (r = −.242, p < .05). Scared state negatively correlates with perceived control (r = −.218, p < .05).

Table 1. Bivariate correlations between the intensities of emotions in facial expression analysis and self-report variables

Measures   Mental workload  Physical workload  Effort  Fatigue  Perceived value  Perceived control
Angry      .267*            .268*              .316**  .369**   .222*            .283*
Surprised  −.022            −.095              .043    −.242*   .139             .036
Scared     .071             −.059              .076    .126     −.064            −.218*

*p < .05, **p < .01

With regard to our third hypothesis on the individual differences in behavioral affect manifestation, a follow-up analysis is conducted for each participant to assess the variation of each correlation at the individual level. Different patterns of the relationships between facial expression analysis features (angry, surprised, scared) and self-report variables are found across participants. Figure 2 shows the medians and ranges of the correlation coefficients, together with the outliers. As the outliers make up nearly or over 10% of the sample, we exercise caution by also examining the range and mean including them: the correlation ranges from −.113 to .466 for anger and effort (M = .101). The correlation between anger and fatigue ranges from −.382 to .491 (M = .150). The correlation between anger and mental workload ranges from −.753 to .420 (M = .05). For physical workload, the correlation with anger ranges from −.597 to .691 (M = .098). The correlation between anger and perceived value ranges from −.471 to .494 (M = −.019). The correlation between scared state and perceived control ranges from −.666 to .548 (M = −.124). The correlation between surprise and fatigue ranges from −.702 to .741 (M = −.228). The ranges of correlation coefficients on the individual level support our third hypothesis on the variation of the behavioral manifestation of affect across participants.
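The individual-level follow-up described above can be sketched as follows: one correlation coefficient per participant, then the range, mean and median of those coefficients. This is an illustrative sketch under assumptions, not the authors' code; the row layout `(participant_id, feature_value, self_report_value)`, the function names and the example rows are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, median


def pearson_r(x, y):
    """Pearson correlation, or None when it is undefined (constant input)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def per_participant_summary(rows):
    """Summarise per-participant correlations between a feature and a rating.

    rows: iterable of (participant_id, feature_value, self_report_value).
    Participants whose correlation is undefined are skipped.
    """
    grouped = defaultdict(lambda: ([], []))
    for participant_id, feature, rating in rows:
        grouped[participant_id][0].append(feature)
        grouped[participant_id][1].append(rating)

    coefficients = []
    for features, ratings in grouped.values():
        r = pearson_r(features, ratings)
        if r is not None:
            coefficients.append(r)
    if not coefficients:
        raise ValueError("no participant has a defined correlation")

    return {
        "n": len(coefficients),
        "min": min(coefficients),
        "max": max(coefficients),
        "mean": mean(coefficients),
        "median": median(coefficients),
    }


# Hypothetical placeholder rows: anger intensity and effort rating per task.
rows = [
    ("P1", 0.1, 2), ("P1", 0.2, 3), ("P1", 0.3, 4),
    ("P2", 0.3, 1), ("P2", 0.2, 2), ("P2", 0.1, 3),
    ("P3", 0.1, 2), ("P3", 0.2, 2), ("P3", 0.3, 2),  # constant rating, skipped
]
summary = per_participant_summary(rows)
```

With only a handful of tasks per participant, each individual coefficient is very noisy, which is one reason to report the spread rather than any single value.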


Fig. 2. Bivariate correlations between the facial expression variables and self-report variables.

6 Discussion and Conclusion

This research addresses the gap in the literature regarding mechanisms to demonstrate convergence in multimodal affect data. We conducted an experiment to assess the validity of using EDA and facial expression as affect indicators while trainees interact with flight simulation. Our results support our hypotheses that affect inferences from both EDA and facial expression converge with the ‘grounded truth’.

The significant correlations between physiological arousal (SCR peak count) and subjective effort, fatigue and physical workload confirm our first hypothesis: physiological arousal indicated by EDA correlates positively and moderately with workload, and positively but weakly with fatigue and effort. SCR peak count indicates the level of activation. Prolonged activation and stress will lead to physical workload and fatigue (e.g. [20, 21]). Elevated stress and activation are also associated with a higher level of effort in comparison with a low-activation state such as boredom. Therefore, our results suggest that the arousal dimension of affect in aviation training could be accounted for by EDA as a physiological measure.

On the individual level, the variation in the relationship between physiological arousal and effort may be due to the various motivational and behavioral outcomes of physiological arousal. Participants may be more motivated and driven to work hard under elevated stress, but they may lose hope under prolonged stress and disengage from the task [22]. Although it is out of the scope of this paper, this finding could lead to further research on potential mediators of the relationship between physiological arousal and effort.

The convergence between behavioral inference of affect (facial expression) and ‘grounded truth’ supports the validity of facial expression analysis as an affect measure. The positive correlation between anger and workload and effort could be an outcome of encountering a cognitively destabilizing experience (e.g. a hard, confusing task). This experience could motivate the participant to try harder to return to a cognitive equilibrium and a pleasant emotional state [22]. Since effort, mental workload and physical workload all correlate with fatigue [16], it is expected that anger also positively associates with fatigue. In addition, through our communication with the FaceReader developers, we understood that a face of intense focus (e.g. frowning) can be coded as anger in FaceReader due to the similarity of the facial expression. As our experimental tasks are attention-drawing, a proportion of the ‘anger’ output could be interpreted as focus. This interpretation is also consistent with previous research demonstrating the association between attention and workload [16]. Moreover, it explains the positive correlation between anger and perceived value and control: as one finds a task more important and interesting, more attention is paid to the task, which could lead to better control of the situation.

Furthermore, we found negative correlations of surprise with workload and fatigue. No significant correlation is found between surprise and control or value. This is consistent with our expectation, as surprise is considered a neutral emotion, which is unlikely to correlate with appraisal [27]. However, the role of surprise in learning is dependent on context. In mathematics learning, surprise negatively predicts metacognitive and cognitive strategies and hence leads to worse learning outcomes. Researchers speculated that students in this context experienced excessive confusion after surprise, which led to disengagement [28]. In our context, surprise might be a reaction to accomplishing a task faster or better than expected, leading to less workload and fatigue. Consistent with previous research [9], scared state has a significant negative correlation with perceived control. These results provide supporting evidence for the validity of facial expression analysis in measuring affect in the aviation context.

Although the correlations between some facially expressed emotions and self-report variables are significant across participants, on the individual level there is a lot of variability, indicated by the wide ranges of coefficients and their means and medians around 0.
This finding further confirms our expectation that there are large individual differences and emphasizes the need for future research exploring mediators between the behavioral cues of affect and their psychological correlates. Our results are limited by the small sample size. Future research with a larger sample size could examine the distribution of these correlational relationships on the individual level.

In sum, our findings support the validity of EDA and facial expression analysis as measurements of affect in the aviation training context. A comprehensive assessment of affect could help aviation training instructors to allocate appropriate scaffolding to pilots where needed, thereby improving the overall training experience and effectiveness. In future studies, we hope to replicate the current findings with pilots in high-fidelity environments to consolidate the connection we established among affect-related modalities.

Acknowledgement. This study is jointly funded by Natural Sciences and Engineering Research Council of Canada (514052-17), Consortium for Aerospace Research and Innovation in Canada (CARIC) and Quebec (CRIAQ) (OPR-1618). We thank Alain Bourgon and Hugh Grenier (CAE Inc.) for proposing and configuring the X-Plane environment and tasks, and Maher Chaouachi (CAE Inc.) for proposing the EDA analysis algorithm. We thank all collaborators from the InLook project, which this study is part of.

References

1. Gross, J.J.: Emotion regulation: affective, cognitive, and social consequences. Psychophysiology 39(3), 281–291 (2002)
2. Pekrun, R., Linnenbrink-Garcia, L.: Introduction to emotions in education. In: International Handbook of Emotions in Education, pp. 11–20. Routledge, Abingdon (2014)
3. Kaempf, G.L., Klein, G.: Aeronautical decision making: the next generation. In: Aviation Psychology in Practice, p. 223 (2017)
4. Jensen, R.S.: Pilot Judgment and Crew Resource Management. Routledge, Abingdon (2017)
5. Causse, M., et al.: The effects of emotion on pilot decision-making: a neuroergonomic approach to aviation safety. Transp. Res. Part C Emerg. Technol. 33, 272–281 (2013)
6. Harley, J.M.: Measuring emotions: a survey of cutting edge methodologies used in computer-based learning environment research. In: Emotions, Technology, Design, and Learning, pp. 89–114. Elsevier (2016)
7. Regula, M., et al.: Study of heart rate as the main stress indicator in aircraft pilots. In: Proceedings of ME 2014. IEEE (2014)
8. Pekrun, R., Linnenbrink-Garcia, L.: International Handbook of Emotions in Education. Routledge, London (2014)
9. Pekrun, R., Perry, R.P.: Control-value theory of achievement emotions. In: International Handbook of Emotions in Education, pp. 130–151. Routledge (2014)
10. Duffy, E.: Activation and Behavior. Wiley, New York (1962)
11. Storbeck, J., Clore, G.L.: Affective arousal as information: how affective arousal influences judgments, learning, and memory. Soc. Pers. Psychol. Compass 2(5), 1824–1843 (2008)
12. Dettmers, S., et al.: Students’ emotions during homework in mathematics: testing a theoretical model of antecedents and achievement outcomes. Contemp. Educ. Psychol. 36(1), 25–35 (2011)
13. D’Mello, S., et al.: Confusion can be beneficial for learning. Learn. Instr. 29, 153–170 (2014)
14. Murray, P.S., Martin, W.L.: Beyond situational awareness: a skill set analysis for situational control. In: AAvPA Symposium, Sydney, Australia (2012)
15. Flin, R., et al.: Human factors in the development of complications of airway management: preliminary evaluation of an interview tool. Anaesthesia 68(8), 817–825 (2013)
16. Hart, S.G.: NASA-task load index (NASA-TLX); 20 years later. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Sage Publications, Los Angeles (2006)
17. Burnham, J.F.: Scopus database: a review. Biomed. Digit. Libr. 3(1), 1 (2006)
18. Perry, R.P., et al.: Perceived academic control and failure in college students: a three-year study of scholastic attainment. Res. High. Educ. 46(5), 535–569 (2005)
19. Wigfield, A., Eccles, J.S.: Expectancy-value theory of achievement motivation. Contemp. Educ. Psychol. 25(1), 68–81 (2000)
20. Nittala, S.K., et al.: Pilot skill level and workload prediction for sliding-scale autonomy. In: 2018 17th IEEE (ICMLA). IEEE (2018)
21. Shiomi, K., Itano, K., Suzuki, A.: Development and evaluation of the fatigue and drowsiness predictor. In: Archives of 28th ICAS (2012)
22. D’Mello, S., Graesser, A.: Dynamics of affective states during complex learning. Learn. Instr. 22(2), 145–157 (2012)
23. X-Plane 11 (2020). https://www.x-plane.com/
24. Braithwaite, J.J., et al.: A guide for analysing electrodermal activity (EDA) & skin conductance responses (SCRs) for psychological experiments. Psychophysiology 49(1), 1017–1034 (2013)
25. Makowski, D.: NeuroKit (2016)
26. Den Uyl, M., Van Kuilenburg, H.: The FaceReader: online facial expression recognition. In: Proceedings of Measuring Behavior. Citeseer (2005)
27. Mauss, I.B., Robinson, M.D.: Measures of emotion: a review. Cogn. Emot. 23(2), 209–237 (2009)
28. Muis, K.R., et al.: The role of epistemic emotions in mathematics problem solving. Contemp. Educ. Psychol. 42, 172–185 (2015)

Scaling Mentoring Support with Distributed Artificial Intelligence

Ralf Klamma1(✉), Peter de Lange1, Alexander Tobias Neumann1, Benedikt Hensen1, Milos Kravcik2, Xia Wang2, and Jakub Kuzilek2

1 RWTH Aachen University, Aachen, Germany
{klamma,lange,neumann,hensen}@dbis.rwth-aachen.de
2 DFKI, Berlin, Germany
{milos.kravcik,xia.wang,jakub.kuzilek}@dfki.de

Abstract. Mentoring is an activity in which an experienced person (the mentor) supports a less knowledgeable person (the mentee) in order to achieve a learning goal. In a perfect world, the mentor would always be available when the mentee needs support. However, in the real world higher education institutions work with limited resources. For this reason, we need to carefully design socio-technical infrastructures for scaling mentoring processes with the help of distributed artificial intelligence. Our approach allows universities to quickly set up the data processing environment needed to support both mentors and mentees. The presented framework is based on open source standards and technologies. This will help leverage the approach, despite the organizational and pedagogical challenges. The deployed infrastructure is already used by several universities.

Keywords: Mentoring support · Learning analytics · Cloud computing · Distributed architectures · Infrastructuring · Intelligent mentoring bots

1 Introduction

Mentoring is the process of the mentor supporting the mentee, in order to make the learning experience more effective and efficient. Psychological and emotional support are at the heart of the mentoring relationship, underpinned by empathy and trust [15]. In modern higher education institutes, mentoring has become challenging due to the mass of students and the lack of resources. It has raised the interest in socio-technical support for mentoring processes, which include peer mentoring and technological processes. Intelligent Tutoring Systems (ITS) have a long tradition, focusing on cognitive aspects of learning in a selected domain. They were successfully applied especially in those areas, where domain knowledge can be well formalized with the help of experts. However, motivations, emotions and meta-cognitive competences play a crucial role in education. They can be monitored through big educational data and a wide spectrum of available sensors, bearing the potential to also improve the mentoring process. In this paper, we look at these various aspects and investigate how they can be technologically supported, in order to specify the requirements for Intelligent Mentoring Systems (IMS). This helps us answer the following questions:

1. How can we design IMSs to cover typical challenges and to scale up mentoring support in universities?
2. How can we design an infrastructure to exchange data between universities in a private and secure way to scale up on the inter-university level?
3. How can we integrate heterogeneous data sources to facilitate services supporting mentoring processes?

With our contributions, we aim at providing an Open Source Software (OSS) infrastructure and ecosystem for mentoring support from distributed Artificial Intelligence (AI). In the remainder of this paper, we first explore roles in the mentoring process and present our learner model that builds the basis for our approach to scale mentoring support (Sect. 2). We then present related work (Sect. 3), before we describe our scalable mentoring support infrastructure, based on a socio-technical approach (Sect. 4) and conclude our paper (Sect. 5).

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 38–44, 2020. https://doi.org/10.1007/978-3-030-49663-0_6

2 The Pedagogy of Mentoring

Mentoring can be viewed from both the side of the mentor and that of the mentee. The latter needs prompt and effective assistance, which means that the interventions should not only come without long delays, but also be personalized to the individual needs and context. On the other side, the mentor is often overwhelmed with too many questions and requests of various complexity. Some of them can be automated. Others might be successfully answered by peers. The third group requires the unique competences of a mentor. Thus, the challenge is to properly categorize the requests of mentees and delegate them to an appropriate respondent: machine, peer or mentor. Both sides can benefit from such a solution. The mentee gets the required support faster and the mentor can concentrate on those requests that really require her expertise. Considering this, we propose knowledge services for the automation of the mentoring process, based on the traditional models of domain, pedagogy and learner. The domain models are represented by RDF graphs, which can be manually created or extracted by tools such as T-MITOCAR [13]. The pedagogical models are based on rules, which will be enhanced with machine learning approaches at a later stage of our research. Our learner model is designed to serve the following purposes:

1. Assessing the learner’s performance and knowledge level by using learning results to estimate her competence. A mathematical model combining Item Response Theory (IRT) and Transferable Belief Model (TBM) can be used for computing competence [4].
2. Dynamically and adaptively providing learning content, based on the learner’s current knowledge competence and coverage. Only the unlearned, misunderstood and most related knowledge material will be suggested for further learning.
3. Evaluating the learner’s learning process against personal goals. The learning is periodically evaluated and intelligent assistance will be provided if the learner is found to be behind or away from her goal.
4. Providing real-time interactive feedback to learner activity and recommending more related assignments and learning resources.
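To make the first purpose more concrete, the following sketch illustrates only the IRT ingredient, using the one-parameter (Rasch) model with a simple grid-search maximum-likelihood estimate of ability. It is an assumption-laden illustration, not the combined IRT and TBM model of [4]; the function names, the ability grid and the example responses are hypothetical.

```python
import math


def p_correct(theta, difficulty):
    """Rasch (1PL) probability that a learner with ability theta solves an item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of the ability theta.

    responses: list of (item_difficulty, correct) pairs, correct being a bool.
    If all answers are correct (or all wrong), the likelihood has no interior
    maximum and the estimate sits on the grid boundary.
    """
    best_theta, best_log_likelihood = lo, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        log_likelihood = 0.0
        for difficulty, correct in responses:
            p = p_correct(theta, difficulty)
            log_likelihood += math.log(p if correct else 1.0 - p)
        if log_likelihood > best_log_likelihood:
            best_theta, best_log_likelihood = theta, log_likelihood
    return best_theta


# Hypothetical learning results: three correct and one wrong answer on items
# of difficulty 0.
theta_hat = estimate_ability([(0.0, True), (0.0, True), (0.0, True), (0.0, False)])
```

An estimate of this kind could feed the second purpose, for example by suggesting items whose difficulty lies close to the estimated ability; how the authors' services make that choice is not specified in the text.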

3 Related Work

ITS [1,3] already have a rather long tradition in university teaching, and their role as virtual mentors in mentoring processes has lately been recognized [5]. Topics like peer mentoring, virtual mentors, affective and emotional support [16], but also minorities [9] and modeling [2] have been discussed. Mentoring can fulfil multiple roles [15]: counseling, instruction, training, activation, motivation, socio-emotional support, networking and example. There are also other success factors that make mentoring effective, like similar values, demographic proximity, trust and respect. Many of them have already been considered in existing approaches, including affect detection, meta-cognitive support, lifelong mentoring or prediction [7].

The role of a mentor can be taken over by a chatbot, a software program conducting auditory or textual conversations. Natural Language Understanding (NLU) can be applied to analyze speech, and intelligent responses can be created by designing an engine to provide appropriate human-like responses. The results of a systematic literature review [17] show that chatbots have only recently been introduced in education, but they offer opportunities to create individual learning experiences. This can lead to an increase in learning outcomes and can support lecturers, as well as their teaching staff [11]. Chatbots have also been extended in the field of mixed reality, which describes a spectrum between the real world and a purely computer-generated world with the intermediate forms of augmented reality and augmented virtuality [10].

Measures of infrastructuring [12] allow the deployment of fast software development processes that provide high-quality tools for mentors to set up intelligent virtual agents supporting the mentees’ needs. A modular approach, driven by containerized microservices, simplifies the setup of massively distributed systems. Such systems fall into the area of Infrastructure as a Service (IaaS), which is characterized by the provision of an infrastructure in which applications can be developed and set up.

The Learning Record Store (LRS) is a server responsible for receiving, storing and accessing learning activities. It can be operated as a stand-alone system, but it can also be integrated into a Learning Management System (LMS). In the latter case the LMS takes over the reporting and analysis using the data of the LRS, while the LRS connects to other activity providers. It also provides the possibility to connect to another LRS for further analysis of data from other sources. An LRS builds the basis for an Experience API (xAPI)¹ ecosystem, an OSS specification of a data format for learning data. Any interaction from a tool via the xAPI is done through the LRS, allowing the system to store and retrieve xAPI statements. Those statements contain information about the actor, verb, and object. When an actor interacts with the tracked system, the verb describes the type of activity and the object describes what is being interacted with. In addition to these predefined data fields of the xAPI, several extensions are available to allow a highly customizable storage.
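The actor, verb and object structure described above can be illustrated with a small sketch that builds one xAPI statement as a Python dictionary and serializes it to JSON. The helper name `make_statement`, the learner identity and the activity URL are hypothetical placeholders; sending the statement to an LRS is deliberately left out.

```python
import json


def make_statement(actor_email, actor_name, verb_id, verb_label, activity_id):
    """Build a minimal xAPI statement with the three required parts."""
    return {
        # Who performed the action, identified here by a mailto IRI.
        "actor": {
            "objectType": "Agent",
            "name": actor_name,
            "mbox": f"mailto:{actor_email}",
        },
        # What kind of activity took place, identified by a verb IRI.
        "verb": {"id": verb_id, "display": {"en-US": verb_label}},
        # What was interacted with, identified by an activity IRI.
        "object": {"objectType": "Activity", "id": activity_id},
    }


# Hypothetical example: a learner completes a quiz in an LMS.
statement = make_statement(
    actor_email="learner@example.org",
    actor_name="Example Learner",
    verb_id="http://adlnet.gov/expapi/verbs/completed",
    verb_label="completed",
    activity_id="https://lms.example.org/course/42/quiz/7",
)
payload = json.dumps(statement)
```

Additional data, such as a score or custom extensions, would be attached in further optional fields of the statement, which this sketch does not cover.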

4 A Distributed Architecture to Scale up Mentoring

Figure 1 gives an overview of our current infrastructure. Applications are installed and operated within a Kubernetes cluster. Data is stored in a decentralized manner by las2peer [6], a decentralized OSS environment for community-oriented microservice development, deployment and monitoring. las2peer respects the demands of users for privacy, security and informational self-determination. The las2peer architecture consists of nodes connected by an underlying peer-to-peer network without central authority and hosts several services that can communicate with each other. The decentralized data storage and the communication within the network are protected by asymmetric encryption. This means that the control of the stored data remains with the respective stakeholder group (community). A Blockchain-based service registry [8] allows secure archiving of services with different versions, which in turn makes it easier to find and access them.

[Figure 1: architecture diagram. Labeled components: Data Proxies for Moodle and Opal/Onyx, Further Data Sources, Data Collector, xAPI Statements, las2peer, Learning Record Stores, AI Tools, Microservice Adapter (Reverse Proxy), RESTful APIs, Distributed Storage, Social Bot, Mentoring Workbench. Legend: Encrypted Communication, Data Flow, Container.]

Fig. 1. Mentoring support infrastructure.

Our infrastructure is connected to the learning toolchain via so-called Data Proxies for different LMSs, which are located at the institutions of the respective testbed. Their task is to transfer the data from the respective LMS to the cluster. Currently, this is integrated for the LMSs Moodle and Opal. The incoming data flow from the data proxies is aggregated within the cluster via a monitoring pipeline and streams into a collection of connected LRSs. This data is currently analyzed by basic knowledge services, which provide traditional models of domain, learner and pedagogy, based on both rule-based and machine learning approaches. In the future, smart assistance services will use this data to implement a spectrum of supporting functionalities, including personalized recommendations, categorizations, predictions and reflections.

As the interface for both mentors and mentees, we use Intelligent Mentoring Bots (IMBots), chatbots tailored especially for mentoring processes. They integrate into common messaging platforms and provide just-in-time feedback. These IMBots are trained with the OSS RASA NLU². Additionally, a Mentoring Workbench integrates back into the respective LMS, giving the mentees the possibility to use the IMBot from within their known environment, as well as feedback on their performance. We use an OpenID Connect (OIDC) server to access all services in a modern and secure fashion. By coupling the OIDC identity and end-to-end encryption, the need for a central instance through which the communication takes place is eliminated.

¹ https://xapi.com
² https://rasa.com
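The idea of archiving service versions in a tamper-evident way, as mentioned for the service registry above, can be sketched with a simple hash chain. This is only an illustration of the general principle under stated assumptions: it is neither a blockchain with distributed consensus nor the las2peer registry of [8], and the class name, field names and example service are hypothetical.

```python
import hashlib
import json


class ServiceRegistry:
    """Append-only list of service releases, linked by SHA-256 hashes."""

    GENESIS_HASH = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(record):
        # Hash every field except the stored hash itself.
        body = {key: value for key, value in record.items() if key != "hash"}
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def register(self, name, version, author):
        """Append a release that points to the hash of the previous entry."""
        previous_hash = (
            self._entries[-1]["hash"] if self._entries else self.GENESIS_HASH
        )
        record = {
            "name": name,
            "version": version,
            "author": author,
            "previous_hash": previous_hash,
        }
        record["hash"] = self._digest(record)
        self._entries.append(record)
        return record["hash"]

    def versions(self, name):
        """Return all archived versions of a service in registration order."""
        return [e["version"] for e in self._entries if e["name"] == name]

    def is_consistent(self):
        """Check that no archived entry was altered after registration."""
        previous_hash = self.GENESIS_HASH
        for entry in self._entries:
            if entry["previous_hash"] != previous_hash:
                return False
            if entry["hash"] != self._digest(entry):
                return False
            previous_hash = entry["hash"]
        return True


# Hypothetical usage: two releases of an example mentoring service.
registry = ServiceRegistry()
registry.register("feedback-service", "1.0.0", "community-a")
registry.register("feedback-service", "1.1.0", "community-a")
```

Because every entry stores the hash of its predecessor, changing an archived release invalidates the chain from that point on, which is what makes later lookups of a specific version trustworthy.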

5 Conclusions and Outlook

We have a proven technological platform based on international standards, existing OSS tools and a track record of EU-funded projects. The cluster has been set up as a public infrastructure and every external entity is able to connect to it. Data is currently coming from several German universities within a project funded by the German Federal Ministry of Education and Research (BMBF) and from a Massive Open Online Course (MOOC) supported by an ERASMUS+ project in the field of augmented reality. Our strong socio-technical conceptual framework allows us to develop our infrastructure, which supports both mentor and mentee in various organizations, following different legislative and organizational procedures, different LMSs as well as diverse target groups. Our development process, based on OSS commitment, allows us to quickly react to changing user requirements and organizational restrictions. In particular, end user involvement is supported from the beginning by the public formulation and elicitation of requirements in the Web-based Requirements Bazaar platform [14]. Given the early status of the data processing, with first example data sets that have been successfully evaluated, we will report on them at a later point.

Acknowledgments. The authors would like to thank the BMBF for their kind support within the project “Personalisierte Kompetenzentwicklung durch skalierbare Mentoringprozesse” (tech4comp) under the project id 16DHB2110.

References

1. Brusilovsky, P., Peylo, C.: Adaptive and intelligent web-based educational systems. Int. J. AI Educ. 13, 159–172 (2003)
2. Dimitrova, V., Brna, P.: From interactive open learner modelling to intelligent mentoring: STyLE-OLM and beyond. Int. J. Artif. Intell. Educ. 26(1), 332–349 (2015). https://doi.org/10.1007/s40593-015-0087-3
3. Dugenie, P., Jonquet, C., Cerri, S.A.: The principle of immanence in GRID-multiagent integrated systems. In: Meersman, R., Tari, Z., Herrero, P. (eds.) OTM 2008. LNCS, vol. 5333, pp. 98–107. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88875-8_29
4. Faulhaber, A., Melis, E.: An efficient student model based on student performance and metadata. In: Proceedings of ECAI 2008. Frontiers in Artificial Intelligence and Applications, pp. 276–280. IOS Press (2008)
5. Hoffman, R.R., Ward, P.: Mentoring: a leverage point for intelligent systems? IEEE Intell. Syst. 30(5), 78–84 (2015)
6. Klamma, R., Renzel, D., de Lange, P., Janßen, H.: las2peer - a primer. In: ResearchGate. ACIS Working Group Series (AWGS) (2016)
7. Kravčík, M., Schmid, K., Igel, C.: Towards requirements for intelligent mentoring systems. In: ABIS 2019, pp. 19–21. ACM Press (2019)
8. de Lange, P., Janson, T., Klamma, R.: Decentralized service registry and discovery in P2P networks using blockchain technology. In: Bakaev, M., Frasincar, F., Ko, I.-Y. (eds.) ICWE 2019. LNCS, vol. 11496, pp. 296–311. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-19274-7_22
9. Mack, N.A., Cummings, R., Huff, E.W., Gosha, K., Gilbert, J.E.: Exploring the needs and preferences of underrepresented minority students for an intelligent virtual mentoring system. In: Stephanidis, C., Antona, M. (eds.) HCII 2019. CCIS, vol. 1088, pp. 213–221. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30712-7_28
10. Milgram, P., Kishino, F.: A taxonomy of mixed reality visual displays. IEICE Trans. Inf. Syst. E77-D(12), 1321–1329 (1994)
11. Neumann, A.T., de Lange, P., Klamma, R.: Collaborative creation and training of social bots in learning communities. In: IEEE CIC 2019. IEEE (2019)
12. Pipek, V., Wulf, V.: Infrastructuring: towards an integrated perspective on the design and use of information technology. J. Assoc. Inf. Syst. 10(5), 447–473 (2009)
13. Pirnay-Dummer, P., Ifenthaler, D.: Automated knowledge visualization and assessment. In: Ifenthaler, D., Pirnay-Dummer, P., Seel, N. (eds.) Computer-Based Diagnostics and Systematic Analysis of Knowledge, vol. 1, pp. 77–115. Springer, Boston (2010). https://doi.org/10.1007/978-1-4419-5662-0_6
14. Renzel, D., Behrendt, M., Klamma, R., Jarke, M.: Requirements bazaar: social requirements engineering for community-driven innovation. In: RE 2013, pp. 326–327. IEEE (2013)
15. Risquez, A., Sanchez-Garcia, M.: The Jury is still out: psychoemotional support in peer e-mentoring for transition to university. Internet High. Educ. 15(3), 213–221 (2012)
16. Toala, R., Gonçalves, F., Durães, D., Novais, P.: Adaptive and intelligent mentoring to increase user attentiveness in learning activities. In: Simari, G.R., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J.A. (eds.) IBERAMIA 2018. LNCS (LNAI), vol. 11238, pp. 145–155. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03928-8_12
17. Winkler, R., Söllner, M.: Unleashing the potential of chatbots in education: a state-of-the-art analysis. In: Academy of Management (2018)

Exploring Navigation Styles in a FutureLearn MOOC

Lei Shi1(✉), Alexandra I. Cristea1, Armando M. Toda2, and Wilk Oliveira2

1 Durham University, Durham, UK
{lei.shi,alexandra.i.cristea}@durham.ac.uk
2 University of São Paulo, São Paulo, Brazil
{armando.toda,wilk.oliveira}@usp.br

Abstract. This paper presents for the first time a detailed analysis of fine-grained navigation style identification in MOOCs backed by a large number of active learners. The result shows 1) whilst the sequential style is clearly in evidence, the global style is less prominent; 2) the majority of the learners do not belong to either category; 3) navigation styles are not as stable as believed in the literature; and 4) learners can, and do, swap between navigation styles with detrimental effects. The approach is promising, as it provides insight into online learners’ temporal engagement, as well as a tool to identify vulnerable learners, which potentially benefit personalised interventions (from teachers or automatic help) in Intelligent Tutoring Systems (ITS).

Keywords: MOOCs · Navigation · Learning Styles · Learning Analytics

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 45–55, 2020. https://doi.org/10.1007/978-3-030-49663-0_7

1 Introduction

Having emerged in recent years, especially with the rise of big data, Learning Analytics (LA) [1] has a major impact on the area of Intelligent Tutoring Systems (ITS). Common goals include predicting performance and retention, as well as improving assessment and engagement [2, 3]. Effective LA practice often involves statistical modelling for meaningful insights. The ever-increasing amount of learner data from MOOCs (Massive Open Online Courses) brings unprecedented opportunities to enhance LA. In turn, LA provides methods to determine factors influencing learner behavior, allowing improvements of learning context, design and pedagogies [4].

Patterns in general have fascinated humankind [5]. Learning patterns have been studied for a long time, both offline and, since the advent of the Internet, online [6]. Navigation styles have attracted interest more recently, especially in relation to self-directed learning in MOOCs, which places control with the learners. This recent interest enables a data-driven re-examination of traditional theories of Learning Styles. MOOCs are notorious for low completion rates [7], so we aim to re-examine actual learner behavior in depth and understand how to help learners better, by answering 4 cumulative research questions: RQ1 what are the real navigation styles of MOOC learners? RQ2 how do these navigation styles relate to traditional theories of Learning Styles? RQ3 how do different navigation styles affect the course completion? RQ4 which are the learners that are particularly in need of help?

2 Related Work

Many studies have used Learning Analytics (LA) to understand learner behavior and thus improve online learning engagement. For example, Pardo et al. [8] addressed the challenges impeding the capacity of instructors to provide personalized feedback at scale. Zhang et al. [9] explored the role of Slack in collaborative learning engagement. Shoufan [10] investigated how YouTube videos could support online learning. Cristea et al. [11] predicted dropout for earlier interventions. Shi et al. [12] analyzed behavioral and demographic patterns. Most of these studies grouped learners and compared patterns amongst the groups, aiming at a deeper understanding of how online learners engage and perform. Our study also takes the grouping approach, but, differently, we group learners based on how they follow the pre-defined directed linear learning path.

Time is a fundamental dimension of learning. When a course is broken down into temporal phases, relationships amongst various parameters or variables may not persist along the learning process [13]. Some studies have taken this temporal dimension into consideration. For example, Yang et al. [14] clustered learners based on their characteristics and their interactions with learning materials to understand how their cluster membership changed along the course. Van Laer and Elen [15] examined changes in learner behavior and outcomes to test whether providing cues for calibration affects self-regulated learning. In the current study, differently, we analyze how learners navigate in the pre-defined directed linear learning path, group them using this information, and compare their engagement and performance.

The effects of learning style models, including the well-known Felder-Silverman [16] and Kolb [17] models, have been well studied. Despite criticism of the concept of learning styles [18], many studies, e.g. [19], incorporated the concept to support personalized learning and claimed that their findings strongly support it. In our study, we group learners by navigation styles rather than learning styles and investigate how the identified navigation styles relate to theories of learning styles. We map navigation styles to the sequential-global dimension of Felder-Silverman Learning Styles [16], which determines how learners prefer to progress toward understanding information. Sequential learners prefer linear thinking and learn in small incremental steps, whereas global learners prefer holistic thinking and learn in large leaps. Note that sequential and global styles exist on a continuum, with some learners heavily favoring one or the other, and others using a little bit of both.

3 Experimental Settings

The dataset was taken from “The Mind is Flat”, a 6-week course on the FutureLearn MOOC platform. Only one course was chosen because learning styles depend on the subject being learned, so combining courses may cancel out the effect of the navigation style identification. Each week contained several steps, i.e. basic learning units which could be an article, a discussion forum, or an assessment. There were {14, 12, 14, 12, 12, 18} steps in {Week 1, …, Week 6}. Each week had an Introduction step, 8–12 “Main” steps (articles or videos, the main learning content), a Discussion step, an Assessment step, and a Review step. Week 6 contained 2 additional steps: a “Further Reading” step, recommending academic papers to read after the course, and a “Certificates” step, promoting buying a printed statement of participation. The pre-defined linear learning path for learners to follow each week was to start with the Introduction step, then to visit the “Main” step(s), participate in the Discussion step, visit the Experiment and Assessment steps, and finish with the Review step.

Initially, 14,240 learners enrolled in the course, of whom 2,030 unenrolled, leaving 14,240 − 2,030 = 12,210. FutureLearn defines an active learner as “a learner who goes on to mark at least one step as complete in a course” [20]. Using this definition, our dataset had 5,204 active learners. As the remaining 12,210 − 5,204 = 7,006 learners barely did anything, our analysis focused only on the activity logs of the 5,204 active learners.

We first inspected Week 1, identifying navigation styles and exploring whether and how these styles might correlate with engagement and performance. Engagement was measured by how learners accessed learning content, e.g., reading articles, and how they participated in discussion forums and took assessments. Performance was measured via the assessment results (correct answer rate). As all the weeks in the course were structured the same, navigation styles can be identified at the week level and explored for stability and persistence in further weeks. Moreover, identifying navigation styles as early as Week 1 potentially allows early intervention and help for learners. Hence, we started by analyzing Week 1 (Sect. 4.1). We then moved on to the analysis of whether and how navigation styles might change across weeks (Sect. 4.2). The temporal analysis thus aimed to understand the temporal stability of the learning process and to analyze the consequences of the style choice.
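The selection of active learners described above can be sketched as follows. This is a minimal illustration only: the record layout `(learner_id, step, completed)` and the function name are assumptions made here, not the FutureLearn export format or the authors' code.

```python
def active_learners(records):
    """Return ids of learners who marked at least one step as complete."""
    return {learner for learner, _step, completed in records if completed}


# Toy activity log in a hypothetical (learner_id, step, completed) format.
log = [
    ("a", "1.1", True),
    ("a", "1.2", False),
    ("b", "1.1", False),
    ("c", "1.3", True),
]
active = active_learners(log)

# Enrolment arithmetic as reported in the text.
remaining = 14_240 - 2_030    # learners who did not unenrol
inactive = remaining - 5_204  # remaining learners who were not active
```

Using a set keeps each learner once, no matter how many steps they completed.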

4 Results and Discussions

4.1 Week 1

In Week 1, there were 14 steps including 1 Introduction step, 9 Main steps, 1 Discussion step, 1 Experiment step, 1 Assessment step, and 1 Review step.

How did learners start? The majority (90.3%; 4,699/5,204) of active learners started with the Introduction step. Among the remaining 506 learners, 438 (86.6%) started with a Main step, 19 (3.8%) with the Experiment step, 2 (0.4%) with the Discussion step, 29 (5.7%) with the Assessment step, and 18 (3.6%) with the Review step. Although 506 learners did not start with the Introduction step, 96.0% (4,998/5,204) of active learners did visit it (in some cases after visiting one or a few other steps). This implies the perceived importance of the Introduction step, regardless of learners’ navigation styles. Indeed, it introduced both what and how to learn, which is helpful in providing learners with the ‘big picture’ of the course, including what to expect now and later, and how to proceed.


How did learners follow the pre-defined directed linear learning path? 5,135 (98.7% of 5,204) active learners visited at least one Main step. Figure 1 shows the number of Main steps visited by the learners in Week 1, with only 72 (1.4%) learners not visiting any Main step, and 2,286 (43.9%) visiting all 9. 2,172 (41.7% of 5,204) active learners visited the Discussion step and, surprisingly, 99.7% (2,165/2,172) of them participated in the discussion. This indicates that Discussion, in itself, is very important to learners in the course, and it might generally be important to learners online, as they might come to online courses partially for social reasons.

Fig. 1. Distribution of frequency of accessing Main steps. Number of learners by number of Main steps visited (0 to 9): 72, 456, 355, 486, 462, 414, 283, 186, 204, 2,286.

Whilst the Discussion step was clearly designed to be visited after all the Main steps (the pre-defined directed linear learning path, as explained in Sect. 3), 1.6% (83/5,204) of active learners visited one or a few Main steps after the Discussion step. These learners might be in the online course for discussion only. 2.9% (153/5,204) visited some Main step(s) after the Experiment step. 4.4% (229/5,204) visited some Main step(s) after the Assessment step. These learners might have been actively trying to test themselves with their current knowledge before starting to learn, or might have wanted to check how challenging the tests were before committing to learning. The fact that they returned to learning (the Main steps) afterwards seems to suggest their current knowledge was not (yet) sufficient to pass. 1.9% (98/5,204) visited some Main step(s) after the Review step. They might have wanted to know the main goals of the course before committing to learning, for a holistic/global view. Nonetheless, it is interesting that the majority of the learners conformed to the linear expectation of visiting the Main steps first.

Figure 2 shows the average number of Main steps visited before (blue columns) and after (red columns) visiting the Discussion, Experiment, Assessment and Review steps, respectively. Overall, learners visited more Main steps before one of these special steps, which they should have visited after visiting all the Main steps. Yet, a large share of Main steps was still visited afterwards. This shows that more learners than the extreme ones above felt the need to have an overview, to know the learning goals, or to see if the Experiment and/or the Assessment was something for them. It also shows that, on average at least, learners tended to be non-linear, as defined below.

Fig. 2. The average number of Main steps visited BEFORE and AFTER visiting other steps (Color figure online). Values (before / after): Discussion 3.71 / 2.42; Experiment 3.66 / 2.52; Assessment 3.57 / 2.66; Review 3.36 / 2.81.


We define Sequential learners as those active learners who strictly followed the pre-defined directed linear learning path, which is closest to Felder-Silverman’s “sequential”, one extreme end of the sequential-global continuum [16]. Among 5,204 active learners, 1,552 (29.8%) were Sequential. We define Global learners as those active learners who visited at least one Main step for the first time after the Discussion, Experiment, Assessment, or Review step, all of which were successors of the Main steps in the learning path, and we map this learner group to Felder-Silverman’s “global”, the other extreme end of the continuum [16]. There were 350 (6.7% of 5,204) Global learners. The navigation styles of the remaining active learners complied with neither extreme end but fell in between. We thus define them as Middle learners. Next, we compare engagement and performance (as introduced in Sect. 3) among these three learner groups.

How did learners participate in Discussion and Assessment steps? Sequential learners had the highest mean value for both comments (μ = 0.3) and attempts (μ = 13.6), followed by Global learners (μ_comments = 0.22; μ_attempts = 9.1), while Middle learners had the lowest mean value for both comments (μ = 0.02) and attempts (μ = 0.8). This suggests that Sequential learners, who followed the course in the prescribed manner, might also be the most dedicated ones, whereas Global learners might be wasting time going back and forth and thus missing activities. Middle learners seemed to be the least engaged. This, together with the fact that these learners represent the great majority, may not be surprising in the MOOC context: it is well known that in such e-learning systems the majority of learners drop out of the course [21].

Figure 3 (left) compares the distribution of comments amongst these three learner groups on a logarithmic scale (base = 10). They share a similar distribution pattern: the majority did not leave any comment, and most of those who did left very few. However, it is obvious from the figure that Sequential learners commented the most, followed by Global learners, whilst no Middle learner left more than 2 comments. It is interesting that, whilst Sequential learners were, again, the most active, Global learners were only marginally more active than Middle learners in terms of commenting (social). This may show a different focus of Global learners, less on social aspects and more on the learning itself. Figure 3 (right) compares the distribution of attempts made in the Assessment step on a logarithmic scale (base = 10), showing a similar pattern, yet many more Sequential learners made many more attempts. A high number of learners in each group did not make any attempt. For the rest, the distribution is almost Gaussian, with peaks between 12 and 14 attempts for all groups.

Fig. 3. Distribution of comments (left) and attempts (right). Both panels plot the number of learners (logarithmic axis) for the Sequential, Global and Middle groups, against the number of comments (left) and the number of attempts (right).
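The three group definitions given earlier in this section can be illustrated with a short sketch. It is not the authors' implementation: the function name, the step identifiers, and in particular the reading of "strictly followed" as "the order of first visits reproduces the whole prescribed path" are assumptions made here for illustration.

```python
def classify_week(visits, path, n_main):
    """Label one learner's week as 'S' (Sequential), 'G' (Global) or 'M' (Middle).

    visits: step ids in the order the learner opened them (repeats allowed).
    path:   the prescribed order: Introduction, Main steps, then the rest.
    n_main: number of Main steps, located at path[1:1 + n_main].
    """
    main = set(path[1:1 + n_main])
    late = set(path[1 + n_main:])  # Discussion, Experiment, Assessment, Review
    first_visits = list(dict.fromkeys(visits))  # order of first visits

    # Assumption: "strictly followed" means first visits match the full path.
    if first_visits == list(path):
        return "S"
    seen_late = False
    for step in first_visits:
        if step in late:
            seen_late = True
        elif step in main and seen_late:
            return "G"  # a Main step first opened after a later step
    return "M"


# Hypothetical step ids for Week 1 (1 Introduction, 9 Main, 4 later steps).
week1 = ["intro"] + [f"main{i}" for i in range(1, 10)] + [
    "discussion", "experiment", "assessment", "review"]

sequential = classify_week(week1, week1, 9)
global_ = classify_week(["intro", "assessment", "main1"], week1, 9)
middle = classify_week(["intro", "main1", "main2"], week1, 9)
```

Under this reading, a learner who stops partway through the path without jumping ahead is labelled Middle, which is consistent with the statement that all Sequential learners finished the week with the Review step.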


The result from a Kolmogorov-Smirnov test indicated a non-normal distribution for comments in Week 1 (Discussion step; D(5,204) = 0.502, p < 0.001); this is also the case for the attempts to answer the questions in the Assessment step (D(5,204) = 0.399, p < 0.001). Thus, the nonparametric Kruskal-Wallis H and Mann-Whitney U tests were used for the comparisons amongst these three learner groups. The result from a Kruskal-Wallis H test showed statistically significant differences in both comments (χ²(2) = 681.33, p < 0.001) and attempts (χ²(2) = 3,579.24, p < 0.001). Mann-Whitney U tests for pairwise comparisons showed that both comments (Z = −4.13, U = 144,385.0, p < 0.001) and attempts (Z = −8.71, U = 191,226.0, p < 0.001) were significantly higher for Sequential than for Global learners, and that both comments (Z = −14.602, U = 510,566.00, p < 0.001) and attempts (Z = −32.498, U = 241,255.00, p < 0.001) were significantly higher for Global than for Middle learners.

How did learners perform in the Assessment? Overall, 1,485 Sequential learners had at least one attempt to answer the questions in the Assessment step, whereas that number for Global and Middle learners was 223 (63.7% of 350) and 194 (5.9% of 3,302), respectively. Here, only learners who had at least one attempt were considered: 1,485 + 223 + 194 = 1,902 learners. The correct answer rate was defined as the number of questions correctly answered, divided by the number of questions attempted. Sequential learners had the highest mean and median for correct answer rate (μ = 70.0%, σ = 16.8%, x̃ = 71.4%), followed by Global learners (μ = 64.3%, σ = 21.8%, x̃ = 66.7%) and then Middle learners (μ = 59.3%, σ = 23.1%, x̃ = 61.3%). Figure 4 shows the distribution of the correct answer rates amongst the three learner groups. In general, in the higher correct answer rate categories, i.e. [60%, 70%), [70%, 80%), [80%, 90%), and [90%, 100%], the proportion of Sequential learners is the highest, followed by Global learners. At the low end, only 0.2% (3/1,458) of Sequential learners didn’t answer any question correctly, compared with a much larger 3.1% (7/223) of Global and 3.6% (7/194) of Middle learners. Interestingly, the proportion of Global learners who correctly answered all (100%) questions attempted was the highest, 5.4% (12/223), in comparison with Sequential learners (4.5%; 65/1,458) and Middle learners (4.1%; 8/194). This may be because some (although small) percentage of Global learners had prior knowledge, although chance may play a role (as the result is not statistically significant).
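The correct answer rate defined above, and the per-group mean, standard deviation and median reported for it, can be computed as in the following sketch. The function name and the toy data are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, median, stdev


def correct_answer_rate(n_correct, n_attempted):
    """Questions answered correctly divided by questions attempted.

    Returns None for learners with no attempt, who are excluded from the
    comparison in the text.
    """
    if n_attempted == 0:
        return None
    return n_correct / n_attempted


# Toy data: (questions correct, questions attempted) per learner.
learners = [(5, 7), (3, 6), (0, 0), (7, 7)]
rates = [correct_answer_rate(c, a) for c, a in learners]
valid = [r for r in rates if r is not None]

summary = {
    "mean": mean(valid),
    "sd": stdev(valid),  # sample standard deviation (assumed)
    "median": median(valid),
}
```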

Fig. 4. Distribution of correct answer rate. Percent of learners in each group (Sequential, Global, Middle) per correct answer rate bin, from [0, 10%) to [90%, 100%].
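For readers unfamiliar with the Kruskal-Wallis H test used in these comparisons, the statistic can be computed from pooled ranks as sketched below. This is a plain-Python illustration of the standard tie-corrected formula, not the software used in the study (which the text does not name); in practice a library routine such as `scipy.stats.kruskal` would normally be used, and it also returns the p-value.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(*groups):
    """Tie-corrected H statistic and its degrees of freedom (k - 1).

    Assumes non-empty groups and at least two distinct values overall;
    otherwise the formula divides by zero.
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction, len(groups) - 1


h_no_ties, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
h_ties, _ = kruskal_wallis_h([1, 1], [2, 2])
```

With three groups the statistic has 2 degrees of freedom, which matches the χ²(2) values reported in the text.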


The Kolmogorov-Smirnov test result indicated a non-normal distribution of the correct answer rates, D(1,902) = 0.10, p < 0.001, so the Kruskal-Wallis H and Mann-Whitney U tests were used to compare the correct answer rate across learner groups. The Kruskal-Wallis H test result suggested a statistically significant difference amongst these three learner groups in correct answer rate, χ²(2) = 44.7, p < 0.001. The Mann-Whitney U test result suggested correct answer rates to be significantly higher for Sequential learners than for both Global and Middle learners.

How did learners finish in Week 1? As stated in Sect. 3, the Review step was the end of the pre-defined directed linear learning path. Among the 5,204 active learners, 4,331 (83.22%) visited it, but only 1,847 (35.49% of 5,204) finished the current week with it. For the rest, 84 (1.61% of 5,204) learners finished with the Introduction, 2,864 (55.03%) with a Main, 97 (1.86%) with the Discussion, 132 (2.54%) with the Experiment, and 180 (3.46%) with the Assessment. Comparing learner groups in terms of whether the last step visited in Week 1 was the Review: whilst, by definition, 100% of Sequential learners visited the Review step last, only 39.43% (138/350) of Global and 4.8% (157/3,302) of Middle learners did so. This, again, suggests Middle learners were the least active.

4.2 Across-Weeks

Changing navigation style. As presented in Sect. 4.1, in Week 1, there were 1,552 (29.82% of 5,204) Sequential, 350 (6.73%) Global and 3,302 (63.45%) Middle learners. These numbers and proportions changed, however, in Weeks 2 to 6 (Fig. 5). Fewer and fewer learners behaved as Sequential and more and more as Middle, whereas Global fluctuated: the peak appeared in Week 2 and the smallest number in Week 5 (neither the first nor the last week). This could be because some enrolled learners had limited time and were checking beforehand whether they could perform the final activities of a week, potentially to finish earlier. The fact that more learners acted as Middle learners is, however, a more worrying trend, and is potentially an early indicator of learners hovering on the brink of leaving the course altogether.

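Fig. 5. Learner groups changes over weeks (number of learners per group; the chart shows each week as shares of the 5,204 active learners)

Week      Sequential   Global   Middle
Week 1    1,552        350      3,302
Week 2    1,045        866      3,293
Week 3    757          495      3,952
Week 4    685          314      4,205
Week 5    576          249      4,379
Week 6    364          386      4,454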

Figure 6 shows how learners switched from one group to another, i.e. the transition probabilities between navigation styles. The strength of the arrows represents the proportion of learners switching; e.g., from Week 1 to Week 2, 57.15% (887) of Sequential learners remained Sequential, whilst 29.45% (457) became Global and 13.40% (208) became Middle. Across weeks, the majority (57.15%–77.15%) of Sequential learners remained Sequential, i.e. kept their navigation style; yet, for Global learners, the largest percentage (41.42%–67.55%) became Middle, 21.02%–39.76% remained Global, and only 8.03%–20.29% became Sequential. The Middle group is the most stable, with 89.04%–98.86% remaining, and the smallest percentage (0.16%–2.63%) becoming Sequential. Interestingly, there is less of a “loss” between the Sequential and Global learner groups than between Global and Middle.

Fig. 6. Learners switching between groups
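The week-to-week transition proportions and the count of learners who kept one style throughout can both be derived from per-learner label strings such as "SGSSGM". The sketch below assumes such strings are already available; the function name and the toy paths are hypothetical and not taken from the dataset.

```python
from collections import Counter


def transition_proportions(paths, week):
    """Share of learners moving from each style in `week` to each style in week + 1.

    paths: one string per learner, one letter (S, G or M) per week.
    week:  1-based index of the starting week.
    """
    moves = Counter((p[week - 1], p[week]) for p in paths)
    totals = Counter(p[week - 1] for p in paths)
    return {(a, b): n / totals[a] for (a, b), n in moves.items()}


# Toy data: six learners over six weeks.
paths = ["SSSSSS", "SGMMMM", "SGMMMM", "SMMMMM", "MMMMMM", "GGGGGG"]

week1_to_2 = transition_proportions(paths, 1)
# Learners whose style never changed have a single distinct letter.
stable = sum(1 for p in paths if len(set(p)) == 1)
distinct_paths = len(set(paths))
```

Each row of proportions is normalised by the size of the source group, so the values leaving one style sum to 1, as with the 57.15%, 29.45% and 13.40% reported for Sequential learners from Week 1 to Week 2.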

Each learner’s navigation style in each week was labelled S for Sequential, G for Global, or M for Middle, resulting in 236 distinct navigation style changing paths; e.g. “SGSSGM” refers to learners who were Sequential in Week 1 and became Global in Week 2, etc. In total, 3,076 (59.11%) learners kept their navigation style, i.e. remained “SSSSSS”, “GGGGGG” or “MMMMMM”, whereas 2,128 (40.89%) learners changed navigation style at least once. The majority (2,824; 54.27%) kept being Middle, i.e. “MMMMMM”. They were the least active: on average they visited only 5.50 (σ = 2.90) and completed 4.20 (σ = 2.74) steps. Another 252 learners also kept their navigation style across weeks: 234 “SSSSSS” and 18 “GGGGGG” learners; the former visited 82 (σ = 0) and completed 81.17 (σ = 5.04) steps, and the latter visited 75.39 (σ = 12.40) and completed 70.22 (σ = 22.26) steps. Other popular navigation style changing paths include “SGMMMM” for 286 learners, “SMMMMM” for 192 learners, and “MGMMMM” for 177 learners.

Completion. Figure 7 shows the distribution of the steps at which learners stopped the course. For the Sequential and Global groups, peaks appear towards the end of the course, meaning the largest proportion of learners in these two groups stopped late in the course. In particular, 23.07% (358/1,552) of Sequential learners stopped after having visited the last step, i.e. they completed the whole course, whilst 8.29% (29/350) of Global learners completed the whole course. The majority of the Sequential learners (87.82%; 1,363/1,552), and more than half of the Global learners (61.71%; 216/350), went on to visit step(s) in Week 2 and later weeks. On the contrary, the largest proportions of Middle learners stopped in Week 1: 12.78% (422/3,302) stopped at Step 1.2, and {9.06%, 12.75%, 12.11%, 10.84%} stopped at Step {1.3, 1.4, 1.5, 1.6}; only 14.51% of them went on to visit other step(s).


Fig. 7. Distribution of dropout steps. Percent of learners in each group (Sequential, Global, Middle) by dropout step (the step last visited), across all steps of Weeks 1 to 6.
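A distribution of dropout steps per group, of the kind summarised in Fig. 7, can be tabulated as in the following sketch. The two input mappings and their names are assumptions for illustration; the dropout step is taken, as in the figure, to be the step a learner visited last.

```python
from collections import Counter, defaultdict


def dropout_distribution(last_step, group):
    """Per group, the proportion of learners whose last visited step is each step.

    last_step: learner id -> id of the step last visited (the dropout step).
    group:     learner id -> Week 1 navigation style ('S', 'G' or 'M').
    """
    counts = defaultdict(Counter)
    for learner, step in last_step.items():
        counts[group[learner]][step] += 1
    return {
        g: {step: n / sum(c.values()) for step, n in c.items()}
        for g, c in counts.items()
    }


# Toy data: four learners with hypothetical ids.
last_step = {"a": "6.18", "b": "1.14", "c": "1.2", "d": "1.2"}
group = {"a": "S", "b": "S", "c": "M", "d": "M"}
dist = dropout_distribution(last_step, group)
```

Normalising within each group, rather than over all learners, is what makes the groups comparable despite their very different sizes.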

Interestingly, “peaks” for Sequential learners appear periodically at the end of each week. This means that if a Sequential learner were to stop, they likely stopped after visiting all steps in the current week; for Global and Middle learners, however, there is no such clear periodic pattern. This is in line with the sequential learning style (as opposed to the global learning style [16]), in which learners with this preference complete one subject at a time. Presumably, they don’t like to stop in the middle of a subject but have the need (or compulsion) to finish it first, before they move on either to a new subject or to a different course/endeavor.

5 Conclusions

We’ve analyzed in depth learners’ navigation styles on a FutureLearn MOOC with massive learner presence. We’ve also investigated the changes of navigation styles along the course. The results and discussions have answered the four research questions raised in Sect. 1. For RQ1, what are the real navigation styles of MOOC learners?, we’ve identified 3 navigation styles, Sequential, Global and Middle, based on whether and how the learners followed the pre-defined directed linear learning path. For RQ2, how do these navigation styles relate to traditional theories of Learning Styles?, we’ve mapped the 3 identified navigation styles to the well-known Felder-Silverman Learning Styles [16]. In particular, Sequential and Global are mapped to the 2 extreme ends of the sequential-global continuum, and the third style, Middle, represents the learners whose navigation style did not comply with either extreme end but fell in between. For RQ3, how do different navigation styles affect the course completion?, we’ve shown that the navigation style chosen is strongly related to the chance of completion, and that the variability in styles is related to this outcome, too. For RQ4, which are the learners that are particularly in need of help?, we’ve identified learners in need of help, especially amongst those that have potentially better chances of completion, e.g. Global learners.

Overall, this study shows clearly that navigation styles are important for success in MOOCs, and that they can be related to dropout/completion. However, this study also shows that styles aren’t constant for learners and that, whilst learners might prefer studying in a specific style in general, swaps between styles can still happen relatively often. We show which styles are more “stable” in this sense than others. Unfortunately, the most stable style, Middle, is also the one most likely to lead to disengagement and ultimately leaving the course. The Sequential style is the most successful, but still loses many learners to other styles. Global is the most unstable style: such learners still have a chance to succeed, in terms of completing either the whole or a good proportion of the course, but they clearly need help in remaining on track and staying focused. This brings us to the other main finding, the identification of potentially the most vulnerable category of learners, recognizable by their navigation style as early as Week 1. Online teachers or personalized systems can therefore focus on helping them. On the other hand, the Middle style is broad and mostly unsuccessful; it needs further analysis to refine the types of behavior encountered within it, in order to provide specialized and personalized help and direction.

References

1. 1st International Conference on Learning Analytics and Knowledge 2011 | Connecting the Technical, Pedagogical, and Social Dimensions of Learning Analytics. https://tekri.athabascau.ca/analytics/. Accessed 01 Mar 2020
2. Shi, L., Cristea, A.I.: In-depth exploration of engagement patterns in MOOCs. In: Hacid, H., Cellary, W., Wang, H., Paik, H.-Y., Zhou, R. (eds.) WISE 2018. LNCS, vol. 11234, pp. 395–409. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02925-8_28
3. Papamitsiou, Z., Economides, A.A.: Learning analytics and educational data mining in practice: a systematic literature review of empirical evidence. J. Educ. Technol. Soc. 17, 49–64 (2014)
4. Ferguson, R., Clow, D.: Examining engagement: analysing learner subpopulations in massive open online courses (MOOCs). In: Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, LAK 2015, pp. 51–58. ACM Press, Poughkeepsie, New York (2015). https://doi.org/10.1145/2723576.2723606
5. Alexander, C.: A Pattern Language: Towns, Buildings, Construction. OUP, New York (1978)
6. Romero, C., Ventura, S.: Educational data mining: a survey from 1995 to 2005. Expert Syst. Appl. 33, 135–146 (2007). https://doi.org/10.1016/j.eswa.2006.04.005
7. Alamri, A., et al.: Predicting MOOCs dropout using only two easily obtainable features from the first week’s activities. In: Coy, A., Hayashi, Y., Chang, M. (eds.) ITS 2019. LNCS, vol. 11528, pp. 163–173. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22244-4_20
8. Pardo, A., Jovanovic, J., Dawson, S., Gašević, D., Mirriahi, N.: Using learning analytics to scale the provision of personalised feedback. Br. J. Educ. Technol. 50, 128–138 (2019). https://doi.org/10.1111/bjet.12592
9. Zhang, X., Meng, Y., Ordóñez de Pablos, P., Sun, Y.: Learning analytics in collaborative learning supported by Slack: from the perspective of engagement. Comput. Hum. Behav. 92, 625–633 (2019). https://doi.org/10.1016/j.chb.2017.08.012
10. Shoufan, A.: Estimating the cognitive value of YouTube’s educational videos: a learning analytics approach. Comput. Hum. Behav. 92, 450–458 (2019). https://doi.org/10.1016/j.chb.2018.03.036
11. Cristea, A.I., Alamri, A., Kayama, M., Stewart, C., Alshehri, M., Shi, L.: Earliest predictor of dropout in MOOCs: a longitudinal study of futurelearn courses. Presented at the 27th International Conference on Information Systems Development (ISD2018), Lund, Sweden, 22 August (2018)
12. Shi, L., Cristea, A., Toda, A., Oliveira, W.: Revealing the hidden patterns: a comparative study on profiling subpopulations of MOOC students. In: The 28th International Conference on Information Systems Development (ISD2019). Association for Information Systems, Toulon, France (2019)
13. Zhu, M., Bergner, Y., Zhang, Y., Baker, R., Wang, Y., Paquette, L.: Longitudinal engagement, performance, and social connectivity: a MOOC case study using exponential random graph models. In: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, LAK 2016, pp. 223–230. ACM Press, Edinburgh (2016). https://doi.org/10.1145/2883851.2883934
14. Yang, B., Shi, L., Toda, A.: Demographical changes of student subgroups in MOOCs: towards predicting at-risk students. Presented at the 28th International Conference on Information Systems Development (ISD2019), Toulon, France, August (2019)
15. Van Laer, S., Elen, J.: The effect of cues for calibration on learners’ self-regulated learning through changes in learners’ learning behaviour and outcomes. Comput. Educ. 135, 30–48 (2019). https://doi.org/10.1016/j.compedu.2019.02.016
16. Felder, R.M., Silverman, L.K.: Learning and teaching styles in engineering education. Eng. Educ. 78, 674–681 (1988)
17. Kolb, A.Y., Kolb, D.A.: Learning styles and learning spaces: enhancing experiential learning in higher education. Acad. Manag. Learn. Educ. 4, 193–212 (2005)
18. Kirschner, P.A.: Stop propagating the learning styles myth. Comput. Educ. 106, 166–171 (2017). https://doi.org/10.1016/j.compedu.2016.12.006
19. Hassan, M.A., Habiba, U., Majeed, F., Shoaib, M.: Adaptive gamification in e-learning based on students’ learning styles. Interact. Learn. Environ. 1–21 (2019). https://doi.org/10.1080/10494820.2019.1588745
20. O’Grady, N.: Are Learners Learning? (and How do We Know?). https://about.futurelearn.com/research-insights/learners-learning-know. Accessed 23 Feb 2019
21. Clow, D.: MOOCs and the funnel of participation. In: Proceedings of the Third International Conference on Learning Analytics and Knowledge, LAK 2013, p. 185. ACM Press, Leuven (2013). https://doi.org/10.1145/2460296.2460332

Changes of Affective States in Intelligent Tutoring System to Improve Feedbacks Through Low-Cost and Open Electroencephalogram and Facial Expression

Wellton Costa de Oliveira1, Ernani Gottardo2, and Andrey Ricardo Pimentel3

1 Universidade Tecnológica Federal do Paraná, Francisco Beltrão, PR, Brazil
2 Instituto Federal do Rio Grande do Sul, Erechin, RS, Brazil
3 Universidade Federal do Paraná, Curitiba, PR, Brazil

Abstract. Many works in the literature show that positive emotions improve learning. However, in the educational context, the affective dimension is often not considered in the teaching-learning process. One reason is that there are many students per teacher, which makes adapting the didactics and giving individualized feedback practically impossible. The limited, or sometimes absent, analysis of the emotions of those involved in learning is a further obstacle. One way to circumvent this problem is to use Intelligent Tutoring Systems (ITS), which model the student individually and adapt the environment according to its use. Theories of emotion can also be incorporated so that the ITS can understand the affective dimension of the student during activities. This paper aims to present a way to infer changes in a student’s affective states to improve feedback in ITS. For this, facial expressions and brain waves (acquired with low-cost equipment called openBCI) were studied for data acquisition and emotion recognition. In the initial tests, the methodology met expectations; however, more experimental studies must be carried out.

Keywords: Intelligent Tutoring System · Affective states · Feedbacks · EEG · Brain-computer interface · Facial expressions

© Springer Nature Switzerland AG 2020
V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 56–62, 2020. https://doi.org/10.1007/978-3-030-49663-0_8

1 Introduction

Emotion is very important for most human activities, including learning, and the number of scientific articles about the influence of emotions on learning has been growing. Research suggests that students learn best when they feel positive emotions (such as joy), which result in enthusiasm, curiosity, involvement or challenge, while negative emotions (such as sadness, anger or fear) cause anxiety, apathy and frustration, and consequently poorer learning [1,4]. In other words, identifying the emotions of each student during a specific period is extremely important for the educational system to achieve its objective, which is to achieve social quality for all, systematically ensuring the appropriation of knowledge accumulated by humanity and the development of skills.

However, it is impractical for a teacher to understand and monitor the learning and emotions of every student in the classroom, given the number of students per teacher. Another obstacle is that, although some teachers can tell the emotion of some students while they solve a test, identifying which emotion a person is experiencing at a given moment is very difficult. In short, there are two serious problems: 1) teachers cannot adapt to each student because it is impractical, and 2) recognizing students' emotions is a complex task.

The use of digital technologies (such as desktop computers or smartphones/tablets) has been growing exponentially in recent years. Such growth directly affects schools and universities, changing the teaching-learning process and certain educational paradigms. For example, the impossibility of each teacher adapting to every student's learning can be compensated for by an Intelligent Tutoring System (ITS), a computer system that provides personalized instruction or feedback.

2 Proposed Methodology

The proposed methodology consists of a system that identifies valence and arousal values from brain waves, acquired with a low-cost, open Brain-Computer Interface called OpenBCI, and from facial expressions captured with a webcam. A decision-level fusion is then performed, and the fused valence and arousal values are sent to the learner module so that the tutor module can choose and send feedback to the learner according to this emotion. Figure 1 illustrates the proposed methodology.

2.1 Valence and Arousal Through EEG

For signal acquisition, OpenBCI was used. It is an open-source, open-hardware brain-computer interface. Several studies have used OpenBCI with satisfactory results [2,10,17]. Figure 2 shows the OpenBCI device built by the authors.

Fig. 1. Proposed methodology

Fig. 2. OpenBCI manufacturing

To measure valence and arousal, four channels in the frontal lobe area are used (electrodes AF3 and F3 in the left area and electrodes AF4 and F4 in the right area). The mean PSD values of the alpha and beta frequency ranges are then calculated for these signals. PSD (power spectral density) measures the average energy distribution of a signal as a function of frequency. Here, the mean PSD is expressed as a logarithmic value (decibel), which can be either positive or negative. The valence and arousal dimensions of emotion are then calculated from the mean PSD and the FEA indicators, based on Eqs. (1) and (2). Specifically, a positive valence is associated with relatively greater activation of the left frontal area, whereas a negative valence is more related to relatively greater activation of the right frontal area. Equation (1) thus expresses the valence level as a relative difference of activations between the two areas. Although there is no predefined range of valence levels, a more positive valence value means a more pleasant emotion, with more activation of the left area than the right one. Equation (2), on the other hand, indicates the arousal level by calculating the alpha/beta ratio [3,11]. Similarly, a greater arousal value indicates a more aroused emotional state.

$$\text{Valence} = \frac{\alpha(F4)}{\beta(F4)} - \frac{\alpha(F3)}{\beta(F3)} \tag{1}$$

$$\text{Arousal} = \frac{\alpha(AF3 + AF4 + F3 + F4)}{\beta(AF3 + AF4 + F3 + F4)} \tag{2}$$

where α(i) and β(i) are the PSD of the alpha and beta frequency ranges obtained from the i-th channel of the EEG signal.
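As an illustration of Eqs. (1) and (2), the following minimal sketch computes valence and arousal from per-channel alpha and beta values. It is not the authors' implementation: the function names and the numeric inputs are made up for the example, and Eq. (2) is read here as the sum of the alpha values over the four channels divided by the sum of the beta values, which is only one possible interpretation of the notation.

```python
# Frontal channels named in the text.
CHANNELS = ("AF3", "AF4", "F3", "F4")


def valence(alpha, beta):
    """Eq. (1): alpha/beta ratio at F4 (right) minus the ratio at F3 (left)."""
    return alpha["F4"] / beta["F4"] - alpha["F3"] / beta["F3"]


def arousal(alpha, beta):
    """Eq. (2), read as summed alpha over summed beta (an assumption)."""
    total_alpha = sum(alpha[ch] for ch in CHANNELS)
    total_beta = sum(beta[ch] for ch in CHANNELS)
    return total_alpha / total_beta


# Hypothetical mean PSD values per channel, chosen only to keep the arithmetic simple.
example_alpha = {"AF3": 4.0, "AF4": 6.0, "F3": 4.0, "F4": 6.0}
example_beta = {"AF3": 2.0, "AF4": 2.0, "F3": 2.0, "F4": 2.0}

print(valence(example_alpha, example_beta))  # 6/2 - 4/2
print(arousal(example_alpha, example_beta))  # 20/8
```

Because the mean PSD is expressed in decibels and may be zero or negative, a real pipeline would need to guard against division by values close to zero; the sketch omits this.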

2.2 Valence and Arousal Through Facial Expression

Several works using deep learning have reported successful results [5,16]. In this paper, AffectNet was used, which is by far the largest database of facial expression, valence and arousal, enabling research in automated facial expression recognition in two different emotion models [13]. Two baseline deep neural networks are used to classify images in the categorical model and to predict the intensity of valence and arousal. Various evaluation metrics show that these deep neural network baselines can perform better than conventional machine learning methods and off-the-shelf facial expression recognition systems [9,13]. In this paper, a webcam is used to carry out tests.

2.3 Fusion Method

Two fusion approaches are well established: decision-level and feature-level fusion [6–8]. In feature-level fusion, the features from each modality are combined and classification proceeds as for a single modality. In decision-level fusion, each modality is first classified individually, as described above, and the classifier outputs are then combined linearly. In this paper, we used decision-level fusion with equal weights (W-EQ). W-EQ is the most straightforward method, where the output probabilities for each class are an equal weighting of the class probabilities from each single modality (α = 0.5) [8]. That is,

$$p^x_0 = 0.5\,p^x_e + 0.5\,p^x_f \tag{3}$$

where $p^x_0$ represents the valence or arousal result after fusion, $p^x_e$ the valence or arousal result from EEG, and $p^x_f$ the valence or arousal result from facial expression.
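The equal-weight rule of Eq. (3) can be sketched in a few lines. The function names and the sample inputs below are illustrative assumptions; only the weighting itself comes from the text.

```python
def fuse_equal_weights(p_eeg, p_face, weight=0.5):
    """Eq. (3): linear combination of the EEG and facial-expression outputs.

    With weight = 0.5 both modalities contribute equally (W-EQ).
    """
    return weight * p_eeg + (1.0 - weight) * p_face


def fuse_affect(eeg, face):
    """Fuse (valence, arousal) pairs dimension by dimension."""
    return tuple(fuse_equal_weights(e, f) for e, f in zip(eeg, face))


# Hypothetical (valence, arousal) estimates from each modality.
fused_valence, fused_arousal = fuse_affect((0.2, 0.8), (0.6, 0.4))
print(fused_valence, fused_arousal)
```

Keeping the weight as a parameter makes it easy to try unequal weightings later, although the paper itself only uses α = 0.5.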

3 Adaptive Feedback According to Emotion

In [15], a system for programming practice was developed that provides adaptive feedback based on the presence of confusion in the student. The work in [14] describes an approach to help students in a programming tutoring system by providing them with feedback during problem-solving coding activities. The purpose of this work is similar to that of the works mentioned above; however, here feedback is displayed automatically as a text message in the virtual learning environment, remaining visible for a few seconds and then disappearing. This approach is similar to messages on social networks such as Facebook.

Once the tutor module has verified the student's emotion at that moment, it selects a message from a feedback database (based on [12]) and displays it to the student in text form. The proposal of this work is to send feedback only when the emotion is negative, so that a change of emotion can take place. The study thus consists of verifying the impact of that feedback on the student's emotion while studying. Figure 3 displays a flowchart of how the tutor module selects feedback.

Fig. 3. Flowchart of how the tutor module selects feedback
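The rule stated in the text, sending feedback only when the emotion is negative, might look like the sketch below. Everything else in it is an assumption made for illustration: the flowchart's actual logic is not reproduced here, the messages are invented placeholders for the feedback database, and a fused valence below zero is taken to mean "negative emotion".

```python
import random

# Hypothetical messages; the real feedback database (based on [12]) is not shown in the paper.
FEEDBACK_MESSAGES = [
    "Take your time, you are making progress.",
    "Try breaking the problem into smaller steps.",
]


def select_feedback(valence, rng=random):
    """Return a feedback text only when the fused valence is negative, else None."""
    if valence >= 0.0:  # assumption: zero separates positive from negative emotion
        return None
    return rng.choice(FEEDBACK_MESSAGES)


print(select_feedback(0.4))   # positive emotion, so no message is sent
print(select_feedback(-0.3))  # one of the placeholder messages
```

Logging the valence measured before and after each message would support the study's stated goal of relating individual feedback to the affective change it produces.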

4 Discussion

Emotions play a fundamental role in the teaching-learning process, yet this affective dimension is often overlooked by teachers. This work aims to build a system that identifies emotions from EEG signals, acquired with a low-cost BCI, and from facial expressions, captured with a webcam, to be used in an ITS to improve feedback. The methodology presented in this work makes it possible to study the affective changes of a student using an ITS and, through feedback, to observe how the emotion behaves and thus to determine which feedback has the most positive effect for that student. In the initial tests, the methodology met expectations; however, more studies must be carried out. The next step of this work is to carry out experiments with undergraduate students from a university, to test the system with more samples and to create a database of feedback messages and their respective affective changes.

References

1. Ainley, M.: Connecting with learning: motivation, affect and cognition in interest processes. Educ. Psychol. Rev. 18(4), 391–405 (2006)
2. Aldridge, A., et al.: Accessible electroencephalograms (EEGs): a comparative review with openBCI’s ultracortex mark IV headset. In: 2019 29th International Conference Radioelektronika (RADIOELEKTRONIKA), pp. 1–6, April 2019. https://doi.org/10.1109/RADIOELEK.2019.8733482
3. Blaiech, H., Neji, M., Wali, A., Alimi, A.M.: Emotion recognition by analysis of EEG signals. In: 13th International Conference on Hybrid Intelligent Systems (HIS 2013), pp. 312–318, December 2013. https://doi.org/10.1109/HIS.2013.6920451
4. Brand, S., Reimer, T., Opwis, K.: How do we learn in a negative mood? Effects of a negative mood on transfer and learning. Learn. Instr. 17(1), 1–16 (2007). https://doi.org/10.1016/j.learninstruc.2006.11.002. http://www.sciencedirect.com/science/article/pii/S0959475206001150
5. Chang, W., Hsu, S., Chien, J.: FATAUVA-Net: an integrated deep learning framework for facial attribute recognition, action unit detection, and valence-arousal estimation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1963–1971, July 2017. https://doi.org/10.1109/CVPRW.2017.246
6. Huang, Y., Yang, J., Liao, P., Pan, J.: Fusion of facial expressions and EEG for multimodal emotion recognition. Comput. Intell. Neurosci. 2017, 2107451 (2017). https://doi.org/10.1155/2017/2107451. http://europepmc.org/articles/PMC5625811
7. Huang, Y., Yang, J., Liu, S., Pan, J.: Combining facial expressions and electroencephalography to enhance emotion recognition. Future Internet 11(5), 105 (2019)
8. Koelstra, S., Patras, I.: Fusion of facial expressions and EEG for implicit affective tagging. Image Vis. Comput. 31(2), 164–174 (2013). https://doi.org/10.1016/j.imavis.2012.10.002. http://www.sciencedirect.com/science/article/pii/S0262885612001825. Affect Analysis in Continuous Input
9. Kollias, D., et al.: Deep affect prediction in-the-wild: aff-wild database and challenge, deep architectures, and beyond. Int. J. Comput. Vis. 127(6–7), 907–929 (2019)
10. Lakhan, P., et al.: EDOSE: emotion datasets from open source EEG with a realtime bracelet sensor. arXiv abs/1810.04582 (2018)
11. Lewis, R.S., Weekes, N.Y., Wang, T.H.: The effect of a naturalistic stressor on frontal EEG asymmetry, stress, and health. Biol. Psychol. 75(3), 239–247 (2007). https://doi.org/10.1016/j.biopsycho.2007.03.004. http://www.sciencedirect.com/science/article/pii/S0301051107000506
12. Mohanan, R., Stringfellow, C., Gupta, D.: An emotionally intelligent tutoring system. In: 2017 Computing Conference, pp. 1099–1107, July 2017. https://doi.org/10.1109/SAI.2017.8252228
13. Mollahosseini, A., Hasani, B., Mahoor, M.H.: AffectNet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10(1), 18–31 (2019). https://doi.org/10.1109/TAFFC.2017.2740923
14. Silva, P., Costa, E., de Araújo, J.R.: An adaptive approach to provide feedback for students in programming problem solving. In: Coy, A., Hayashi, Y., Chang, M. (eds.) ITS 2019. LNCS, vol. 11528, pp. 14–23. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22244-4_3
15. Tiam-Lee, T.J., Sumi, K.: Adaptive feedback based on student emotion in a system for programming practice. In: Nkambou, R., Azevedo, R., Vassileva, J. (eds.) ITS 2018. LNCS, vol. 10858, pp. 243–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91464-0_24
16. Turabzadeh, S., Meng, H., Swash, R.M., Pleva, M., Juhar, J.: Facial expression emotion detection for real-time embedded systems. Technologies 6(1) (2018). https://doi.org/10.3390/technologies6010017. https://www.mdpi.com/2227-7080/6/1/17
17. Yohanandan, S.A.C., Kiral-Kornek, I., Tang, J., Mashford, B.S., Asif, U., Harrer, S.: A robust low-cost EEG motor imagery-based brain-computer interface. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 5089–5092, July 2018. https://doi.org/10.1109/EMBC.2018.8513429

Computer-Aided Grouping of Students with Reading Disabilities for Effective Response-to-Intervention

Chia-Ling Tsai¹, Yong-Guei Lin², Ming-Chi Liu², and Wei-Yang Lin²

¹ Queens College, CUNY, Queens, NY 11367, USA [email protected]
² Chung Cheng University, Chiayi, Taiwan

Abstract. Our research work focuses on computer-aided grouping of students based on questions answered in an assessment for effective reading intervention in early education. The work can facilitate placement of students with similar reading disabilities in the same intervention group to optimize corrective actions. We collected ELA (English Language Arts) assessment data from two different schools in the USA, involving 365 students. Each student performed three mock assessments. We formulated the problem as a matching problem: an assessment should be matched to other assessments performed by the same student in the feature space. In this paper, we present a study on a number of matching schemes with low-level features gauging the grade-level readability of a piece of writing. The matching criterion for assessments is the consistency measure of matched questions based on the students’ answers to the questions. An assessment is matched to other assessments using K-Nearest-Neighbor. The best result is achieved by the matching scheme that considers the best match for each question, and the success rate is 17.6%, for highly imbalanced data with only about 5% belonging to the true class.

Keywords: Reading intervention · Machine-learning · Students grouping

© Springer Nature Switzerland AG 2020
V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 63–67, 2020. https://doi.org/10.1007/978-3-030-49663-0_9

1 Introduction

The Common Core Learning Standards (CCLS), adopted by forty-four USA states and the District of Columbia, define the knowledge and skills that a student should demonstrate by the end of each grade. One crucial skill emphasized by CCLS is the reading ability, which is the precursor for learning in all content areas, including science, technology, engineering, and mathematics (STEM). Close to 90% of the school-age population with learning disabilities have significant difficulties in reading comprehension [8].

The long-term goal of this study is to develop a computer algorithm to automate grouping of students based on error patterns in individual assessments. The work can facilitate placement of students with similar reading disabilities in the same intervention group to optimize corrective actions in early elementary education. In our previous work [7], we focused on comparison of different feature representations. In this paper, we present our study on a number of matching schemes, with low-level features adopted by literacy experts for gauging the grade-level readability of a piece of writing. Similar research can be found in the area of personalized speech disorder therapy [2]: the system makes predictions on the outcome of a therapy based on the large volume of data collected from patients who have completed the therapy program.

2 Methodology

The dataset consists of three parts. The first part is the set of 6 fourth-grade New York State mock ELA examinations from 2005 to 2010. Only multiple-choice questions are considered. The second part is the set of student assessments, involving a total of 365 fourth-grade students from 17 reading intervention classes. Every participant was assigned an identification number and was expected to participate in three mock examinations. To protect the privacy of the participants, no other background information, such as prior academic performance or demographic data, was collected. The third part is a dictionary representing the lexical knowledge base for children up to fourth grade.

One of the major challenges in grouping students is the lack of ground-truth information, since students' reading disabilities, defined as a lack of certain fundamental cognitive skills, are unknown. To overcome the challenge of validating correct grouping of students with similar reading disabilities, we assume that a student performs similarly across multiple assessments, so one assessment should be grouped with other assessments done by the same student; the only ground-truth information available is the temporal relationship among assessments done by the same student.

The grouping problem is formulated as a supervised classification problem using K-Nearest-Neighbor (KNN) matching with very imbalanced data, since only two out of forty-four labeled sample points are considered the same class as the input sample. The set of features for matching contains attributes measuring word and sentence complexity. To match questions between two assessments, three schemes are examined: (a) the Gale-Shapley algorithm [4], in which two questions, one from each assessment, are uniquely matched; (b) a multiple matching algorithm, in which every question can have n best-matching questions from another assessment; and (c) a collective matching algorithm [7], in which all questions together define the feature vector of an assessment for matching with other assessments.

2.1 Feature Selection

An assessment is composed of a number of questions. In our study, all the questions are multiple-choice questions. A set of features is computed for each question. Such measures include average word length (number of syllables), average sentence length (number of words), word frequency, and text cohesion. Many studies have supported the predictive power of those measures [3, 5, 6]. To estimate text cohesion, we first convert a sentence to a speech graph [5], with each word represented by its part-of-speech (PoS) tag as a node, and an outgoing edge connecting to the PoS tag of the next word. The graph contains at most one node per PoS tag, so different words with the same tag share the same node. The number of loops and the number of nodes in the speech graph measure the text cohesion of a sentence. We extract the tags using the Stanford Log-linear Part-Of-Speech Tagger [1]. In addition, the average phonotactic probability, which measures the likelihood of occurrence of sounds [9], is computed for each sentence.

Each question/passage is assigned a feature vector. The dissimilarity function in the feature space is computed as the distance between two vectors. To take into consideration the variance in each feature dimension, the function is defined as the Mahalanobis distance. The population for deriving the covariance matrix consists of all the multiple-choice questions and the original articles broken up into short pieces of about 100 words.
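The Mahalanobis dissimilarity described above can be sketched with the standard library alone. This is a generic illustration under stated assumptions, not the authors' code: the helper names are invented, the covariance is the ordinary sample covariance of whatever population of feature vectors is supplied, and the linear system is solved by plain Gaussian elimination instead of an explicit matrix inverse.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^{-1} (u - v)) for feature vectors u and v."""
    diff = [a - b for a, b in zip(u, v)]
    y = solve(cov, diff)  # y = cov^{-1} (u - v)
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))
```

With a diagonal covariance the distance reduces to a Euclidean distance in which each feature is divided by its standard deviation, which is the sense in which the measure accounts for the variance in each dimension.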

2.2 Matching Schemes

When matching questions between two assessments using the Gale-Shapley (GS) algorithm, we view our problem as a stable marriage problem, in which m men and w women are each uniquely matched to one person of the opposite sex. If m ≠ w, some individuals find no match.

When matching questions between two assessments A1 and A2 using the Multiple Question Matching (MQM) algorithm, the questions in A2 are ordered by the dissimilarity measure for every question Q in A1, and the top n closest matches are retained for Q.

Using the matching scheme studied in [7], individual questions are not matched. Instead, a feature vector is generated from all questions collectively for an assessment. The final feature vector is normalized by the number of questions in the assessment. We name this algorithm Collective Matching (CM).

For both the Gale-Shapley and Multiple Question Matching algorithms, the similarity between two assessments is computed as the number of question pairs in which both questions are answered correctly or both are answered incorrectly, indicating consistent behavior. For the Collective Matching algorithm, the similarity is the inverse distance between the two feature vectors representing the two assessments. To classify an assessment, K-Nearest-Neighbor is applied.
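The MQM similarity described above can be sketched as follows. The function names and the toy distance matrix are illustrative assumptions; `distances[i][j]` stands for the dissimilarity between question i of A1 and question j of A2, and the two boolean lists record whether the student answered each question correctly.

```python
def top_matches(distance_row, n):
    """Indices of the n questions in A2 closest to one question in A1."""
    order = sorted(range(len(distance_row)), key=lambda j: distance_row[j])
    return order[:n]


def mqm_similarity(distances, correct_a1, correct_a2, n=1):
    """Count matched question pairs that were answered consistently,
    i.e. both correctly or both incorrectly."""
    consistent = 0
    for i, row in enumerate(distances):
        for j in top_matches(row, n):
            if correct_a1[i] == correct_a2[j]:
                consistent += 1
    return consistent


# Toy example: 3 questions per assessment, hypothetical dissimilarities.
toy_distances = [
    [0.2, 0.9, 0.5],
    [0.3, 0.8, 0.7],
    [0.6, 0.1, 0.4],
]
answers_a1 = [True, False, True]
answers_a2 = [True, True, False]
print(mqm_similarity(toy_distances, answers_a1, answers_a2, n=1))
```

In this toy case the first two questions of A1 both match the same question of A2, which the uniqueness constraint of Gale-Shapley would forbid.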

3 Results and Evaluation

A total of 365 students participated in the study. However, only 281 attended all 3 mock examinations. Given a student assessment, it is matched against assessments of the other two examinations done by the same class. Table 1 shows the comparison among the three matching schemes with various K values for KNN. For MQM, n = 1.

Table 1. Success rates of KNN matching with K ≤ 10 using various matching schemes

K        1     2     3     4     5     6     7     8     9     10
GS (%)   12.5  12.6  11.5  10.9  10.4  10.3  10.2  9.6   9.5   9.5
MQM (%)  17.6  17.6  16.3  15.8  14.0  13.1  12.6  11.2  11.1  10.3
CM (%)   10.5  10.5  10.3  9.9   9.6   9.9   10.3  10.6  9.9   9.7

MQM consistently performs the best, followed by GS. The improvement of both algorithms over CM shows that matching individual questions is more effective than merging all questions when comparing student answers in two assessments. When matching individual questions, MQM has the advantage of allowing two questions to match the same one, which is more realistic than GS, since questions in the same assessment are not guaranteed to be far apart in the feature space. Such questions should be matched to the same question in another assessment, instead of one being matched to a less similar question for the sake of uniqueness.

4 Discussion and Future Work

We investigated a number of matching schemes for effective grouping of students for reading intervention. The best accuracy achieved is 17.6%, obtained by taking the best match for each question without the uniqueness constraint. The improvement over [7] is 7.54%. The reported accuracy is largely affected by the problem of imbalanced data: only 5% of the samples belong to the true class. The data also suffer from a low signal-to-noise ratio for two reasons. First, students are known to make random choices in an examination when feeling lost. Second, the assumption that students maintain similar performance in multiple assessments runs counter to the purpose of intervention, since students are expected to improve their reading abilities if the intervention is effective.

For future work, we should examine more well-known feature representations in natural language processing, such as word2vec and bag-of-words, to capture similarity in contexts. The original reading comprehension passages should also be explored. It is also important to properly address the issue of imbalanced data for correct assessment of the algorithm's performance.

References

1. Stanford Log-linear Part-of-Speech Tagger
2. Danubianu, M., Socaciu, T.: Does data mining techniques optimize the personalized therapy of speech disorders? JACM 5, 15–18 (2009)
3. Fry, E.: A readability formula that saves time. J. Reading 11, 513–516 (1968)
4. Gale, D., Shapley, L.: College admission and the stability of marriage. Am. Math. Mon. 69(1), 9–14 (1962)
5. Mota, N.B., Vasconcelos, N.A.P., et al.: Speech graphs provide a quantitative measure of thought disorder in psychosis. PLoS ONE 7(4), e34928 (2012). https://doi.org/10.1371/journal.pone.0034928
6. Ryder, R.J., Slater, W.H.: The relationship between word frequency and word knowledge. J. Educ. Res. 81(5), 312–317 (1988)
7. Tsai, C.-L., Lin, Y.-G., Lin, W.-Y., Zakierski, M.: Computer-aided intervention for reading comprehension disabilities. In: Coy, A., Hayashi, Y., Chang, M. (eds.) ITS 2019. LNCS, vol. 11528, pp. 57–62. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22244-4_8
8. Vaughn, S., Levy, S., Coleman, M.: Reading instruction for students with LD and EBD: a synthesis of observation studies. J. Spec. Educ. 36, 2–13 (2002)
9. Vitevitch, M.S., Luce, P.A.: A web-based interface to calculate phonotactic probability for words and nonwords in English. Behav. Res. Methods 36(3), 481–487 (2004)

SHAPed Automated Essay Scoring: Explaining Writing Features’ Contributions to English Writing Organization

David Boulanger and Vivekanandan Kumar

Athabasca University, Edmonton, AB, Canada {dboulanger,vivek}@athabascau.ca

Abstract. This study applies the state of the art in explainable AI techniques to shed light on the automated essay scoring (AES) process. By means of linear regression and Shapley values, SHAP (Shapley Additive Explanations) approximates a complex AES predictive model implemented as a deep neural network and an ensemble regression. This study delves into the essentials of the automated assessment of ‘organization’, a key rubric in writing. Specifically, it explores whether the organization and connections between ideas and/or events are clear and logically sequenced. Built on findings from previous work, this paper, in addition to improving the generalizability and interpretability of the AES model, highlights the means to identify important ‘writing features’ (both global and local) and hint at the best ranges of feature values. By associating ‘organization’ with ‘writing features’, it provides a mechanism to hypothesize causal relationships among variables and shape machine-learned formative feedback in human-friendly terms for the consumption of teachers and students. Finally, it offers an in-depth discussion on linguistic aspects implied by the findings.

Keywords: Automated essay scoring · Explainable artificial intelligence · Linguistic analysis · Deep learning · SHAP · Rubric · Organization



© Springer Nature Switzerland AG 2020
V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 68–78, 2020. https://doi.org/10.1007/978-3-030-49663-0_10

1 Background

This study is a continuation of the work reported in [1], where the potential of deep learning and the state of the art in automated linguistic analysis is assessed and explained using one of the latest XAI (explainable artificial intelligence) algorithms, the Shapley Additive Explanations (SHAP). Explainable artificial intelligence or interpretable machine learning has been defined as “the use of machine-learning models for the extraction of relevant knowledge about domain relationships contained in data” [2]. Adhering to this, SHAP unifies into a single framework six additive feature attribution methods (LIME, DeepLIFT, and classic Shapley value estimation among others) and approximates predictive models through simpler but interpretable linear explanation models [3]. Given the extensiveness of the work to interpret the scoring process of a complex automated essay scoring (AES) model (ensemble regression and deep learning), and given that the hotspots of AES research have been discussed in our previous papers [1, 4], we ask the reader to excuse the omission of a discussion on the latter here.

The goal of SHAP is to explain the predicted score, in this case the rubric score, of an essay by computing the contributions of writing features to the prediction [5]. SHAP took inspiration from coalitional game theory, where the contribution of each player of a team is measured in units of the game outcome (e.g., number of goals) based on their individual multi-dimensional performance (e.g., scores, passes, game time, etc.). Each contribution is quantified and the resulting value is called a SHAP value. SHAP brings together LIME’s linear explanation model approach [6] and Shapley values, which together meet the desirable properties of a robust explanation model as specified in [3, 5]. In sum, SHAP presents several advantages [5]: 1) it unifies the field of interpretable machine learning, 2) it is solidly grounded in game theory, and 3) feature contributions explaining predictions are fairly distributed among feature values.

Several implementations of SHAP exist, including KernelSHAP and TreeSHAP. KernelSHAP is a model-agnostic kernel-based estimation approach for Shapley values. However, KernelSHAP has some caveats. First, it requires access to the entire dataset to compute the SHAP values of new data. Second, it is computationally slower when estimating Shapley values to assess global feature importance over large datasets. Third, KernelSHAP suffers from the same problems as permutation-based methods, such as ignoring feature dependence (interaction). Fourth, SHAP values are only estimated, not exactly computed. Although these issues are all addressed by the fast TreeSHAP implementation, designed for tree-based models, this study opted for the KernelSHAP implementation in order to explore and shed light on the black box of a deep learning AES model. The first two issues were, however, not detrimental to this study.

No matter the SHAP implementation used, SHAP values are challenging to interpret, and interpreting them is a key contribution of this paper. This paper is the first of its kind to engage in a deep interpretation of SHAP in AES, contributing to building trust in AES through transparency. Although this paper’s findings are not causal yet, the reader should find in SHAP a way to discover, at large scale, interaction, mediation, or confounding effects as well as linear and non-linear relationships among writing features, a step in the right direction toward causal AES models. Again, we ask the reader to be understanding: due to space constraints and the huge scope of the writing features feeding our AES model, it is impossible to describe all writing features and to employ meaningful but lengthy tags; instead, we had to use code names for the sake of concision. For more information on the writing features, please consult [1]. Moreover, this study’s code, dataset, and documentation are available at https://osf.io/6ncsf/. This paper first presents SHAP insights and how they can be used to understand AES. Next, the linguistic implications of the SHAP analysis are discussed.
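To make the notion of a SHAP value concrete, the sketch below computes exact Shapley values for a tiny toy model by enumerating every coalition of features. It is not KernelSHAP and not the study's code: KernelSHAP estimates these values and uses a background dataset, whereas this illustration assumes a single background point standing in for "feature absent", and the model, its coefficients and all names are invented for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, background):
    """Exact Shapley values of `predict` at `instance`.

    Features outside a coalition take their value from `background`
    (a single reference point, which is a simplifying assumption).
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else background[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(x):
    """Hypothetical scoring model with one interaction term."""
    return 4.0 + 0.5 * x[0] - 0.2 * x[1] + 0.1 * x[0] * x[2]


instance = [2.0, 1.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, background)
base_value = toy_model(background)
# Additivity: base value plus all contributions recovers the prediction.
print(phi, base_value + sum(phi), toy_model(instance))
```

The interaction term is split equally between the two features involved, and the contributions sum to the difference between the prediction and the base value; the exhaustive enumeration grows exponentially with the number of features, which is why estimation methods are used in practice.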


2 Methodology

A multi-layer perceptron neural network was trained using the same methodology reported in [1], plus the following improvements. First, the AES model’s generalizability was improved by selecting writing features through filter methods on the 894 ungraded essays of the seventh essay corpus of the Automated Student Assessment Prize (ASAP), that is, the original validation and testing sets, given that feature selection ought to be done on a dataset different from the one the model was trained and tested on (the 1567 essays of ASAP’s seventh training dataset). Second, further collinearity among variables was eliminated, among other things by filtering out one of each pair of features with a Pearson correlation coefficient greater than 0.7 (previously 0.9), which left 282 writing features out of an original pool of 1592 features. Third, word count-related variables were re-introduced because text length, despite controversy in the literature [4], may still be one of several criteria to score essays [7]. Fourth, to further optimize model training, the hyperparameter space was enlarged to include SGD-specific hyperparameters. This paper dives into the explanation of the second rubric of ASAP’s seventh dataset by training an ensemble regression model. The reader is invited to consult [1, 7, 8] to compare this study’s outcomes against previous findings on the assessment of writing organization.
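The collinearity filter mentioned above can be sketched as a greedy pass over the features. The details are assumptions made for illustration: the paper does not say which member of a correlated pair is dropped (here, the one encountered later), whether the absolute value of the coefficient is used (here, it is), and the feature names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant features


def drop_collinear(features, threshold=0.7):
    """Keep a feature only if |r| <= threshold with every feature kept so far.

    `features` maps a feature name to its list of values over the essays.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns over four essays.
example_features = {
    "word_count": [100, 150, 200, 250],
    "token_count": [110, 160, 215, 260],  # nearly a copy of word_count
    "feature_x": [3, 1, 4, 1],
}
print(drop_collinear(example_features))
```

Because the result depends on the order in which features are visited, a real pipeline would need a principled rule for choosing which feature of a correlated pair to keep.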

3 Results

This study showcases SHAP insights by analyzing a dataset of 1567 narrative essays, written by Grade-7 students in a state-wide assessment in the USA. The assessment prompted students to write a story about patience. A scoring model assessing the rubric ‘organization’ was trained over 1253 essays. A testing set of 314 essays was put aside to evaluate the performance of the trained rubric scoring model. The model was required to predict scores using integers between 0 and 6. Overall, the essay samples had an average text length of 171 words.

In order to highlight how SHAP can explain each predicted rubric score, one essay (ID 18618) was selected from the testing set, as shown in Figs. 1, 2 and 3. The essay’s rubric score predicted by the AES system was 4.58, which, when rounded to the nearest integer, produced an accurate prediction, that is, a 5. Figure 1 is a force plot that highlights the basics of SHAP-value explanations. SHAP-value explanations are provided in relation to a certain base value, 4.058, the mean of all rubric scores predicted by the model over all essay samples of the training set. Thus, the explanation model describes why it predicted 4.58 instead of the expected prediction 4.058. The writing features in red denote the reasons why the model assigned a score higher than the base value. In contrast, the features in blue describe why a score greater than 4.58 was not given. Each red or blue segment corresponds to a SHAP value, and the difference between the length of all red segments and the length of all blue segments corresponds to the difference between the predicted score and the base value.

SHAPed Automated Essay Scoring


Fig. 1. Force plot of an individual essay indicating writing features’ contributions to prediction. (Color figure online)

Figure 2 shows the force plots of all 314 essays of the testing set, rotated 90°, stacked horizontally, and ordered by similarity of explanations. The essay represented in Fig. 1 is the dashed vertical line at rank 199. The grayish area around it represents a cluster of five essays (the essay itself, two to its left and two to its right) sharing similar explanations, while the other dashed vertical lines denote a cluster of essays having the closest rubric scores to 4.58, two smaller and two greater. The force plots also show the impact of text length (word count) on the grading of low-quality and high-quality essays in terms of organization.

Fig. 2. The force plots of all 314 essays of the testing set.

The two decision plots in Fig. 3 look at these clusters in greater detail. A decision plot reads from bottom to top, each line representing an essay. The dashed line represents the essay in Fig. 1. Each line starts at the base value (not seen in Fig. 3 since only the 20 most important writing features are shown) and ends at the predicted rubric score. The vertical gray line is the base value 4.058. Writing features are listed and ordered by importance according to the aggregated SHAP values of each aforementioned cluster of five essays. Each line segment represents the contribution (SHAP value) of the corresponding writing feature to the predicted rubric score. It is advantageous to use SHAP to build explanation models because it provides a single framework to discover the writing features that are important to an individual essay (local) or a group of essays (global). While the decision plots list features of local importance, Fig. 4’s summary plot ranks writing features by order of global importance (from top to bottom). All 314 essays of the testing set are represented as dots in the scatter plot of each writing feature. The position of a dot on the horizontal axis corresponds to the importance (SHAP value) of the writing feature for a specific essay, and its color indicates the magnitude of the feature value in relation to the range of all 314 feature values. For example, a large or small number of words within an essay generally contributes to increase or decrease the rubric score by up to 1.25 and 0.75, respectively.

D. Boulanger and V. Kumar

Fig. 3. Decision plots. Left: essays with most similar explanations with feature values of the essay in Fig. 1 listed. Right: essays with most similar rubric scores.

Fig. 4. Summary plot listing 20 most globally important features and their contribution ranges. (Color figure online)


Although the summary plot in Fig. 4 is insightful to determine whether small or large feature values are desirable, the dependence plots in Fig. 5 prove essential to recommend whether a student should aim at increasing or decreasing the value of a specific writing feature. The dependence plots also reveal whether the student should directly act upon the targeted writing feature or indirectly on other features. The horizontal axis in each of the dependence plots in Fig. 5 is the scale of the writing feature and the vertical axis is the scale of the writing feature’s contributions to the predicted scores. Each dot in a dependence plot represents one of the testing set’s 314 essays, that is, the feature value and SHAP value belonging to the essay. The vertical dispersion of the dots on small intervals on the horizontal axis is indicative of interaction with other features. If the vertical dispersion is widespread (e.g., the [0.06, 0.12] horizontal-axis interval in the ‘all_logical’ dependence plot), then the contribution of the writing feature is most likely at some degree dependent on another writing feature.

Fig. 5. Dependence plots: measure of textual lexical diversity (MTLD) and logical connectives.

As disclosed by the summary plot in Fig. 4, the ‘word_count’ feature predominates the ranking of most important features. A simple linear regression was therefore trained to predict ‘organization’ rubric scores out of text length. Hence, Table 1 reports the performance of two baseline models, that is, a majority classifier (MC) and a wordcount (WC) regressor; the neural network model (NN) investigated in this paper; and the performance of human raters (H1H2).

Table 1. Comparison of this study’s NN model against baseline models and human raters.

            Resolved scores       H1                    H2                    H1H2
            MC     WC     NN      MC     WC     NN      MC     WC     NN
QWK         0      0.49   0.68    0      0.35   0.53    0      0.34   0.53    0.54
Exact %     33.1   38.9   44.9    51.6   58.6   62.7    48.1   57.0   61.5    60.8
Adj. 1 %    69.8   85.4   92.2    99.7   99.7   100     99.7   99.0   99.7    97.5


Basically, each essay is assigned a resolved score on a 0–6 scale, which is the sum of two human raters’ scores on a 0–3 scale. Given that 4 is the most frequent resolved rubric score and 2 is the most frequent score given by each human rater, the MC columns in the table report the performance of a majority classifier that would always give a resolved score of 4 to every student’s essay and a 2 per human rater. The first group of columns delineates performance in relation to resolved scores, the second and third column groups exhibit performance in relation to each human rater, and the last one reports agreement among human raters.
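The three agreement measures reported in Table 1 can be sketched in a few lines. This is a generic implementation of exact agreement, adjacent agreement and quadratic weighted kappa (QWK), not the authors' evaluation code; the toy scores are hypothetical. It also shows why a majority classifier obtains a QWK of 0: a constant prediction carries no information beyond chance agreement.

```python
def exact_agreement(a, b):
    """Percentage of items on which the two score lists agree exactly."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def adjacent_agreement(a, b, tolerance=1):
    """Percentage of items whose scores differ by at most `tolerance`."""
    return 100.0 * sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)

def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Cohen's kappa with quadratic disagreement weights on an integer scale."""
    k = max_score - min_score + 1
    n = len(a)
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n  # expected count by chance
    # Returning 0.0 when kappa is undefined is a convention chosen for this sketch.
    return 1.0 - num / den if den else 0.0

# Hypothetical resolved scores (0-6 scale) and a majority classifier that always answers 4.
human = [4, 4, 5, 3, 4, 6, 2, 4]
majority = [4] * len(human)
print(exact_agreement(human, majority))
print(adjacent_agreement(human, majority))
print(quadratic_weighted_kappa(human, majority, 0, 6))
```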

4 Discussion

4.1 Lexical Diversity

Several measures of lexical diversity are ranked among important features, both locally and globally. For instance, the D index (hdd42_fw), derived from the hypergeometric distribution (HD-D), computes the average type-token ratio (TTR) of function words over multiple samples of 42 tokens randomly drawn from a text [9, 10]. This index of lexical diversity is measured non-sequentially, which avoids the bias of local clusters of function words. However, HD-D is dependent on text length and may become problematic when text length varies, as is the case in this study. The measure of textual lexical diversity (MTLD) (mtld_ma_bi_cw) is evaluated sequentially, making it sensitive to the order of words in the text, contrary to HD-D, which always produces the same index no matter how the words of a text are shuffled. While some features measure lexical diversity by calculating the average TTR over a fixed length of text (e.g., mean segmental TTR), MTLD measures the average text length (in words) with a fixed minimum TTR value (by default 0.72), segment after segment, making it an index independent of text length. This study’s model first considers the ratio of content words versus function words and determines that higher proportions of content words are more desirable and hence contribute to increase the ‘organization’ rubric score. Second, it looks at the diversity of both content words and function words using different methods. It leverages a feature independent of text length but sensitive to the order of words to evaluate the variety of content words. Moreover, the dependence plot (left) in Fig. 5 reveals that text segments of at least 25 words having a TTR of content words of 0.72 or more contribute to improve the rubric score and that the size of the contribution increases linearly as the mean length of text segments increases.
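The segment-by-segment idea behind MTLD can be sketched as follows. This is a simplified illustration of the basic MTLD procedure with the default 0.72 threshold, and it is not the exact ‘mtld_ma_bi_cw’ feature used by the model, whose precise variant is not detailed here; the function names and the fallback for texts whose TTR never drops are assumptions.

```python
def mtld_forward(tokens, threshold=0.72):
    """One-directional MTLD: mean length of segments whose TTR stays above threshold."""
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count < threshold:
            factors += 1  # segment complete: TTR fell below the threshold
            types.clear()
            count = 0
    if count > 0:
        # Partial credit for the unfinished last segment.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    if factors == 0:
        return float(len(tokens))  # TTR never dropped; fallback is an assumption
    return len(tokens) / factors

def mtld(tokens, threshold=0.72):
    """Average of the forward and backward passes over the token list."""
    return (mtld_forward(tokens, threshold) + mtld_forward(tokens[::-1], threshold)) / 2

print(mtld("the cat sat on the mat".split()))
print(mtld(["the"] * 6))
```

A text that keeps repeating the same word closes a segment every two tokens and therefore receives a low value, whereas a more varied text sustains longer segments.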
On the other hand, diversity in the usage of function words is assessed through a feature that is indifferent to word order and is instead concerned with a more homogeneous usage.

4.2 Word Categories and Word Count

A few categories of words appear to play an important role in the assessment of writing organization. These categories, as per the General Inquirer (GI) category listings, include 2nd-person pronouns (You_GI) (9 words: thee, thou, thy, you, your, yours, yourself, yourselves), words reflecting tendency to use abstract vocabulary (Abs_2_GI)


(185 words: ability, complexity, knowledge, efficiency, leadership, etc.), evaluative words used in disambiguation (Eval_GI) (314 words: ability, able, acceptable, adept, adequate, etc.), and words referring to identifiable and standardized individual human behavior patterns (Role_GI) (569 words: acquaintance, adversary, ambassador, father, citizen, etc.). Moreover, the model uses a feature that measures the usage of positive adjectives, derived from a principal component analysis that fed upon several counts of adjectives from several word dictionaries. The summary plot (Fig. 4) highlights interesting patterns in the usage of those categories of words. A non-trivial presence of 2nd-person pronouns, roles, evaluative words, and abstract words contributes much more significantly to predict a lower rubric score than their limited usage or mere absence does to predict a higher score. It therefore means that directly addressing a person in a narrative essay is not recommended. Moreover, given the prompt to write a story on patience, repeatedly expressing words like ‘patience’ or ‘patient’ may signal a weak development of ideas. However, by examining the GI category listings further, it can be seen that the word ‘patient’ has three senses and that when used as a noun it is considered as a role (a person receiving medical treatment) and when used as an adjective it is considered as an evaluative word expressing a positive feeling. The AES system does not have yet the ability to identify the sense of a word. This may explain why it deems abstract, role, evaluative, and positive words to be undesirable. Figures 2, 3 and 4 reveal that essay length (number of words) is of both local and global importance, which considerably increases as the length of the essay moves away from the average. Note also that the AES explanation model does not leverage ‘word count’ to systematically justify all essay rubric scores. 
Figure 3 (left) highlights two essays in which word count does not contribute at all to the rubric score prediction. Given word count’s greater importance (Fig. 4), a simple linear regression model was trained to predict rubric scores only from word counts. It was found that 38.9% of essays (testing set) got an accurate prediction, 85.4% of predictions were accurate or off by 1, and 98.7% were accurate or at most off by 2 (Table 1). Table 1 shows that this study’s neural network (NN) model outperformed the word-count (WC) model by 6 percentage points in terms of accurate predictions, which in turn outperformed the majority classifier (MC) by 5.8 percentage points. To compare the models’ performance against human raters, the models’ predictions (regression) were rescaled to fit the human raters’ scale (0–3). Table 1 reveals that 60.8% of essays were given identical rubric scores by the two human raters and 97.5% of human ratings were either exact or off by one. It is noticeable that 1) in terms of accurate predictions, only the NN model surpassed human performance (62.1%), 2) the WC model’s performance was slightly lower than that of humans (57.8%), 3) the MC model’s performance fell short of human performance by only 11 percentage points, and 4) all three models outperformed human performance in terms of exact and adjacent matches. These findings reveal that AES performance should not be 1) evaluated only in terms of accuracy but also in terms of interpretability and 2) underestimated just because of the strong correlation that exists between word counts and essay scores [4].

4.3 Noun Phrase Complexity and Clause Complexity

The model considers the average number of dependents per nominal (a noun or group of words acting as a noun) (av_nominal_deps), a measure of noun phrase complexity, as important to determine the quality of a text’s organization. Moreover, Fig. 4 appears to recommend that students should write less complex noun phrases, suggesting that a richer vocabulary can reduce the complexity of certain noun phrases. The model also looks at the number of direct objects per clause (dobj_per_cl). Direct objects cannot occur without the presence of transitive verbs. Though direct objects may be compound, only one direct object per clause is possible. Thus, a low rate of direct objects per clause might mean higher usage of intransitive verbs or linking verbs. Similarly, the model pays attention to the presence of both simple and complex nominative subject complements, that is, the varying (standard deviation) number of dependents per predicate nominative (ncomp_stdev). Thus, the model appears to indirectly give importance to the variety of sentence structures: those with linking verbs and those with action verbs. Larger numbers of dependents per predicate nominative possibly imply the presence of dependent clauses, that is, of a complex sentence, while smaller numbers might indicate a simple sentence. The number of possessives per object of the preposition is another piece of information that the rubric scoring model feeds upon (poss_pobj_deps_struct). A possessive is either a noun ending with a “s” (in the singular form) or a possessive pronoun (e.g., mine, yours, his, hers, ours, theirs). In most cases, only one possessive will occur per object of preposition, implying that the model favorably views the very presence of such possessives. Possessives are often used in prepositional phrases to avoid concatenating too many prepositional phrases.

4.4 Lexical Sophistication and Cohesion

An essay with function words that are phonographic neighbors (OG_N_FW), words differing in exactly one letter and one phoneme, and having higher age of acquisition (Kuperman_AoA_FW) [1] possibly exhibits greater lexical sophistication. Given the smaller size of the set of all function words compared to the size of the set of all content words, the high frequency of one-syllable function words, and their grammatical importance, we hypothesize that these features are somewhat correlated and hence redundant to the ‘hdd42_fw’ feature, which measures the diversity of function words. Larger numbers of orthographic neighbors (Ortho_N), words that differ by only one letter, serve as evidence against giving higher rubric scores. Moreover, the more specific verbs are (walk vs. go), measured as the average number of superordinate terms a verb has, according to their first sense and first path in the WordNet semantic network, the more they contribute to increase rubric scores (hyper_verb_S1_P1). From the perspective of cohesion, the ‘order’ feature calculates the ratio of words expressing order (e.g., to begin with, next, first, etc.) over the total number of words in the text. Likewise, the ‘all_logical’ feature calculates the ratio of logical connectives (e.g., actually, admittedly, after all, etc.) over the total number of words. In the former case, higher ratios are more desirable, while in the latter case, lower ratios are better indicators of writing organization. The dependence plot (right) in Fig. 5 suggests that percentages of logical connectives should be smaller than 4%. When below this threshold, it denotes evidence for higher organization; when above the threshold, it suggests lower quality of organization. The feature’s level of importance increases/decreases linearly as the feature value increases/decreases.

4.5 Features of Local and Global Importance

Figure 3 shows how it is possible to examine the set of features that is important to the prediction(s) of the rubric score(s) of either a single essay or a small group of essays. The decision plot (left) exhibits the prediction paths of five essays with similar explanations. Observe how less intermingled the lines appear compared to the decision plot on the right. Similar explanations tend to look parallel. Note that the explanation model can provide essays of different rubric scores (4 versus 5) with similar explanations. On the other side, the decision plot (right) shows that there are various ways to get a rubric score of 5 and that a feature may not have the same importance for all essays. For instance, the usage of direct objects (dobj_per_cl) contributes to raise the rubric score of one essay more significantly, while it contributes to decrease the score of the essay in Fig. 1, and yet for another it merely ignores it. Notice how the lists of most important features differ between decision plots and the summary plot in Fig. 4.

5 Future Work and Conclusion

The linguistic interpretations made in this paper are merely the views of the AES explanation model; they are not causal representations of the real world [5]. However, SHAP imparts powerful insights to formulate causal hypotheses, which could be tested using Pearl’s calculus of causation and structural causal models [11]. This calculus of causation essentially provides the machine with a quantitative and qualitative (e.g., causal diagrams) language to encode causal knowledge and take into account the data generation process, which makes the difference between causal inference and statistical inference [2]. For example, fewer than 282 out of the original 1592 writing features are used in this paper to explain predictions. However, among the strongly correlated features that were eliminated, some might have been more intuitive and interpretable for humans than those selected by the model. Hence, it is important to understand the causal relationships among variables and the potential confounding, mediation, and interaction effects among them to shape human-friendly feedback using the more meaningful variables that will really help students produce higher-quality writing.

References

1. Kumar, V., Boulanger, D.: Automated essay scoring and the deep learning black box: how are rubric scores determined? Int. J. Artif. Intell. Educ. (submitted)
2. Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B.: Definitions, methods, and applications in interpretable machine learning. Proc. Natl. Acad. Sci. 116, 22071–22080 (2019)
3. Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, pp. 4765–4774. (2017)
4. Kumar, V., Fraser, S.N., Boulanger, D.: Discovering the predictive power of five baseline writing competences. J. Writ. Anal. 1, 176–226 (2017)
5. Molnar, C.: Interpretable machine learning. Lulu.com (2019). https://christophm.github.io/interpretable-ml-book/
6. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: explaining the predictions of any classifier. CoRR. abs/1602.0 (2016)
7. Taghipour, K.: Robust trait-specific essay scoring using neural networks and density estimators. Doctoral dissertation (2017)
8. Zupanc, K., Bosnić, Z.: Automated essay evaluation with semantic analysis. Knowl.-Based Syst. 120, 118–132 (2017)
9. McCarthy, P.M., Jarvis, S.: MTLD, vocd-D, and HD-D: a validation study of sophisticated approaches to lexical diversity assessment. Behav. Res. Methods 42, 381–392 (2010)
10. Torruella, J., Capsada, R.: Lexical statistics and typological structures: a measure of lexical richness. Procedia-Soc. Behav. Sci. 95, 447–454 (2013)
11. Pearl, J., Mackenzie, D.: The Book of Why: The New Science of Cause and Effect. Basic Books, New York (2018)

Probabilistic Approaches to Detect Blocking States in Intelligent Tutoring System

Jean-Philippe Corbeil1(B), Michel Gagnon1, and Philippe R. Richard2

1 Polytechnique Montréal, 2900 Boulevard Edouard-Montpetit, Montréal, QC H3T 1J4, Canada
{jean-philippe.corbeil,michel.gagnon}@polymtl.ca
2 Université de Montréal, 90 Avenue Vincent-D’Indy, Outremont, QC H2V 2J7, Canada
[email protected]

Abstract. A blocking state is a measurable state on an intelligent tutoring system’s user interface, which mirrors a student’s cognitive state in which she/he temporarily cannot make any progress toward finding a solution to a problem. In this paper, we present the development of four probabilistic models to detect a blocking state of students while they are solving a Canadian high school-level problem in Euclidean geometry on an ITS. Our methodology includes experimentation with a modified version of QED-Tutrix, an ITS, which we used to gather labelled datasets composed of sequences of mouse and keyboard actions. We developed four predictive models: an action-frequency model, a subsequence-detection model, a 1D convolutional neural network model and a hybrid model. The hybrid model outperforms the others with an F1 score of 80.4% on the classification of blocking states on the validation set and 77.3% on the test set.

Keywords: Blocking state · Student modeling · Probabilistic framework · Convolutional neural network · QED-Tutrix

1 Introduction

Intelligent tutoring systems (ITS) are promising tools for teachers as well as students because they can provide an adaptable framework for teaching and learning. One fundamental missing aspect of ITSs is the ability to detect moments when the learner needs help. If the ITS intervenes too quickly, it will disturb the learner, who might lose her/his train of thought. If the ITS intervenes too slowly, the learner will not get enough support from the system. One solution is to let the student ask for help on her/his own. Still, most learners do not have good help-seeking behaviors [1]. In this paper, we present our solution that detects the moments when learners are in a blocking state. We develop this solution based on probabilistic models trained on sequential data gathered from experimentation with QED-Tutrix [6,7,9].

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 79–88, 2020. https://doi.org/10.1007/978-3-030-49663-0_11

QED-Tutrix is an ITS built to help students to produce a proof for a given problem in Euclidean geometry [6,9]. It runs on top of Geogebra [5], a dynamic geometry software. The system requires the student to submit inferences (hypothesis, results or justification), using a list of possible inferences. QED-Tutrix uses these submitted inferences to keep track of the student’s progress in one of the many possible solutions. The finite state machine, managing all interactions, is tuned to give feedback on inferences and to interact every minute with the student. This interaction is very limited. This issue leads us to the following question: Can we detect a blocking state, which is when a learner needs help, by considering the actions on the user interface?

Our contributions are:

1. We defined the concept of the blocking state and its operationalization by considering actions on the user interface of the ITS.
2. We gathered a dataset of ITS logs in which we measured blocking states.
3. We predicted blocking states with probabilistic models using the sequence of actions that happened before.

In the next section, we describe the background around the concept of blocking state. Then, we explain the approach to gather our train and test datasets in two experiments and the preprocessing of the data. Section 4 presents our models and their performances on the dataset. Finally, we discuss the threats to validity.

2 Background

We found two concepts close to the blocking state: help-seeking behaviours and the wheel-spinning state. However, neither of these can answer our research question. Aleven et al. [1] conducted exhaustive research on help-seeking behaviours. They hypothesized that an ITS could help students become better learners by guiding their meta-cognitive abilities. They created the Help tutor using 57 production rules for correct and incorrect help-seeking behaviours. According to their model, they found that 73% of help-seeking behaviours are help-seeking errors. Aleven et al. [2] reviewed the literature up to 2016. They found that hints in ITS only help when the learner has an intermediate level of skill [8]. Aleven et al. also mentioned that hints are useful when students are stuck and need help to continue. Thus, there is a need for the detection of blocking states in ITSs. Its primary purpose should be to give hints at the right time to stuck students. Research on wheel-spinning [3,4] showed that it is crucial to help students at the right moment to maintain their involvement and avoid negative behaviours. To address these issues, the authors defined the concept of “wheel spinning” as a state where a student cannot achieve mastery of a skill at any further point in time. They showed that around 60% of students mastered a skill in ten practice opportunities. These are attempts in which the learner tries to solve mathematical problems requiring this particular skill. There are two distinctions between our definition of blocking state and the wheel-spinning state. First, our goal is to detect a student in need of help, within a given problem. Second, we are trying to predict moments when the learner is stuck.

3 Methodology

3.1 Experimental Setup

We gathered the training data from a group of 19 second-year students from the Université de Montréal, enrolled in the mathematics teaching program. They were around 20 years old, both sexes were well represented, and their background in mathematics was one year of study focused only on mathematics. We asked them to solve four Euclidean geometry problems in QED-Tutrix. Furthermore, we collected the test data from a group of 4 graduate students in software engineering. They were around 24 years old, and all were male. The data was fully anonymized.

The experiment for the training set took place at Polytechnique Montréal on the 2nd of October 2017. First, we gave students 20 min to get familiar with the original software version. They had to solve the classical rectangle problem. Afterwards, using a modified version of QED-Tutrix, they were given 30 min to solve each problem. We instructed them to push the help button on the interface if they felt in need of help. They were also told that the system provides no help after this action. We partitioned the group into three subgroups. Each subgroup had a different order of problems to limit fatigue effects. The gathered data is about the actions of the students on the QED-Tutrix interface.

We used the version of QED-Tutrix shown in Fig. 1. First, we added a help button on the interface. Second, we decided to turn off the feedback provided by the system every minute. This feedback would cause tutorial bias, which might affect the way that blocking states happen. We kept the positive and negative feedback from the tutor when an inference is submitted. Third, we added listeners in QED-Tutrix. The recorded mouse and keyboard actions are:

– Enter (E): the mouse enters a region of the interface.
– Click (C): the user clicks on a region of the interface.
– Change tab (T): the user selects a different central panel among GeoGebra, Sentences or Proof.
– Scroll chatbox (SC): the user scrolls to look at the history of interactions.
– Select filter (FI): the user selects a filter to choose an inference.
– Chosen sentence (CS): the user chooses an inference.

We also added specific actions that are not part of the resolution¹, but are useful to indicate the end of a sequence:

– Submit: the user completed an inference and clicked the Submit button.
– Exit: the user ended her/his resolution.
– Help flag: the user clicks the help button.

¹ Throughout this paper, we use the term “resolution” in the sense of “the action of solving a problem”.

Fig. 1. QED-Tutrix interface in which we can see the help button in red, bottom-right corner. (Color figure online)

When the learner moves the mouse or clicks on the interface, we get an E or C event, respectively. The student can change the central panel with tabs (a T event). In the Sentences panel, the learner has five filters to help her/him find an inference. These filters correspond to mathematical objects or property names that filter the set of inferences (an FI event). When the student selects an inference, the event is CS. Once she/he has filled in the inference’s blank boxes, the learner can submit it (Submit event). If the learner needs to review her/his resolution through the chatbox interactions, it is an SC event. If the learner completes her/his proof, the system records an Exit event.

3.2 Blocking State Definition

We define a blocking state as a temporary state in which the learner cannot progress in her/his proof. We assume that the ITS could help the student to get out of this state and resume her/his progress in solving the problem. For the operationalization, we consider that the learner is in a blocking state either when a Help flag action occurs or when QED-Tutrix gives negative feedback after a Submit action. We also consider a blocking state when the learner quits without having solved the problem. We chose these events because we can measure them well with our system. In each case, the student is undoubtedly in need of help. A student who clicked the help button is explicitly telling the system that, at this point, she/he needs help. When the learner submits a wrong inference, she/he is also telling the system that something went wrong. By contrast, we might have considered a long interval of time without any interaction as a blocking state. However, the learner might have been distracted. Thus, it is not a precise measurement, and distractions are not of interest, even if they increase the probability that the student is in a blocking state. We considered that H = True when one of the three targeted events occurs, and H = False for the other events that end an action sequence.

With our definition, we preprocessed the whole sequence of actions executed by the learner during one resolution of a problem as follows. A resolution is a sequence of actions that we partitioned into mutually exclusive subsequences, each one ending with an event that sets the blocking-state variable H. More formally, let us denote S_1, ..., S_i, ..., S_n the n sequences of actions of a student trying to solve one problem during one session of resolution. We define one boolean variable H_i for each sequence S_i. We hypothesize that a sequence of previous actions is representative of the cognitive state of the learner. Note that our models will not take into account the overall history of the proof. We keep this feature for further research.

3.3 Dataset

In our experiment, we collected actions from learners for each proof. Table 1 shows information about the train and test datasets. The train set contains 72 proofs from our 19 participants. We extracted a total of 1436 sequences from these 72 proofs. There are 148 sequences that we could label as blocking states. We see that the blocking state is our minority class, at a ratio of nearly 1:10. We note that the test dataset contains fewer blocking states, only around 6%. This is explained by the greater ease in using software and by the diligence of the test set population. Very few proofs were completed by the end of the thirty minutes in the training set session, and none in the test set session, which indicates that our problems are challenging.

Table 1. Train and test sets information.

Information                              Train set   Test set
Number of actions                        39,777      17,796
Number of resolutions                    72          15
Number of completed proofs               5           0
Number of sequences                      1,436       705
Number of blocking states (H = true)     148         40
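The extraction of labelled sequences from a resolution log, following the operationalization of Sect. 3.2, can be sketched as below. The log format (a list of event/flag tuples), the event names and the function name are hypothetical; the actual QED-Tutrix log structure is not described in the paper.

```python
# Event codes follow Sect. 3.1; the tuple layout of a log entry is hypothetical.
TERMINAL_EVENTS = {"Submit", "Exit", "Help"}

def split_into_sequences(log):
    """Partition one resolution log into (actions, H) pairs.

    `log` is a list of (event, info) tuples. For a "Submit" event, `info` is
    True when the tutor's feedback was positive; for an "Exit" event, `info`
    is True when the proof was completed. `info` is ignored otherwise.
    Actions logged after the last terminal event are discarded.
    """
    sequences, current = [], []
    for event, info in log:
        if event not in TERMINAL_EVENTS:
            current.append(event)
            continue
        if event == "Help":
            blocked = True  # explicit request for help
        else:
            blocked = not info  # wrong inference, or quit without solving
        sequences.append((current, blocked))
        current = []
    return sequences

example_log = [
    ("E", None), ("C", None), ("FI", None), ("CS", None), ("Submit", False),
    ("E", None), ("T", None), ("Help", None),
    ("C", None), ("CS", None), ("Submit", True),
    ("Exit", True),
]
for actions, blocked in split_into_sequences(example_log):
    print(actions, blocked)
```

Each returned pair corresponds to one sequence S_i with its label H_i; the final Exit after a completed proof yields an empty, non-blocking sequence, which a real pipeline might choose to drop.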

To conduct the training process, we separated our training set into training/validation sets (85/15), maintaining similar distributions of H states. To do this, we down-sampled sequences to obtain equal numbers of non-blocking and blocking states in the validation set. We repeated this sampling procedure ten times, evaluated the models each time, and took the average of the results. We did the final evaluation by predicting on the test set.
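One possible reading of this sampling procedure is sketched below. The paper does not give its exact details, so the way the validation size is derived from the blocking class, the seeding scheme and the function names are assumptions.

```python
import random

def balanced_validation_split(sequences, val_fraction=0.15, seed=0):
    """Split (actions, H) pairs into train and validation lists.

    The validation list receives the same number of blocking (H=True) and
    non-blocking (H=False) sequences. Sizing it from the blocking class is
    an assumption of this sketch.
    """
    rng = random.Random(seed)
    blocking = [s for s in sequences if s[1]]
    non_blocking = [s for s in sequences if not s[1]]
    rng.shuffle(blocking)
    rng.shuffle(non_blocking)
    n_val = max(1, round(val_fraction * len(blocking)))
    validation = blocking[:n_val] + non_blocking[:n_val]
    train = blocking[n_val:] + non_blocking[n_val:]
    return train, validation

def repeated_evaluation(sequences, evaluate, repeats=10):
    """Average a user-supplied `evaluate(train, validation)` score over several splits."""
    scores = [evaluate(*balanced_validation_split(sequences, seed=r))
              for r in range(repeats)]
    return sum(scores) / len(scores)

# Hypothetical data with a 1:10 class imbalance.
data = [(["E"], True)] * 20 + [(["C"], False)] * 180
train, validation = balanced_validation_split(data)
print(len(train), len(validation))
```

Averaging over ten different seeds reduces the dependence of the validation score on one particular random draw.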

4 Models

4.1 Action Frequency Model M_F

Figures 2a and 2b show the ratio of occurrences of each action at each time position. The first shows the distribution for blocking states, while the second shows it for non-blocking states. Since the frequencies of every action oscillate around an average value, we chose to represent them with a Gaussian function, regardless of the position in the sequence.

Fig. 2. Distribution of action frequencies across time: (a) blocking state sequences; (b) non-blocking state sequences.

Let $a \in A$ be an action in the set $A = \{SC, CS, FI, T, E, C\}$. Given a sequence $S_i$, we can compute the probability of a blocking state H for $S_i$ by looking at the frequency of its actions. For one specific sequence $S_i$, let $f_a(S_i)$ be the ratio of occurrences of action a in $S_i$ over the total number of actions in $S_i$. We need to estimate the distribution of $f_a$ for both possible values of H, which we approximated with a Gaussian distribution. In our model $M_F$, the observed frequency comes from one of these two Gaussian distributions. We can then treat the probability $P[H \mid S_i, a]$ like the one obtained from a null hypothesis test, which gives the same formula as for a two-tailed event. Once this quantity is computed for both values of H, we renormalize the two values. Finally, we combine them over all actions:

$$P[H \mid S_i]_{M_F} = \sum_{a \in A} \omega_a \, P[H \mid S_i, a], \qquad \text{where } \sum_{a \in A} \omega_a = 1.$$
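The steps of the action frequency model can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the two-tailed probability is taken to be the usual Gaussian two-sided p-value, and the per-action means, standard deviations and weights (`params`, `weights`) are hypothetical inputs that would have to be estimated from the training data.

```python
import math
from collections import Counter

ACTIONS = ["SC", "CS", "FI", "T", "E", "C"]


def two_tailed_p(x, mean, std):
    """Two-tailed Gaussian probability of a value at least as far from the mean as x."""
    z = abs(x - mean) / std
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


def action_frequencies(sequence):
    """Ratio of occurrences of each action over the sequence length."""
    if not sequence:
        raise ValueError("sequence must contain at least one action")
    counts = Counter(sequence)
    return {a: counts[a] / len(sequence) for a in ACTIONS}


def p_blocking_mf(sequence, params, weights):
    """params[a][h] = (mean, std) of f_a for H = h; the weights are assumed to sum to 1."""
    freqs = action_frequencies(sequence)
    prob = 0.0
    for a in ACTIONS:
        p_true = two_tailed_p(freqs[a], *params[a][True])
        p_false = two_tailed_p(freqs[a], *params[a][False])
        norm = p_true + p_false
        # Renormalize over the two values of H; fall back to 0.5 if both vanish
        p_h_given_a = p_true / norm if norm > 0 else 0.5
        prob += weights[a] * p_h_given_a
    return prob


# Hypothetical parameters: only SC behaves differently between the two states
uniform_weights = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
params = {a: {True: (0.2, 0.1), False: (0.2, 0.1)} for a in ACTIONS}
params["SC"] = {True: (0.5, 0.1), False: (0.1, 0.1)}
example = p_blocking_mf(["SC", "C", "SC", "E"], params, uniform_weights)
```

With these toy parameters, a sequence in which half of the actions are SC yields a probability above 0.5, because SC is the only action whose two Gaussians differ.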

4.2 Subsequence Detection $M_S$

For $\Lambda$, the set of all recurrent patterns of actions in the sequences, we can develop the following conditional probability:

$$P[H \mid S_i]_{M_S} = \frac{\sum_{\lambda \in \Lambda} P[S_i \mid \lambda] \, P[\lambda \mid H] \, P[H]}{\sum_{\lambda \in \Lambda,\, H} P[S_i \mid \lambda] \, P[\lambda \mid H] \, P[H]} \tag{1}$$

where we assumed conditional independence between the random variable $\lambda$ and H given $S_i$.

4.3 Convolutional Network Model $M_C$

The third model is a 1D convolutional neural network (CNN). The input of the network is our $S_i$, and the output is the probability that the learner needs help, $P[H \mid S_i]_{M_C}$. The network topology is sequential and is as follows: 1D convolutional layer (40 filters, kernel size of 6 actions, ReLU), dropout layer (40%), 1D convolutional layer (15 filters, kernel size of 3 actions, ReLU), flatten layer, dropout layer (40%), and dense layer (sigmoid activation). The topology and the hyperparameters were tuned by hand.

4.4 Hybrid Model $M_H$

We can mix the outputs of the three previous models into a hybrid model with two hyperparameters $\alpha_1$ and $\alpha_2$ constrained by $0 \le \alpha_1 + \alpha_2 \le 1$:

$$P[H \mid S_i] = \alpha_1 P[H \mid S_i]_{M_C} + \alpha_2 P[H \mid S_i]_{M_S} + (1 - \alpha_1 - \alpha_2) \, P[H \mid S_i]_{M_F} \tag{2}$$
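The hybrid combination and a search over its two hyperparameters can be sketched as below. The source does not state how $\alpha_1$ and $\alpha_2$ were explored or which decision threshold was used, so the grid resolution (`steps`), the 0.5 threshold and the function names are assumptions made for illustration only.

```python
def hybrid_probability(p_c, p_s, p_f, alpha1, alpha2):
    """Convex mix of the CNN, subsequence and frequency model outputs."""
    return alpha1 * p_c + alpha2 * p_s + (1.0 - alpha1 - alpha2) * p_f


def f1_score(y_true, y_pred):
    """F1 score of the positive (blocking) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(p_c, p_s, p_f, y_true, steps=50, threshold=0.5):
    """Return (alpha1, alpha2, best F1) over a grid with alpha1 + alpha2 <= 1."""
    best = (0.0, 0.0, -1.0)
    for i in range(steps + 1):
        for j in range(steps + 1 - i):  # keeps alpha1 + alpha2 <= 1
            a1, a2 = i / steps, j / steps
            preds = [hybrid_probability(c, s, f, a1, a2) >= threshold
                     for c, s, f in zip(p_c, p_s, p_f)]
            score = f1_score(y_true, preds)
            if score > best[2]:
                best = (a1, a2, score)
    return best


# Toy model outputs for four sequences (hypothetical values)
y_true = [True, False, True, False]
p_c = [0.9, 0.1, 0.8, 0.2]
p_s = [0.4, 0.6, 0.4, 0.6]
p_f = [0.5, 0.5, 0.5, 0.5]
best_alpha1, best_alpha2, best_f1 = grid_search(p_c, p_s, p_f, y_true)
```

Plotting the score for every grid point, rather than keeping only the best one, would give a surface over $\alpha_1$ and $\alpha_2$ of the kind that a hyperparameter study of this model needs.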

5 Results

Since we want the ITS to interact with the learner at the right moment, we considered the F1 score. Using this score means that we aim for models that are both good at correctly identifying a sequence of actions as a blocking state (precision) and at finding as many of these sequences as possible (recall). Figure 3 shows the F1 score of $M_H$ for all possible values of $\alpha_1$ and $\alpha_2$. We clearly observe a performance peak at $\alpha_1 = 0.184$ and $\alpha_2 = 0.796$, where the global maximum F1 score is 80.4%. The proportion of the action frequency model is deduced from $1 - \alpha_1 - \alpha_2$, which gives 0.02. However, we notice another local maximum of 80% close to $\alpha_1 = 0.22$ and $\alpha_2 = 0.05$. This indicates similar performances from $M_F$ and $M_S$.

Fig. 3. Performances of $M_H$ across the hyperparameters $\alpha_1$ and $\alpha_2$.

Table 2 shows the scores for all the models on the validation and test sets. On the validation set, the precision of $M_C$ is high at 99%, which is probably caused by overfitting. However, it struggles with a recall of 50%. This performance means that $M_C$ can be sure that the learner requires help when it predicts a blocking state, but it is not the best model to find new sequences in this state. On the other hand, $M_F$ and $M_S$ are useful to find new sequences in need of help, with recalls of 77.8%. Still, they are not precise. Combining them into $M_H$ results in outstanding performance that combines the best of all models. This combination leads to an F1 score of 80.4% with excellent precision on the validation set. We can also note that the precision and recall are close.

Table 2. Performance metrics (%) for all models.

              Validation set                      Test set
Metrics       $M_F$   $M_S$   $M_C$   $M_H$       $M_F$   $M_S$   $M_C$   $M_H$
Precision     57.9    56.8    99.0    83.4        48.6    56.4    92.4    79.9
Recall        77.8    77.8    50.0    77.8        62.5    82.5    65.0    75.0
F1 score      66.3    65.5    66.4    80.4        54.6    67.0    76.3    77.3

On the test set, with the best model $M_H$, we get slightly lower performance than on the validation split of the training set, with an F1 score for the blocking state of 77.3% and a precision of 79.9%. This result is still outstanding since both populations have considerable differences. This score also indicates that both groups behave very similarly in QED-Tutrix’s interface when they are in a blocking state. Nevertheless, we note a drastically lower performance from $M_F$ on the test set, which indicates underfitting. Overall, although $M_C$ offers performance close to that of $M_H$, $M_H$ is still the better option for maximizing precision and recall while being robust. To put the model in production, we would only keep $M_S$ and $M_C$.

For $M_S$ and $M_C$, we observe better performance on the test set than on the training set. We think that this better performance is due to population differences, since we would normally expect a lower performance on the test set for any model. We argue that the algorithms learned, from the training set data, a subset of subsequences that are more frequently used by the graduate software engineering students. This claim seems reinforced by the higher recall of $M_S$ and $M_C$, that is, the capacity of the model to produce fewer false negatives. To support this point, we calculated the entropy $\mathcal{H}$ of the subsequences of both the training set and the test set with the equation

$$\mathcal{H} = -\sum_{\lambda \in \Lambda} p_\lambda \ln(p_\lambda).$$

Applied to the action subsequences, it measures the randomness of a sequence. High entropy is associated with random actions in a sequence. We consider a dataset containing low-entropy sequences of actions to have more patterns and to be more structured. The maximum subsequence length taken into account was twelve actions since, above this length, we found considerably fewer subsequences. We calculated 3.99 for the train set and 2.53 for the test set. We observe a drastically lower entropy in our test set, a drop of about 1.5 on a logarithmic scale. This lower entropy indicates that our test set has more recurring patterns than our training dataset. Since a lower-entropy dataset has more recurring patterns in it, the predictive power of a model on it rises if we train this model on a higher-entropy dataset.

When looking at missed blocking states, we can identify patterns leading to misclassification. First, the hybrid model seems to associate the chosen sentence action CS more frequently with non-blocking states than with blocking states, which makes most blocking states with a CS action in the sequence harder to identify successfully. We also see the same association with the presence of FI, the selection of filters in the sentences panel. On the other hand, we notice that the SC and C actions, scrolling in the history of the chatbox and clicking, respectively, are strong indicators of a blocking state.
These observations are confirmed by looking at the differences in the distributions in Figs. 2a and 2b. Most of the identified blocking states are simple sequences with E and T actions. These sequences are mostly learners roaming across the user interface.
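The entropy comparison between the two datasets discussed in this section can be illustrated with a short sketch. It assumes that the patterns are all contiguous subsequences of up to twelve actions and that $p_\lambda$ is the relative count of each pattern; the paper does not specify how the recurrent patterns were extracted, so this is only one plausible reading, and the function name is hypothetical.

```python
import math
from collections import Counter


def subsequence_entropy(sequences, max_len=12):
    """Shannon entropy (natural log) of contiguous subsequences up to max_len actions."""
    counts = Counter()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for start in range(len(seq) - n + 1):
                counts[tuple(seq[start:start + n])] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Toy comparison: a repetitive dataset against a more varied one
structured = [["E", "T", "E", "T", "E", "T"]] * 5
varied = [["E", "T", "C", "SC", "FI", "CS"], ["CS", "C", "T", "FI", "E", "SC"]]
entropy_structured = subsequence_entropy(structured)
entropy_varied = subsequence_entropy(varied)
```

In this toy case the repetitive dataset has the lower entropy, which is the same direction as the difference reported between the test set (2.53) and the train set (3.99).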

6 Threats to Validity

We mitigated both internal and external threats in the design of the experiment. The maturation threat and the multiple-treatment threat were mitigated with the rotation of problems. We also removed tutorial biases by disabling interventions from QED-Tutrix. The testing threat is present in our experimentation since we did a pretest with the rectangle problem. This pretest was nevertheless desirable to reduce blocking states caused by the user interface, since they are not of interest in this study. We avoided experimenter effects through cautious interactions, in which we did not give direct help to students. The generalization of this study is not straightforward since the samples came from a specific population using only one ITS. From a blocking state detection perspective, we still believe that this study’s results are similar to those that we could have obtained with a larger population, because of the way that students solve problems in QED-Tutrix. We also found similarities in the patterns of actions of both populations.

7 Conclusion

This work developed probabilistic models for the detection of blocking states, in which the learner requires tutoring help. We showed that we can predict when a student needs help when using an ITS by tracking her/his actions on the user interface. The hybrid model, which combines all the other models, performed best on the test set, even though the test population had a very different background. It performs well, with an F1 score of 77.3%. The development of our models provides a promising new framework to support learners at the “right” moment on online learning platforms and ITSs.

Acknowledgements. We thank the FRQNT (Fonds de Recherche du Québec - Nature et Technologies) and CRSH (Conseil de Recherches en Sciences Humaines).

References

1. Aleven, V., McLaren, B., Roll, I., Koedinger, K.: Toward meta-cognitive tutoring: a model of help seeking with a cognitive tutor. Int. J. Artif. Intell. Educ. 16(2), 101–128 (2006)
2. Aleven, V., Roll, I., McLaren, B.M., Koedinger, K.R.: Help helps, but only so much: research on help seeking with intelligent tutoring systems. Int. J. Artif. Intell. Educ. 26(1), 205–223 (2016). https://doi.org/10.1007/s40593-015-0089-1
3. Beck, J.E., Gong, Y.: Wheel-spinning: students who fail to master a skill. In: Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (eds.) AIED 2013. LNCS (LNAI), vol. 7926, pp. 431–440. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39112-5_44
4. Gong, Y., Beck, J.E.: Towards detecting wheel-spinning: future failure in mastery learning. In: Proceedings of the Second (2015) ACM Conference on Learning@Scale, pp. 67–74. ACM (2015)
5. Hohenwarter, M., Fuchs, K.: Combination of dynamic geometry, algebra and calculus in the software system GeoGebra. In: Computer Algebra Systems and Dynamic Geometry Systems in Mathematics Teaching Conference (2004)
6. Leduc, N.: QED-Tutrix: Système tutoriel intelligent pour l’accompagnement des élèves en situation de résolution de problèmes de démonstration en géométrie plane. Ph.D. thesis, Ecole Polytechnique, Montreal, Canada (2016)
7. Richard, P.R., Fortuny, J.M., Gagnon, M., Leduc, N., Puertas, E., Tessier-Baillargeon, M.: Didactic and theoretical-based perspectives in the experimental development of an intelligent tutorial system for the learning of geometry. ZDM Math. Educ. 43(3), 425–439 (2011). https://doi.org/10.1007/s11858-011-0320-y
8. Roll, I., Baker, R.S.D., Aleven, V., Koedinger, K.R.: On the benefits of seeking (and avoiding) help in online problem-solving environments. J. Learn. Sci. 23(4), 537–560 (2014)
9. Tessier-Baillargeon, M.: GeoGebraTUTOR: développement d’un système tutoriel autonome pour l’accompagnement d’élèves en situation de résolution de problèmes de démonstration en géométrie plane et genèse d’un espace de travail géométrique idoine. Ph.D. thesis, Université de Montréal, Montreal, Canada (2016)

Avoiding Bias in Students’ Intrinsic Motivation Detection

Pedro Bispo Santos1(B), Caroline Verena Bhowmik2, and Iryna Gurevych1

1 Ubiquitous Knowledge Processing (UKP) Lab, Darmstadt, Germany
[email protected]
http://www.ukp.tu-darmstadt.de
2 Department of Psychology, University of Koblenz-Landau, Koblenz, Germany
[email protected]

Abstract. Intrinsic motivation is the psychological construct that defines our reasons and interests to perform a set of actions. It has been shown to be associated with positive outcomes across domains, especially in the academic context. Therefore, understanding and identifying people’s levels of intrinsic motivation can be crucial for professionals in many domains, e.g. teachers aiming to offer better support to students’ learning processes and enhance their academic outcomes. In a first attempt to tackle this issue, we propose an end-to-end approach for the recognition of intrinsic motivation, using only facial expressions as input. Our results show that visual cues from students’ facial expressions are an important source of information to detect their levels of intrinsic motivation (AUC = 0.570, F1 = 0.556). We also show how to avoid potential bias that might be present in datasets. When dividing the training samples per gender, we achieved a substantial improvement for both genders (AUC = 0.739 and F1 = 0.852 for male students, AUC = 0.721 and F1 = 0.723 for female students).

Keywords: Affective computing · Fairness in AI · Behavioral analytics · Facial expressions · Intrinsic motivation · Nonverbal signals · Educational psychology

1 Introduction

Motivation is a theoretical construct that describes the willingness of individuals to perform certain activities or to behave in a specific way. It explains the initiation, effort and persistence of behaviours, including their direction and intensity [10]. As a driving force, motivation affects almost every aspect of a person’s life, be it personally, professionally, or academically. Existing motivation theories focus on different aspects of the construct. A well-known and widely applied theory is the Self-Determination Theory [2], which explains that an intrinsically motivated person pursues her activities out of interest and willingness. Whereas intrinsic motivation describes an individual’s motivation to perform a task for the pleasure of it, extrinsic motivation merely stems from the interest in achieving a goal or receiving a reward through performing a task. Intrinsic motivation is often compared to, or used interchangeably with, the concept of interest [14]. Student motivation and interest play an essential role in academic performance and learning outcomes [12]. The possibility of automatically identifying students’ levels of intrinsic motivation could bring numerous advantages, mainly through Intelligent Tutoring Systems (ITSes) [15].

The current work presents evidence about the benefits of using visual input signalled by students’ facial expressions to identify their levels of intrinsic motivation. Students were recorded while working in dyads on physics experiments. Having only visual information as input, we subsequently trained and evaluated different models for classifying intrinsically motivated students. Our results show that visual cues from students’ facial expressions are a vital source of information to detect their self-reported levels of intrinsic motivation.

Fairness in artificial intelligence applications is an essential topic, since biases present in datasets can harm the machine learning models trained on them [13, 18]. In this sense, we also show that training models per gender not only avoids the potential bias present in the dataset but can also improve the results. We are the first to (1) present an approach to detect intrinsic motivation based solely on visual input, (2) apply this approach in the educational domain in classroom situations, and (3) increase the performance of our models by training separate classifiers per gender, avoiding sub-population bias.

P. B. Santos and C. V. Bhowmik: Authors share equal contribution. The FAZIT-STIFTUNG supported this work. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU and the TitanX Pascal GPU used for this research.

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 89–94, 2020. https://doi.org/10.1007/978-3-030-49663-0_12

2 Related Work

De Vicente and Pain [15] have already discussed the importance of including motivation recognition in ITSes. ITSes should monitor the learner’s emotional and affective state in order to offer a more suitable tutoring strategy that can engage the learner when she or he feels unmotivated [1]. Several approaches have been proposed in the educational domain to detect students’ affective states. D’Mello and Graesser [5] used posture information captured by sensors in students’ chairs, together with dialog features, to predict four different states: boredom, confusion, engagement, and frustration. Other approaches were also investigated, e.g. using only conversational features [3] or only posture features [4] for training the models. All these approaches evaluated the experience of learners when using a non-affective and an affective ITS. Findings indicate significant improvements in students’ learning experience when using affective state detection [8].

De Vicente and Pain [16] tried to predict students’ motivation by using a rule-based system implemented in an ITS. However, their approach was evaluated based on labels assigned by annotators and not on self-reported labels. Furthermore, they evaluated their system on a very small sample of only ten students. Whitehill et al. [17] and Kaur et al. [9] tried to automatically recognize the engagement of students as perceived by observers, while Monkaresi et al. [11] worked on the detection of self-reported levels of engagement.

3 Dataset

We recorded 45 German grammar school students in 10th grade who visited the university’s physics library to work on experiments in a 90-min session. All participating students were aware of the video recordings, and parents gave their consent before the students’ visit to the university. Afterwards, we manually preprocessed the recordings, resulting in 45 videos of 45 s in length, each showing only one student. After the students completed the experiments, we asked them to fill out a four-point Likert scale questionnaire measuring their intrinsic motivation. We extracted two items from an intrinsic motivation in physics questionnaire [7] and asked students to assign scores to two sentences addressing their interest and joy in attending their physics lessons. The items were: (1) I attend a physics class with joy, and (2) I have fun during a physics class. The answer categories were: 1 - never; 2 - sometimes; 3 - often; 4 - always. The reliability of the two items across all 45 participating students was very high (Cronbach’s α = 0.88). The student sample was made up of 25 male and 20 female students (mean age = 15.80; SD = 0.68; min = 14; max = 17). Students’ intrinsic motivation was on average 2.19 (σ = 0.93), with 1 being the lowest and 4 being the highest level of motivation. Students’ intrinsic motivation scores were obtained by calculating the average of the scores for the two questions.
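The score and the reliability figure described above can be computed as in the following sketch. It uses the standard formula for Cronbach's α; the function names and any responses passed to them are hypothetical and are not taken from the study's data.

```python
from statistics import variance


def motivation_scores(item1, item2):
    """Per-student score: the average of the two Likert items (1 to 4)."""
    return [(a + b) / 2 for a, b in zip(item1, item2)]


def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores for all respondents.

    Assumes the total scores are not all identical, otherwise the variance
    in the denominator is zero.
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

For example, `cronbach_alpha([item1, item2])` with two lists of 45 responses each would give the reliability of the two-item scale, and `motivation_scores(item1, item2)` the per-student scores.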

4 Analysis and Results

Our proposed end-to-end approach receives the raw input without requiring any exhaustive feature engineering. The preprocessing pipeline generated 31,475 data points, which we used as our final dataset, with 59% of the data points corresponding to highly motivated students and 41% corresponding to lowly motivated students. For detecting students’ intrinsic motivation, we employed four 2-dimensional convolutional layers, where each convolutional layer has a 5 × 5 kernel and a stride of 2. The first convolutional layer has 64 filters, and the number of filters doubles at each layer until the last convolutional layer (i.e. 64, 128, 256, and 512 filters). All convolutional layers have batch normalization and use the rectified linear unit as activation function, while the last fully connected layer uses a sigmoid activation function. Since we aimed to detect whether students’ motivation was either high or low, we defined a threshold of 0.5. If the score was equal to or above this threshold, we set the data points from the student to high motivation, and if the score was below it, we set the data points to low motivation.

We used a leave-one-student-out cross-validation setup. In this setup, we used all data points concerning 44 students for training and the data points of the remaining student for testing. The experiments were run 50 times with different random seeds to investigate how stable our approach is with respect to random initialization. We used the majority classifier and a random classifier as baselines. Table 1 shows that our approach can detect the intrinsically motivated students. Our method achieved an average weighted F1 of 0.556 and an average AUC of 0.570.

Table 1. Results concerning weighted F1 and AUC.

System              Weighted F1       AUC
Majority baseline   0.438             0.500
Random baseline     0.516 (±0.002)    0.500 (±0.003)
CNN                 0.556 (±0.015)    0.570 (±0.029)

4.1 Dataset Bias Analysis

Recent work has pointed out issues concerning bias and diversity in machine learning models [13, 18]. Since these issues could be present in our data, we decided to divide the samples from our dataset based on a set of features. The chosen features (gender, skin colour, hair colour, and eyeglasses) were those that might mislead the classifier into relying on these particular subgroups instead of the information present in students’ facial expressions. The male gender had a significant Pearson correlation of 0.513 (p < 0.05), so the model may have discriminated gender instead of motivational levels, since this is an end-to-end approach. To investigate this issue, we trained different models after dividing the dataset according to students’ gender. The other features did not have a significant correlation, so they were not analyzed further.

We trained the models for each gender using a leave-one-student-out cross-validation setup, and Table 2 shows the results. With this approach, we achieve considerably better results, obtaining an average weighted F1 of 0.852 and an average AUC of 0.739 for the male students, and an average weighted F1 of 0.723 and an average AUC of 0.721 for the female students. In this further analysis, we used 50 different random seeds as well. For the male students, 86.1% of the data points correspond to motivated students and 13.9% to unmotivated students. For the female students, 62.8% of the data points correspond to motivated students and 37.2% to unmotivated students. Since our model is an end-to-end method, it probably started to focus on gender identification instead of learning subtle characteristics in the facial expressions of students (e.g. action units, gaze, and so on). When the models were trained separately per gender, they seemingly started to discriminate other features instead of gender, thus improving the results.
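The class proportions quoted above make it possible to check the majority baselines by hand. The sketch below assumes that "weighted F1" is the support-weighted average of the per-class F1 scores, which the text does not define explicitly; the function name is hypothetical.

```python
def majority_weighted_f1(majority_share):
    """Weighted F1 of a classifier that always predicts the majority class."""
    precision = majority_share  # every data point is predicted as the majority class
    recall = 1.0                # so every majority data point is recovered
    f1_majority = 2 * precision * recall / (precision + recall)
    # The minority class is never predicted: its F1 is 0 and adds nothing
    return majority_share * f1_majority


for group, share in [("all students", 0.59), ("male", 0.861), ("female", 0.628)]:
    print(group, round(majority_weighted_f1(share), 3))
```

Under this assumption the values come out close to the reported majority baselines (0.438 overall, 0.796 for male and 0.485 for female students); a difference in the last digit can arise because the proportions used here are already rounded.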

Table 2. Weighted F1 and AUC when training separate models per gender. M stands for male and F for female. Majority and Random are the majority and random baselines, respectively.

System     F1 - M            AUC - M           F1 - F            AUC - F
Majority   0.796             0.500             0.485             0.500
Random     0.760 (±0.003)    0.500 (±0.005)    0.534 (±0.003)    0.501 (±0.003)
CNN        0.852 (±0.022)    0.739 (±0.050)    0.723 (±0.034)    0.721 (±0.033)

5 Conclusion

The present work shows that end-to-end models trained on students’ facial expressions can detect their intrinsic motivation in classroom situations. We also highlight the importance of detecting bias in datasets, reinforcing recent findings in the literature about the particular care that some demographics require [6]. The drawback of using an end-to-end approach based on convolutional architectures is that a bias present in the dataset might mislead the models when predicting the label under analysis. Fairness [18] and the lack of diversity in datasets [13] for human-centric tasks have recently attracted attention in the machine learning community.

Students’ intrinsic motivation is a characteristic that is highly relevant for student learning and academic outcomes. Because of that, a growing amount of research has been addressing teachers’ judgments of it in the regular classroom. The insights gained from the end-to-end approach applied in the present study will serve as a solid basis to further investigate and understand teacher judgment processes and the role of students’ facial expressions in the communication of this construct. Hence, the knowledge gained from the present and future research can assist in the development of evidence-based methods and tools for teacher education and training.

References

1. Azevedo, R., Millar, G., Taub, M., Mudrick, N., Bradbury, A., Price, M.: Using data visualizations to foster emotion regulation during self-regulated learning with advanced learning technologies: a conceptual framework. In: International Learning Analytics and Knowledge Conference, Vancouver, BC, Canada, pp. 444–448 (2017)
2. Deci, E.: Intrinsic motivation, extrinsic reinforcement, and inequity. J. Pers. Soc. Psychol. 22(1), 113–120 (1972)
3. D’Mello, S., Craig, S., Witherspoon, A., McDaniel, B., Graesser, A.: Automatic detection of learner’s affect from conversational cues. User Model. User-Adap. Interact. 18(1–2), 45–80 (2008). https://doi.org/10.1007/s11257-007-9037-6
4. D’Mello, S., Graesser, A.: Automatic detection of learner’s affect from gross body language. Appl. Artif. Intell. 23(2), 123–150 (2009)
5. D’Mello, S., Graesser, A.: Mind and body: dialogue and posture for affect detection in learning environments. In: International Conference on Artificial Intelligence in Education, Los Angeles, CA, USA, pp. 161–168 (2007)
6. Dwork, C., Immorlica, N., Kalai, A.T., Leiserson, M.: Decoupled classifiers for group-fair and efficient machine learning. In: Proceedings of the 1st Conference on Fairness, Accountability and Transparency, New York, NY, USA, vol. 81, pp. 119–133 (2018)
7. Frey, A., et al. (eds.): PISA 2006 Handbook of Scales. Documentation of Assessment Instruments (PISA 2006 Skalenhandbuch. Dokumentation der Erhebungsinstrumente). Waxmann (2009)
8. Johnson, W.L., Lester, J.C.: Face-to-face interaction with pedagogical agents, twenty years later. Int. J. Artif. Intell. Educ. 26(1), 25–36 (2015). https://doi.org/10.1007/s40593-015-0065-9
9. Kaur, A., Mustafa, A., Mehta, L., Dhall, A.: Prediction and localization of student engagement in the wild. CoRR abs/1804.00858 (2018)
10. Maehr, M.L., Meyer, H.A.: Understanding motivation and schooling: where we’ve been, where we are, and where we need to go. Educ. Psychol. Rev. 9(4), 371–409 (1997). https://doi.org/10.1023/A:1024750807365
11. Monkaresi, H., Bosch, N., Calvo, R., D’Mello, S.: Automated detection of engagement using video-based estimation of facial expressions and heart rate. IEEE Trans. Affect. Comput. 8(1), 15–28 (2017)
12. Rotgans, J., Schmidt, H.: Situational interest and academic achievement in the active-learning classroom. Learn. Instr. 21(1), 58–67 (2011)
13. Ryu, H., Mitchell, M., Adam, H.: Improving smiling detection with race and gender diversity. CoRR abs/1712.00193 (2017). https://arxiv.org/abs/1712.00193
14. Schiefele, U.: Interest, learning, and motivation. Educ. Psychol. 26(3–4), 299–323 (1991)
15. de Vicente, A., Pain, H.: Motivation diagnosis in intelligent tutoring systems. In: Goettl, B.P., Halff, H.M., Redfield, C.L., Shute, V.J. (eds.) ITS 1998. LNCS, vol. 1452, pp. 86–95. Springer, Heidelberg (1998). https://doi.org/10.1007/3-540-68716-5_14
16. de Vicente, A., Pain, H.: Informing the detection of the students’ motivational state: an empirical study. In: Cerri, S.A., Gouardères, G., Paraguaçu, F. (eds.) ITS 2002. LNCS, vol. 2363, pp. 933–943. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47987-2_93
17. Whitehill, J., Serpell, Z., Lin, Y.C., Foster, A., Movellan, J.R.: The faces of engagement: automatic recognition of student engagement from facial expressions. IEEE Trans. Affect. Comput. 5(1), 86–98 (2014)
18. Zhao, J., Wang, T., Yatskar, M., Ordonez, V., Chang, K.W.: Men also like shopping: reducing gender bias amplification using corpus-level constraints. In: Empirical Methods in Natural Language Processing, pp. 2979–2989. Copenhagen, Denmark (2017)

Innovative Robot for Educational Robotics and STEM

Avraam Chatzopoulos1(&), Michail Papoutsidakis1, Michail Kalogiannakis2, and Sarantos Psycharis3

1 University of West Attica, Thivon 250, 12241 Aigaleo, Attiki, Greece
{xatzopoulos,mipapou}@uniwa.gr
2 University of Crete, Gallos University Campus, 74100 Rethymnon, Crete, Greece
[email protected]
3 ASPETE, 14121 Heraklion, Athens, Attiki, Greece
[email protected]

Abstract. This paper presents the design of a low-cost, open-source robotic platform for use in Educational Robotics and STEM as a holistic approach to the curriculum. In alignment with the research presented in [1], the robotic platform’s innovation rests on two axes: (a) its specifications came from the 1st cycle of participatory action research; (b) it is equipped with a visual programming language integrated into the robot’s “brain” itself, so that it can be programmed from any device (smartphone, tablet, PC) with Wi-Fi connectivity, without the need for any software or app to be downloaded and installed on the device. The spark for this research arose from the evaluation of data from an educational robotics survey conducted at the municipality of Agia Varvara in Athens, Greece, which showed a strong student interest in educational robotics, although few students got involved because of the high cost of robotic platforms. The motivation of this research was therefore to design and develop a robotic platform suitable for the whole educational community, with specifications based on its members’ needs and extracted with quantitative and qualitative data collection and analysis tools.

Keywords: Robot · STEM · Educational Robotics · VPL · Action research

1 Introduction

The term STEM was first introduced by the NSF (National Science Foundation) in the 1990s as SMET [2]. It is used to refer to teaching and learning in the fields of Science, Technology, Engineering, and Mathematics (STEM derivatives), or as a generic label for any action, policy, program or practice that involves one or more of its disciplines [3]. In the literature [1, 4–6] there is a wide variety of definitions of the term STEM education. There are two different approaches to integrating STEM into education [7]:

• Content integration, which focuses on merging content fields into a single teaching activity to highlight “big ideas” from multiple content areas.
• Contextual integration, which focuses on the content of a single scientific field, while frameworks from other disciplines are used to make the subject more relevant.

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 95–104, 2020. https://doi.org/10.1007/978-3-030-49663-0_13


One such STEM integration is Educational Robotics (ER), a broad term that refers to a collection of activities, educational programs, technology or Robotic Platforms (RPs), educational resources, and pedagogical theories of learning within and outside schools [1, 8]. Robots, and specifically ER, have been gaining popularity in recent years, and in Greece the main activity currently seen in STEM education in schools concerns ER applications [1]. There has been great interest among researchers [9] and educators [10], since ER [11] is a powerful learning and support tool for the development of cognitive and social skills. ER is introduced in many learning environments as an innovative teaching and learning tool [12] that supports students in: (i) developing high-level skills, (ii) creating multiple representations of their understanding of the object, (iii) communicating and collaborating constructively with each other, (iv) developing and improving their learning by solving complex authentic problems, (v) implementing abstract design ideas so that they can reflect on and immediately see the results of their effort, and (vi) learning through research and experimentation, contributing to the development of knowledge in the STEM areas. In addition, STEM and ER activities promote problem-based learning and critical thinking, as they focus on the research and analysis of a complex real-world problem. The play aspect involved is also important, as it makes these activities particularly attractive to students, especially in primary education [13]. Furthermore, gender stereotypes are often the basis of gender roles, the behaviors that society teaches as ‘correct’ for boys and girls. Several studies have suggested that ER can help to eliminate gender stereotypes about the role of girls in technology [14–16].

2 Problem Statement

The purpose of the present research is to design and develop a low-cost, open-source RP for primary school students, to be used as an educational tool for ER activities and STEM education. The RP implementation takes into account the students’ needs extracted from the 1st cycle of participatory action research. The research objectives are: (i) the design and specifications of the 1st version of the RP are based on observations and survey results, in the framework of action research beginning with an ER event (described briefly below), and (ii) the RP’s development has to be low-cost, based on open-source software and hardware, and the RP must be programmable using a visual programming language (e.g. block-based programming). In November 2019, in the context of an ER event held at the municipality of Agia Varvara (Athens, Greece), a survey was conducted to record the views of students and parents on ER and STEM education [17]. It included questions about students’ experience with educational robots and their willingness to engage in robot development. The results were particularly interesting. Almost half of the participants (49%) did not know what STEM education is and half (50%) were not sure what ER is, but the vast majority of them (86%) knew of RPs, particularly the Lego RP (68%) [18]. While the vast majority (83%) had not participated in an ER seminar in the past, 88% would like to get involved and develop their own robot, and the majority (55%) would have unlimited time to complete its construction. Moreover, particularly encouraging is the fact that the majority (67%) of their parents want to get involved and support their children in


robots’ development. This was the research’s motivation to go on designing and developing an RP suitable for the whole educational community (students, teachers and parents). The RP’s specifications were extracted by combining quantitative and qualitative data collection and analysis tools. The most important findings are: (i) the majority of the participants (54%) prefer to program the RP using any device (smartphone, tablet, PC), (ii) they want (92%) the RP’s software and hardware to be open-source, (iii) half of them (50%) would like to build their RP using a 3D printer, and (iv) the RP should be compatible with old technology devices (40%). As to the RP’s total cost, two trends prevail: (i) 36% of the participants want the RP’s cost to range between 31€ and 50€, and (ii) 32% of them would prefer a cost between 51€ and 100€. This last specification is also the research’s milestone in the RP’s design, since the goal was to keep its cost close to 50€, as opposed to the high cost (180€) of the corresponding Lego WeDo II. To recap, the above remarks led to the initial specifications of the RP: (i) it has to be low-cost and open-source, (ii) it has to be programmable from any device (and also be compatible with old technology), and (iii) it can be built using a 3D printer.

3 Robotic Platform’s Architecture

The RP was designed and developed for ease of construction and use. Several considerations had to be taken into account to fulfill the educational community’s needs:
• Keep the hardware cost as low as possible (close to 50€).
• Keep the hardware free from “exotic”, difficult-to-find electronic parts.
• Use open-source hardware and software as much as possible.
• It can easily be assembled by students, teachers, and parents.
• It can be programmed from almost any device: smartphone, tablet, PC.
• No software needs to be downloaded and installed. A Visual Programming Language (VPL) will be embedded into the RP.

The RP’s construction consists of a mechanical chassis, 2 servo motors converted to DC, sensors, actuators, 2 microcontrollers (µC), electronic parts for interfacing, and various mechanical parts such as wheels, M4 screws, nuts, and washers. A complete part list is shown below in Table 1. All the electronic schematics, mechanical designs, and software were designed and developed by the researchers and will be freely available to the educational community under the Creative Commons Attribution (CC-BY) license [19]. This implementation is based on perforated aluminum plates (Fig. 1), offering a modular construction with the possibility to extend the robot frame in all three dimensions. This construction has been introduced into educational activities and has been positively evaluated [20]; however, the final version will have a frame of plastic parts that can be easily manufactured using a 3D printer. The Arduino Uno was chosen as the RP’s main µC because of its low cost and high availability (there are also many cheap Uno clones on the market) [21]. Although the Uno is programmed in C/C++ through the Arduino IDE [11], this implementation uses a different, innovative approach; it uses the ESP32 µC [22] as an intermediate web server to integrate Blockly

Table 1. RP’s components list.

Qty | Part | Qty | Part
1 | Base (perforated aluminum plate) | 2 | Side plate (perforated aluminum plate)
2 | Motor support plate (perforated aluminum plate) | 1 | Third wheel support plate (perforated aluminum plate)
1 | Front plate (perforated aluminum plate) | 2 | Servo motors (converted to DC)
2 | Wheel with tire (3D printed) | 1 | Ball caster wheel
1 | Basic electronics board | 1 | ESP32 microcontroller (webserver)
1 | Arduino Uno microcontroller (RP’s main “brain”) | 1 | Logic level converter (to interface ESP’s TX, RX signals with UNO’s)
1 | Sonar HC-SR04 (for distance meter) | 1 | LDR (Light Dependent Resistor)
1 | L293D motor driver IC | 2 | Push buttons
1 | Buzzer | 2 | LED (Red color)
7 | Resistors (1 × 100 Ω, 2 × 220 Ω, 2 × 1K, 2 × 10K) | 80 | Dupont Wires (male to male, male to female, various cm)
1 | Power supply (220VAC 50 Hz, Output: 12 V @ 2 A) | 2 | USB cable (1 type A to B, 1 type B to micro USB)
80 | Fasteners and hardware (M4 screws, nuts, washers, board spacers) | 1 | Breadboard (to connect experimental electronic circuits)

[23], an open-source block-based VPL aimed at young students. The innovation is that the Uno can be programmed from any device (smartphone, tablet, PC) equipped with Wi-Fi connectivity and a web browser, rather than through specialized software or an app that users have to install on their devices.

Fig. 1. Bottom view of RP’s mechanical parts and RP’s photo.

Figs. 2 and 3 present the RP’s electronic schematic and electronic block diagram, respectively. The Arduino Uno is used as the basic programming controller (µC) for the RP. It is connected to the motors through the L293D motor driver, which supplies the needed current (350–500 mA per motor) [24].

Fig. 2. RP’s electronic schematic

Fig. 3. Robot’s hardware block diagram (blocks: Access Point, ESP32 Wi-Fi Module, Level Shifter, Arduino Uno, Motor Driver, Motor Left, Motor Right, Sonar, LDR, Button, LEDs, Buzzer; the ESP32 and the Arduino Uno are linked through their TX/RX lines).

It is also connected to the sensors (sonar, LDR and buttons) and the actuators (LEDs, buzzer) through electronic-circuit interfaces [25]. The ESP µC is primarily used as a web server. It is connected to the Uno through the TX and RX signals to upload syntactically correct C/C++ code. A logic level converter circuit is necessary as an intermediate between the two µCs because they operate at different voltages. The ESP is responsible for (i) the client-server connection, (ii) listening to and answering the client’s requests, (iii) sending Blockly’s HTML, CSS and JS code to the client, (iv) converting blocks to C/C++ code, and (v) uploading code to the Uno. The Uno is responsible for (i) controlling the motors, (ii) reading the digital (sonar, button) and analog (LDR) sensors, and (iii) switching the actuators (LEDs, buzzer) on and off. The Arduino Uno and ESP32 combination was chosen mainly because of its low cost. However, there are many other, more expensive open-source µCs that may be used instead, such as the Arduino Uno Wi-Fi, Arduino MKR Wi-Fi, Arduino YUN, etc.

4 Software Implementation

The RP’s software implementation includes: (i) the Arduino’s functions for reading and storing sensor data and for driving the RP’s actuators (motors, LEDs, buzzer), (ii) the ESP’s webserver software, which serves client (device) requests, and (iii) the webpage software and Blockly library, implemented in HTML, CSS and JavaScript, which are responsible for the webpage’s UI and the visual programming language (VPL). Blockly is integrated into the ESP µC so that the RP can be programmed without installing any software on the client. Blockly allows users to create programs through graphical manipulation [26], by dragging blocks around a screen and using flow diagrams, state diagrams, other component wiring, and icons or non-text representations. Blockly is an open-source developer library that provides a block editor UI and a framework for generating code in text-based languages such as JavaScript, Lua, PHP, Dart, and Python; custom generators for other text languages may also be created [26]. Blockly code is represented by blocks, which may be dragged around the screen and have connection points where they can be attached to other blocks and chained together (Fig. 4). For generating Arduino C/C++ code there is a dedicated Blockly version called Blockly@rduino [27].

Fig. 4. Blockly’s blocks and generating PHP code


Blockly’s downside is that it does not provide a full vocabulary of blocks or a runtime environment, so it needs to be integrated with some form of output, as was done in this research.

5 Robotic Platform’s Kinematics

This RP employs the differential drive of a two-wheel vehicle to achieve mobility in two-dimensional space (Fig. 5). In this way, the RP can move in all directions (forward, backward, left and right) by directly controlling the speed and direction of rotation of the DC motors [11].

RP’s Movement | MotorLeft+ | MotorLeft- | MotorRight+ | MotorRight-
Forward | HIGH | LOW | LOW | HIGH
Backward | LOW | HIGH | HIGH | LOW
Left | LOW | LOW | LOW | HIGH
Right | HIGH | LOW | LOW | LOW
Stop | LOW | LOW | LOW | LOW

Fig. 5. RP’s dynamics-coordinates and RP’s movements concerning its motors’ rotation
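The drive pattern listed with Fig. 5 can also be expressed as a small lookup function. The following is only a sketch in plain C++ (it does not touch any hardware); the names Movement, PinLevels and levelsFor are hypothetical and do not come from the RP’s firmware.

```cpp
// Hypothetical names for illustration only; not taken from the RP's firmware.
enum class Movement { Forward, Backward, Left, Right, Stop };

// Logic levels for the four motor-driver inputs (true = HIGH, false = LOW).
struct PinLevels {
    bool leftPlus;
    bool leftMinus;
    bool rightPlus;
    bool rightMinus;
};

// Returns the drive pattern for a movement, as read from the rows of Fig. 5.
inline PinLevels levelsFor(Movement m) {
    switch (m) {
        case Movement::Forward:  return {true,  false, false, true };
        case Movement::Backward: return {false, true,  true,  false};
        case Movement::Left:     return {false, false, false, true };  // left motor stopped
        case Movement::Right:    return {true,  false, false, false};  // right motor stopped
        case Movement::Stop:     break;
    }
    return {false, false, false, false};  // both motors stopped
}
```

On the real robot each boolean would be written to the corresponding driver input; keeping the pattern in one function makes it easy to compare against the figure.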

RP’s posture as seen in the above figure is described by the following mathematics [11]:

\[
\begin{bmatrix} \dot{X}_p \\ \dot{Y}_p \\ \dot{\varphi} \end{bmatrix}
=
\begin{bmatrix} \cos\varphi & 0 \\ \sin\varphi & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} V \\ \omega \end{bmatrix}
= F(\varphi)\,u \tag{1}
\]

Where: V is the RP’s linear velocity, ω is the RP’s rotational velocity, φ is the RP’s orientation, X_p and Y_p are the coordinates of the RP’s center of mass, and the vector u is the control command depending on the right and left wheels’ speeds. RP’s linear (V) and rotational (ω) velocities can be obtained from the right and left wheels’ velocities, according to the next functions:

\[
\omega = \frac{d\varphi}{dt} = \frac{V_R - V_L}{b} \tag{2}
\]

\[
V = \frac{V_R + V_L}{2} \tag{3}
\]

Where: VR and VL are the velocities of the right and left wheels respectively, and b is the distance between the centers of the right and left wheels. These equations are implemented in the RP’s movement control functions; a sample code of the RP’s variable-speed forward movement with respect to its motors’ rotation (Fig. 5) is presented:

```cpp
// Motor driver input pins (all four are PWM-capable pins on the Arduino Uno)
const int MotorLeftP = 10;
const int MotorLeftM = 11;
const int MotorRightP = 9;
const int MotorRightM = 6;

void setup() {
  pinMode(MotorLeftP, OUTPUT);
  pinMode(MotorLeftM, OUTPUT);
  pinMode(MotorRightP, OUTPUT);
  pinMode(MotorRightM, OUTPUT);
}

// RPspeed range: 0-255 (RP's speed: 0-100%)
void forwardSpeed(int RPspeed) {
  digitalWrite(MotorLeftM, LOW);      // left motor: "+" driven by PWM, "-" held LOW
  analogWrite(MotorLeftP, RPspeed);
  digitalWrite(MotorRightP, LOW);     // right motor: "+" held LOW, "-" driven by PWM
  analogWrite(MotorRightM, RPspeed);
}

void loop() {
  forwardSpeed(128);
  delay(1000);
}
```
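Equations (1) to (3) can be tried out on a PC before they are ported to the microcontroller. The sketch below is plain C++ and assumes the standard differential-drive convention, in which V is the mean of the two wheel speeds and ω is their difference divided by b; the names Pose, Twist, bodyVelocity and step are hypothetical, and the explicit Euler integration step is an assumption made here for illustration, not something stated in the paper.

```cpp
#include <cmath>

// Hypothetical types for illustration only.
struct Pose  { double x; double y; double phi; };   // Xp, Yp and orientation
struct Twist { double v; double omega; };           // control vector u = (V, omega)

// Eq. (2) and (3): body velocities from the right/left wheel velocities.
Twist bodyVelocity(double vRight, double vLeft, double wheelDistance) {
    Twist u;
    u.v = (vRight + vLeft) / 2.0;
    u.omega = (vRight - vLeft) / wheelDistance;
    return u;
}

// Eq. (1), advanced by one small time step dt with explicit Euler integration
// (the integration scheme is an assumption of this sketch).
Pose step(const Pose& p, const Twist& u, double dt) {
    Pose next;
    next.x = p.x + u.v * std::cos(p.phi) * dt;
    next.y = p.y + u.v * std::sin(p.phi) * dt;
    next.phi = p.phi + u.omega * dt;
    return next;
}
```

With equal wheel speeds ω is zero and the robot moves in a straight line; with opposite wheel speeds V is zero and the robot turns on the spot.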

6 Discussion

According to researchers, educators, and Papert, ER has numerous advantages and benefits for students [1, 8, 28–30]: (i) it improves concentration and the overall learning process, (ii) it increases motivation to learn, (iii) it offers hands-on exposure to a wide range of subjects (mechanical, electrical, computer engineering) and is a useful aid for STEM, (iv) it maintains students’ high levels of attention and curiosity, (v) it develops cognitive and social skills including teamwork, problem-solving, creativity, and robot design, (vi) it attracts students to technological and scientific studies, and (vii) it encourages students to develop their interest and improve their English ability. From a technical perspective, the first step towards ER and STEM is to choose and use an RP [1]. This may seem simple; however, the survey from the ER event described above (see section Problem Statement) shows a lack of student engagement in ER due to the high cost of RPs. In this research, a low-cost RP implementation was presented in detail, whose specifications came from the results of the 1st cycle of action research. According to the educational community’s needs, the RP’s specifications should be the following: low-cost (

(SituationObject_x, SituationObject_y, <start time>)
<attribute>(SituationObject_x, <start time>) := <value>

Scenario:
– Event t0: Start situation
  • IsOn(Ego, RightLane, t0)
  • IsOn(Car, RightLane, t0)
  • IsBehind(Ego, Car, t0)
– Event t1: Signalling left turn
  • TurnIndicator(Ego, t1) := Left
– Event t2: Looking at left mirror
  • Gaze(Driver, t2) := LeftMirror
– Event t3: Changing lanes
  • IsOn(Ego, LeftLane, t3)
– Event t4: In front of other Car
  • IsBehind(Car, Ego, t4)
– Event t5: Signalling right turn
  • TurnIndicator(Ego, t5) := Right
– Event t6: Looking at right mirror
  • Gaze(Driver, t6) := RightMirror
– Event t7: Changing lanes
  • IsOn(Ego, RightLane, t7)
  • Overtake(Ego, Car, t7)
– Event t8: End situation
  • TurnIndicator(Ego, t8) := Off

3.2 Reasoning Using First Order Logic

To identify state changes in attributes and relations, the agents use first-order logic rules such as those shown below.

¬IsOn(Ego, Lane, t − 1)∧ IsOn(Ego, Lane, t) =⇒ LaneChange(Ego, Lane, t)

¬ (TurnIndicator(Ego, t − 1) = Left) ∧ (TurnIndicator(Ego, t) = Left) =⇒ TurnSignalling(Ego, Left, t)
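The two rules above compare the facts that hold at time t − 1 with those that hold at time t. A minimal sketch of that comparison in C++ follows; the Snapshot type and the function names are hypothetical, and the turn-signalling rule is generalized here from Left to either side as an illustrative assumption.

```cpp
#include <string>

// Hypothetical representation of the facts holding at one time step.
enum class Indicator { Off, Left, Right };

struct Snapshot {
    std::string egoLane;    // the lane for which IsOn(Ego, lane, t) holds
    Indicator indicator;    // the value of TurnIndicator(Ego, t)
};

// LaneChange(Ego, lane, t): Ego was not on `lane` at t-1 and is on it at t.
bool laneChange(const Snapshot& prev, const Snapshot& cur, const std::string& lane) {
    return prev.egoLane != lane && cur.egoLane == lane;
}

// TurnSignalling(Ego, side, t): the indicator was not `side` at t-1 and is `side` at t.
bool turnSignalling(const Snapshot& prev, const Snapshot& cur, Indicator side) {
    return prev.indicator != side && cur.indicator == side;
}
```

Each function fires only on the transition itself, so an event such as LaneChange is raised once per change rather than at every time step in which the new state persists.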

The OvertakeAgent has one task – to recognise an overtake event. An overtake should be checked every time Ego does a lane change. The LaneChange event triggers the execution of the OvertakeAgent. Additionally, we query about the temporal data. LOT (car, lane, tcur ) ≡

argmin ∀t:∃IsOn(car,lane,t)∧t, and you did < $SideMirrorLook ? check : not check > for cars behind you in the side mirror. The explanation tree, illustrated in Fig. 3, shows that our multi-agent system can provide a complete explanation of a high level situation using the recursive vector structure. Multiple text snippets, from each explanation, can be merged to form a detailed human readable explanation. In this case, the OvertakeExplainerAgent has access to the LaneChangeExplanations. By retrieving the two


lane change explanations (LCE1 and LCE2 ) which defined the overtake, one can explain the complete overtake process. OvertakeExplanation ≡ {LCE1 , LCE2 }
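The idea of an explanation that contains other explanations, whose text snippets are merged into one readable message, can be sketched as a small recursive structure. This is only an illustration in C++: the Explanation type, the render function and the sample sentences are hypothetical and are not taken from the VDI implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical explanation node: a text snippet plus the explanations it is built from.
struct Explanation {
    std::string text;
    std::vector<Explanation> parts;   // sub-explanations (recursive vector structure)
};

// Depth-first merge of all text snippets into one human readable string.
std::string render(const Explanation& e) {
    std::string out = e.text;
    for (const Explanation& part : e.parts) {
        out += " " + render(part);
    }
    return out;
}
```

For example, an overtake explanation built from two lane change explanations renders as the overtake sentence followed by the two lane change sentences, in order.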

4 Conclusion

We have developed an architecture for a virtual driving instructor system, which can assess complex situations, such as the presented overtake scenario, and can derive conclusions or explanations of interest. The multi-agent system allows the VDI to recognise traffic regulation violations as well as correct traffic behaviour. The explanation data structure generated by the multi-agent system has all the information necessary to generate complete, interpretable and traceable explanations. An example of such an explanation using templates is also shown for the overtake scenario. As this work focuses on the architectural design of an ITS, its implementation is a necessary next step in this project. An integrated ITS also requires a detailed design of the student model and its relation to the personalized feedback concept. Development of the student model is also left as future work.


MOOCOLAB - A Customized Collaboration Framework in Massive Open Online Courses

Ana Carla A. Holanda1,2(✉), Patrícia Azevedo Tedesco1(✉), Elaine Harada T. Oliveira3(✉), and Tancicleide C. S. Gomes1(✉)

1 Federal University of Pernambuco, Recife, Brazil {acah,pcart,tcsg}@cin.ufpe.br
2 Federal Institute of Acre, Rio Branco, Brazil
3 Federal University of Amazonas, Manaus, Brazil [email protected]

ABSTRACT. [Context] The use of MOOCs has generated an increasing amount of data that, if properly explored, can provide an understanding of how students interact throughout their learning, in addition to identifying strategies that can be used to build environments that enhance the shared construction of knowledge. [Objective] The objective of this article is to propose a Conceptual Collaborative Framework (MOOColab), based on Learning Analytics mechanisms and Recommendation Systems, to improve collaborative learning in the environment. [Methodology] For the development of the Framework, the Design Science Research model was used for the analysis, development and evaluation of MOOColab. [Results] An experiment was carried out with two samples: a control group (which did not use the Framework) and an experimental group (which went through the same course using MOOColab). [Conclusion] The results obtained in the research indicate that implementing the Framework, which identifies the individualities of each student by discovering behavioral patterns and their respective skills and adapts the environment by recommending peers, improves the mutual exchange of information between the students involved in the learning process.

Keywords: MOOC · Collaboration · Recommendation Systems · Learning Analytics

1 Introduction

Massive Open Online Courses (MOOCs) are an online course modality capable of serving a large number of students and attracting different student profiles, offering qualification opportunities to participants, including those who do not attend an educational institution (Zhang 2016). MOOCs operate in learning contexts focused on educational experiences that integrate collaboration and interaction between students (Wagner et al. 2016), providing a teaching model with an emphasis on active learning (Zong and Xu 2017). In this way, they can promote an unlimited learning network in which participants are, at the same time, actively teaching and learning (Downes 2012).

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 125–131, 2020. https://doi.org/10.1007/978-3-030-49663-0_16


However, Zhong and Xu (2017) comment that although MOOCs are seen as environments that bring autonomy to students, they need to promote greater participation and discussion among the students involved in the course, considering that students do not feel motivated to continue the courses they have started because of the lack of collaboration in the environments. The creation of adequate contexts and environments to develop social and cognitive skills brings people together in the construction of new social relationships, providing opportunities for learning through the construction and sharing of experiences. Thus, this research presents our Collaboration Framework for massive environments (MOOColab), developed and implemented to enhance collaboration among students based on the recommendation of peers in the environment. Learning Analytics techniques and Recommendation Systems were used to map the student’s profile and, consequently, recommend colleagues based on their skills and behaviors in the environment. To validate the Framework, two groups were compared: a control group and an experimental group. Based on these results, we had evidence that the MOOColab environment brought more collaboration and more learning based on the exchange of knowledge.

2 Methodology Considering the motivation and objectives pointed out in this article, the method adopted was based on the Design Science Research (DSR) paradigm whose main mission is to develop knowledge for the design and development of artifacts. Wieringa (2010) defines that solutions must be iteratively proposed, refined, evaluated and, if necessary, improved (Fig. 1).

Fig. 1. Steps of the methodology used.


The first stage of the research consisted of identifying the aspects that motivated the study, based on a systematic mapping of the literature that was complemented with a systematic literature review (the results of which are available in Holanda and Tedesco (2017)). An observation was also made in the TinTec MOOC of the Federal Institute of Acre, specifically in Programming Logic courses in which 287 students were enrolled from September 2017 to December 2017, to analyze the positive and negative points of the environment from the students’ perspective. A questionnaire was applied to students to complement this analysis. The second stage concerns the definition of the research objectives, which considered the gaps detected in the previous stage, as well as the definition of strategies to address them. The next three steps are aimed at generating and testing the artifact that helped solve the problem and are interactive. These steps were carried out in two cycles: in the first cycle, the entire strategy for developing the artifact was carried out in the form of a prototype; the results obtained informed the second cycle, which was the development of the Framework based on the feedback gathered in the previous cycle. Finally, in the communication stage, the rigor with which the research was conducted was evidenced, and the effectiveness of the solution was assessed.

3 Research Results

The Framework was directed to the instructional model of xMOOCs and uses a pedagogical proposal in which students help each other in the learning process, acting as partners who seek to acquire knowledge about a given object that is being addressed in the course. The Framework’s three distinct layers can be seen in Fig. 2, which shows the architecture of MOOColab. The first stage of the Framework was designed to collect student data from four different instruments:
1. Surveys - used at the time of the student’s registration to collect personal, demographic and motivational information, and at the end of the session to obtain student feedback on the strategies used.
2. Skills tests - used to identify students’ prior knowledge, and also at the end of the course.
3. MBTI questionnaire (Myers-Briggs Type Indicator based on Myers and Myers (1998)), which aimed to identify the student’s personality type to better adapt peer recommendation strategies for efficient collaboration.
4. Database of courses with the collection of several variables: Availability, Profile, Behavior, Reputation, Social Proximity, Physical Proximity, Permanence and Languages.
From these collected data it was possible to create a student model that maps students’ behavior patterns and interaction in the environment, in order to recommend peers who can promote active participation in the learning process. In the first stage, Learning Analytics techniques were used to interpret and analyze the


structure and relationships in collaborative tasks and interactions carried out in the environment. In this step we use the Clustering technique to automatically group students with similar characteristics into different groups. From the mapping of the students’ behavior, it was possible to identify the way they behave and react to the tools, materials and activities available in the environment. The Framework then identifies the students’ needs (2nd layer) from their evolution during the course, through the collection of variables such as: their previous knowledge (measured in an initial test when the student signed up for the course), activities not performed (which is evidence that the student is having difficulty understanding the content covered), access logs, and the assessments that are being carried out in the course.

Fig. 2. MOOColab framework architecture.

Thus, it starts to analyze potential candidates (other students) to collaborate in the understanding and continuity of the course. This analysis was performed based on a set of criteria found in the literature, checked in a case study and validated by experts. Therefore, the student is recommended through an inference engine that uses a knowledge base and a set of rules and criteria that takes into account the fitness function:

Recommendation = (φ1 × Availability) + (φ2 × Profile) + (φ3 × Behavior) + (φ4 × Reputation) + (φ5 × Social Proximity) + (φ6 × Physical Proximity) + (φ7 × Permanence) + (φ8 × Languages)

where each φi is a weighted average determined by experts. After the interactions are carried out, the 3rd layer of the Framework begins with the student’s assessment based on the interaction carried out. This step consists of the student’s self-evaluation of the interaction that occurred during the performance of an activity. The objective of this stage was to evaluate the collaboration process from a peer perspective. This assessment is also considered in the student recommendation process. Reward techniques are also used as students fulfill the objectives set by the teacher at the time of creating the course.

3.1 Conducting the Experiment

Once the framework structure was defined, it was developed and implemented on IFAC servers for analysis and evaluation. Two groups were defined: a control group using the institution’s MOOC environment, which was not integrated with MOOColab, and an experimental group with access to the Framework. The first experiment was carried out in the MOOC environment of the Federal Institute of Acre, in the HTML 5 course, with 265 students enrolled from April to May 2019. The second experiment was carried out in the MOOColab environment, analyzing the same HTML 5 course with 265 students enrolled from July to August 2019. Within the educational data sets, the objective was to discover patterns that, if combined with pedagogical approaches, would lead to improving students’ learning behavior. In this sense, the analysis consisted of tracking most of the students’ operations in the course environment. We observed students’ activities in the means of communication available in the environment, their interactions with the videos and activities available to them, and their performance in the tests required to issue the certificate (Fig. 3).

Fig. 3. Weekly forum activities (forum posts read/written per week, Weeks 1 to 8, Control Group and Experimental Group).

As for the analysis of the forums, in the control group the forum was not a frequent channel of communication between those enrolled in the course, as shown in Fig. 3. In general, there was little interaction between students during the weeks, and this participation gradually decreased over the course. To understand the possible causes of this inactivity in the communication channel, the answers to the questionnaire were examined. Many students cited a feeling of isolation, considering that they did not know their classmates, because there was no record of those enrolled in the course, nor a moment to introduce themselves. In the Experimental Group, the forum was a communication channel widely used by those enrolled in the course, which demonstrates that the

130

A. C. A. Holanda et al.

strategies used in the Framework proved to be efficient and enabled greater interaction between students, making the environment more participatory. This was corroborated by the students when they answered the final questionnaire. Regarding the videos, it is noticed that the average time of viewing the videos is 5 min, that is, it is important to produce short videos observing this characteristic. In both groups, it was observed that in the first week, students spent more time seeing the videos of the first class. In the rest of the classes they watched the first minute and then stopped watching it. This information gives us indications that the students are interested in viewing the videos to know what will be covered during the classes. As for carrying out the activities performed, there is a direct relationship with the videos watched in both groups. It is important to highlight that at the end of each video activities are made available, which may explain this direct relationship. To carry out the activities, the student can do it without a limit of attempts in both groups (Fig. 4).

Fig. 4. Relationship between videos and activity averages: number of video accesses and average activity evaluation per week (Weeks 1 to 8), shown separately for the Experimental Group and the Control Group.

The difference observed between the groups is that the Experimental Group spent longer watching the videos in the environment and, correspondingly, obtained a higher average activity score (83.125 compared to 68.125 in the Control Group), suggesting that watching the videos contributed to the evaluation results. Figure 4 shows this relationship. As for certification, a considerably larger share of students in the Experimental Group completed the course successfully (65%) than in the Control Group (15%). These results indicate that MOOColab was better adapted to the needs of students, which motivated them to finish the course.

4 Conclusion

The results of the experiments showed an improvement in the learning built collaboratively in the environment. They challenge the idea that massive environments are a simple repository of materials: with MOOColab, the MOOC becomes a personalized collaboration environment that reflects the needs, profiles and skills of the enrolled students. It promotes collaboration that is relevant for learning in MOOCs through the insertion of new communication tools, as well as the possibility of creating a more personalized channel among students according to their engagement and skills. As future work, we intend to carry out an experiment to analyze, through statistical analysis, the results of using MOOColab.


Mixed Compensation Multidimensional Item Response Theory

Béatrice Moissinac¹ and Aditya Vempaty² (✉)

¹ School of EECS, Oregon State University, Corvallis, OR, USA
² Xio Research, New York, NY, USA

Abstract. Computerized Adaptive Testing (CAT) has supported the development of numerous adaptive testing approaches. One such approach, Item Response Theory (IRT), estimates a student's competency level by modeling a test as a function of the individual's ability and the parameters of each question (i.e., item). Multidimensional Item Response Theory (MIRT) extends IRT so that each item depends on multiple competency areas (i.e., knowledge dimensions). MIRT models consider two opposing types of relationship between knowledge dimensions: compensatory and noncompensatory. In a compensatory model, a higher competency in one knowledge dimension compensates for a lower competency in another dimension. Conversely, in a noncompensatory model all the knowledge dimensions are independent and do not compensate for each other. However, using only one type of relationship at a time restricts the use of MIRT in practice. In this work, we generalize MIRT to a mixed-compensation multidimensional item response theory (MCMIRT) model that incorporates both types of relationships. We also relax the MIRT assumption that each item must include every knowledge dimension. Thus, MCMIRT can better represent real-world curricula. We show on synthetic data that our approach outperforms random item selection.

Keywords: Item Response Theory · Computerized Adaptive Testing

1 Introduction

Item Response Theory (IRT) is a psychometric model that aims at estimating a student's competency level by adaptively selecting items (i.e., test questions) [12]. IRT models the relationship between the item's parameters (i.e., difficulty, subject covered, etc.) and the student's latent measure of competency θ. Ultimately, IRT aims at attaining a more accurate estimate of θ while reducing the test's length. It has recently carried over from psychometric research to intelligent learning applications, such as hint systems [5], constraint-based modeling extending IRT [7], and procedural knowledge assessment [9].

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 132–141, 2020. https://doi.org/10.1007/978-3-030-49663-0_17


Traditional IRT assumes that only one knowledge dimension (i.e., subject or area of competency) is evaluated per item. Multidimensional Item Response Theory (MIRT) models extend IRT to evaluate an item when several knowledge dimensions are relevant [14]. Two main approaches to MIRT have been considered: compensatory and noncompensatory. In a compensatory model, a higher competency in one knowledge dimension compensates for a lower competency in another dimension. For instance, to answer 3 times 3, a student might not know their multiplication tables, but they may still be able to find the correct answer using additions (3+3+3). In this example, a researcher would define the knowledge dimensions "addition" and "multiplication" as compensatory. Conversely, in a noncompensatory model all the knowledge dimensions are independent and do not compensate for each other. A survey of MIRT estimation techniques can be found in [10].

Because existing models commit to a single type of relationship, the current modeling of compensation in MIRT is not conducive to representing a real-world curriculum. Designing questions that cover only a specific set of knowledge dimensions in order to fit the constraint of compensation or noncompensation might not be possible, or might not produce questions that rank very high in Bloom's taxonomy [3]. In this paper, we develop an MIRT model capable of using a "natural" set of questions, as opposed to a set of questions designed to fit the constraints of an algorithm. We propose a mixed-compensation multidimensional item response theory (MCMIRT) model that is able to consider test items (questions) with two properties:

– An item may have compensatory and noncompensatory knowledge dimensions simultaneously;
– An item may depend only on a subset of knowledge dimensions (instead of all of them).

2 Background on IRT and MIRT

IRT models the relationship between the item's parameters and the student's latent knowledge measure $\theta$. An item $i$'s parameters are: (1) $a_i$, the discriminability of item $i$; (2) $b_i$, the difficulty of item $i$; and (3) $c_i$, the probability of guessing the correct answer of item $i$. These form the three-parameter logistic model (3PL) typically used in IRT [11]:

$$P(u_i = 1 \mid \theta, \Delta_i) = c_i + (1 - c_i)\,\frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}}. \tag{1}$$

In other words, the probability that an examinee's response $u_i$ to item $i$ is correct ('0' is wrong and '1' is correct) is defined by a logistic function parametrized by the examinee's competency $\theta$ and the item's parameters $\Delta_i = \{a_i, b_i, c_i\}$. The higher the competency of the examinee, the higher the probability of a correct response, given the item's parameters. The discriminability of an item describes how well the item differentiates between competency and incompetency in a knowledge dimension.
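As a minimal illustration of Eq. (1), the 3PL response probability can be sketched in Python. The function name `p_correct_3pl` and the example parameter values are ours and are chosen for illustration only; they do not come from the experiments reported below.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model of Eq. (1)."""
    z = a * (theta - b)
    # 1 / (1 + exp(-z)) is algebraically identical to exp(z) / (1 + exp(z)).
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical item: discriminability 1.2, difficulty 0.0, guessing 0.25.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.0, c=0.25), 3))
```

At $\theta = b_i$ the logistic term equals 0.5, so this hypothetical item yields 0.25 + 0.75 × 0.5 = 0.625; as $\theta$ decreases, the probability approaches the guessing floor $c_i$.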


Mathematically, $a_i$ determines the slope of the logistic function for item $i$ with respect to $\theta$. The difficulty of an item corresponds to the level of competency needed to achieve at least $P(u_i = 1) = 0.5$. We refer the reader to [14] for further details about IRT and the logistic curve of the 3PL model.

In a multidimensional item response theory (MIRT) model¹, answering an item $i$ requires competency in more than one knowledge dimension. The two major MIRT models, compensatory and noncompensatory, are defined by how they model the relationship between knowledge dimensions. A compensatory model assumes that knowledge dimensions are related to each other, such that more expertise in one knowledge dimension compensates for a lack of expertise in another (see the "addition" and "multiplication" example in the previous section). The probability of answering item $i$ correctly becomes a function of a linear combination of competencies across the $K$ knowledge dimensions:

$$P(u_i = 1 \mid \theta, \Delta_i) = c_i + (1 - c_i)\,\frac{e^{\sum_k a_{i,k}(\theta_k - b_{i,k})}}{1 + e^{\sum_k a_{i,k}(\theta_k - b_{i,k})}}, \tag{2}$$

where $\theta = \{\theta_1, ..., \theta_K\}$ is the vector of abilities, one $\theta_k$ per knowledge dimension, and $a_i$ and $b_i$ are the vectors of discriminability and difficulty of item $i$, with one entry ($a_{i,k}$, $b_{i,k}$) per dimension. It is assumed that each knowledge dimension is relevant for all items.

The noncompensatory MIRT model assumes that the knowledge dimensions are independent from each other, and no compensation is possible. Thus, each dimension's logistic function is treated as an independent probability:

$$P(u_i = 1 \mid \theta, \Delta_i) = c_i + (1 - c_i)\,\prod_k \frac{e^{a_{i,k}(\theta_k - b_{i,k})}}{1 + e^{a_{i,k}(\theta_k - b_{i,k})}}. \tag{3}$$
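The difference between Eqs. (2) and (3) can be made concrete with a short Python sketch. The function names and the two-dimensional example values are hypothetical and serve only to show how a strong dimension offsets a weak one in the compensatory case but not in the noncompensatory case.

```python
import math
from typing import Sequence


def _logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def p_compensatory(theta: Sequence[float], a: Sequence[float],
                   b: Sequence[float], c: float) -> float:
    """Eq. (2): a single logistic applied to the sum over dimensions."""
    z = sum(a_k * (t_k - b_k) for a_k, t_k, b_k in zip(a, theta, b))
    return c + (1.0 - c) * _logistic(z)


def p_noncompensatory(theta: Sequence[float], a: Sequence[float],
                      b: Sequence[float], c: float) -> float:
    """Eq. (3): product of one logistic term per dimension."""
    prod = 1.0
    for a_k, t_k, b_k in zip(a, theta, b):
        prod *= _logistic(a_k * (t_k - b_k))
    return c + (1.0 - c) * prod


# Hypothetical student: strong in dimension 1, weak in dimension 2.
theta = [3.0, -3.0]
a = [1.0, 1.0]
b = [0.0, 0.0]
print(p_compensatory(theta, a, b, c=0.0))
print(p_noncompensatory(theta, a, b, c=0.0))
```

With these values the two terms of the sum cancel, so the compensatory probability is exactly 0.5, whereas the noncompensatory product is dominated by the weak dimension and stays below 0.05.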

3 Mixed-Compensation Multidimensional Item Response Theory Model

In this section, we present our MCMIRT model. MCMIRT relies on a graphical representation of the knowledge dimensions and their relationships.

3.1 Knowledge Graph

Encoding knowledge into a graph is an efficient and compact way to represent complex notions, and it is widely used across many fields [2,4,13]. In intelligent and adaptive learning research, graphs have been used to represent incremental accumulation of knowledge [6] as well as procedural knowledge [1]. In this work, we leverage knowledge graphs to represent an MIRT model with both compensatory and noncompensatory knowledge dimensions. Each node of the graph is a knowledge dimension, and each undirected edge represents the compensatory relationship between two knowledge dimensions. If two knowledge dimensions A and B are noncompensatory, then there is no edge between A and B, although there may be a path via a third dimension. As an illustration (see Fig. 1), consider a set of three language skills for a US third grade student: Meaning (i.e., the definition of a word), Synonym (i.e., synonym(s) of a word), and Spelling (i.e., the spelling of a word). Meaning and Synonym are compensatory knowledge dimensions, because knowing the meaning of a word can help you identify synonyms, and vice versa. On the other hand, knowing the spelling of a word is independent of its meaning or synonyms (at the 3rd grade level, students are not taught about etymology).

¹ See Chap. 4 of [14] for a thorough introduction.
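The three-skill example above can be encoded with a small adjacency-set structure. This is only a sketch of one possible representation; the type alias `Graph` and the helper names are ours, not part of the model definition.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def add_compensatory_edge(graph: Graph, u: str, v: str) -> None:
    """Record an undirected edge: dimensions u and v compensate for each other."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def are_compensatory(graph: Graph, u: str, v: str) -> bool:
    """True if there is a direct edge between u and v."""
    return v in graph.get(u, set())


# Three language skills; only Meaning and Synonym are compensatory.
graph: Graph = {"Meaning": set(), "Synonym": set(), "Spelling": set()}
add_compensatory_edge(graph, "Meaning", "Synonym")

print(are_compensatory(graph, "Meaning", "Synonym"))
print(are_compensatory(graph, "Spelling", "Meaning"))
```

Storing each edge in both directions keeps the graph undirected, so a compensation check is a single set lookup regardless of argument order.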

Fig. 1. Example of a knowledge graph for three language skills for a US third grade student: Meaning, Synonym, and Spelling.

3.2 Coverage and Subgraphs of an Item

The MCMIRT model also relaxes the assumption that an item must include every knowledge dimension. In the example of Fig. 1, the question "What is the definition of democracy?" only measures the ability of Meaning. The item "Which of the following is the correct spelling for the synonym of shore: 'benk', 'bank', 'sea', 'see'?" relates to Spelling and Synonym. We call the relevance of an item to a knowledge dimension its coverage. Therefore, the previous question covers Spelling and Synonym. In the graphical representation, coverage is shown by shading a node (see Fig. 2). Furthermore, given an item $i$, when several knowledge dimensions are covered by $i$ and are compensatory with each other, they form a subgraph. We denote by $S_i = \{s_1, ..., s_M\}$ the set of subgraphs created by the coverage of item $i$. Figure 2 illustrates this setup. Item $i$ covers dimensions B, C, D, and E. D and E are compensatory, and therefore form subgraph $s_3$. B and C are not compensatory with each other; thus, they form two separate subgraphs $s_1$ and $s_2$.

3.3 Estimation of θ

Fig. 2. Example of a knowledge graph with knowledge dimensions A, B, C, D, and E. The grey nodes are the knowledge dimensions covered by item i. The dashed circles indicate the three subgraphs of item i in that knowledge graph.

In classic MIRT approaches, the estimation of $\theta$ starts by defining $P(u_i = 1 \mid \theta, \Delta_i)$. For MCMIRT, the idea is to treat the subgraphs as hyper-variables, with compensation within each subgraph and noncompensation between subgraphs:

$$P(u_i = 1 \mid \theta, \Delta_i) = c_i + (1 - c_i) \prod_{s_m \in S_i} \frac{e^{\sum_k a_{i,k}(\theta_k - b_{i,k}) I^k_{i,s_m}}}{1 + e^{\sum_k a_{i,k}(\theta_k - b_{i,k}) I^k_{i,s_m}}} \tag{4}$$

In Eq. (4), the sum over $k$ in each exponent is the compensatory part and the product over the subgraphs $s_m \in S_i$ is the noncompensatory part, where:

– $\theta = \{\theta_1, ..., \theta_K\}$ is the vector of abilities, one $\theta$ per knowledge dimension.
– $\Delta_i = \{a_i, b_i, c_i\}$, where $a_i$ and $b_i$ are the vectors of discriminability and difficulty of item $i$ for each dimension (if a dimension is not covered by $i$, the value is 0 by default).
– $S_i = \{s_1, ..., s_M\}$ is the set of subgraphs created by the coverage of item $i$ over the knowledge graph.
– $I_{i,s_m}$ is an indicator vector of length $K$. For each knowledge dimension $k$, the entry $I^k_{i,s_m}$ is 1 if the dimension is in subgraph $s_m$, and 0 otherwise. If a knowledge dimension is not covered, it is in no subgraph, so its indicator is 0 in every vector. If a knowledge dimension is covered, only the indicator vector of the subgraph it belongs to has its entry set to 1. This vector simplifies obtaining the first and second derivatives (see below).

A standard approach to estimating MIRT models is to use maximum likelihood to estimate $\theta$ given the observed responses [10]. We want to find the values of $\theta$ that maximize the log-likelihood of observing $Y$, where $Y = \{u_1, ..., u_{i-1}\}$ are the responses from the individual thus far:

$$\hat{\theta} = \arg\max_{\theta} \log L(\theta \mid \Delta, Y) \tag{5}$$
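To make the roles of $S_i$ and the indicator vectors concrete, the following Python sketch builds the subgraphs of an item and evaluates Eq. (4). Two points are assumptions on our part: subgraphs are computed as the connected components of the knowledge graph restricted to the covered dimensions, and the edge set of the five-node example is hypothetical (the text only states that D and E are compensatory and that B and C are not). Parameters are stored in dictionaries keyed by dimension name for readability.

```python
import math
from typing import Dict, List, Set


def item_subgraphs(graph: Dict[str, Set[str]], covered: Set[str]) -> List[Set[str]]:
    """Connected components of the graph restricted to the covered dimensions."""
    remaining = set(covered)
    subgraphs = []
    while remaining:
        start = min(remaining)  # deterministic starting node
        remaining.remove(start)
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour in remaining:  # uncovered nodes are never visited
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        subgraphs.append(component)
    return subgraphs


def p_correct_mcmirt(theta, a, b, c, subgraphs) -> float:
    """Eq. (4): sum within each subgraph, product across subgraphs."""
    prod = 1.0
    for sg in subgraphs:
        # Summing only over members of sg plays the role of the indicator I^k.
        g = sum(a[k] * (theta[k] - b[k]) for k in sg)
        prod *= 1.0 / (1.0 + math.exp(-g))
    return c + (1.0 - c) * prod


# Hypothetical edges: A-B, A-C and D-E. Item i covers B, C, D and E.
graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": {"E"}, "E": {"D"}}
covered = {"B", "C", "D", "E"}
subgraphs = item_subgraphs(graph, covered)

theta = {"A": 0.0, "B": 1.0, "C": 1.0, "D": 2.0, "E": -2.0}
a = {k: 1.0 for k in covered}
b = {k: 0.0 for k in covered}
p = p_correct_mcmirt(theta, a, b, 0.2, subgraphs)
print(subgraphs, round(p, 4))
```

In this example B and C are linked only through the uncovered dimension A, so they remain separate subgraphs, while D and E share one subgraph in which the high value of $\theta_D$ offsets the low value of $\theta_E$.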


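The likelihood function is given by

$$
\begin{aligned}
L(\theta \mid Y, \Delta) &= \prod_i P(u_i = 1 \mid \theta, \Delta_i)^{u_i} \bigl(1 - P(u_i = 1 \mid \theta, \Delta_i)\bigr)^{1-u_i} \\
&= \prod_i \Bigl[ c_i + (1 - c_i) \prod_{s_m \in S_i} \frac{e^{g_{i,s_m}}}{1 + e^{g_{i,s_m}}} \Bigr]^{u_i} \Bigl[ 1 - \Bigl( c_i + (1 - c_i) \prod_{s_m \in S_i} \frac{e^{g_{i,s_m}}}{1 + e^{g_{i,s_m}}} \Bigr) \Bigr]^{1-u_i} \\
&= \prod_i \Bigl[ \underbrace{c_i + (1 - c_i) \prod_{s_m \in S_i} \frac{e^{g_{i,s_m}}}{1 + e^{g_{i,s_m}}}}_{P_i} \Bigr]^{u_i} \Bigl[ \underbrace{(1 - c_i) \Bigl( 1 - \underbrace{\prod_{s_m \in S_i} \frac{e^{g_{i,s_m}}}{1 + e^{g_{i,s_m}}}}_{G_i(\theta)} \Bigr)}_{Q_i} \Bigr]^{1-u_i}
\end{aligned}
\tag{6}
$$

where $g_{i,s_m}$ is shorthand for the linear combination of item $i$ and subgraph $s_m$: $g_{i,s_m} = \sum_k a_{i,k}(\theta_k - b_{i,k}) I^k_{i,s_m}$, and $G_i(\theta)$ is shorthand for (9). The log-likelihood is:

$$
\begin{align}
\log L(\theta \mid Y, \Delta) &= \sum_i \bigl[ u_i \log P_i + (1 - u_i) \log Q_i \bigr] \tag{7} \\
&= \sum_i \bigl[ u_i \log P_i + (1 - u_i)\bigl(\log(1 - c_i) + \log(1 - G_i(\theta))\bigr) \bigr] \tag{8}
\end{align}
$$

$$
\begin{align}
G_i(\theta) &= \prod_{s_m \in S_i} \frac{e^{g_{i,s_m}}}{1 + e^{g_{i,s_m}}} \tag{9} \\
&= \underbrace{\frac{e^{g_{i,s_m}}}{1 + e^{g_{i,s_m}}}}_{s_m} \; \underbrace{\prod_{s_l \in S_i \setminus \{s_m\}} \frac{e^{g_{i,s_l}}}{1 + e^{g_{i,s_l}}}}_{\text{subgraphs other than } s_m} \tag{10}
\end{align}
$$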

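Equation (10) decomposes the product into two parts to isolate one subgraph from the rest of the product. The decomposition helps with the first derivative of $G_i(\theta)$ (see (11) to (13)). When taking the derivative of $G_i(\theta)$ with respect to $\theta_j$, we care only about the part of the product where $I^j_{i,s_m} = 1$; the rest of the product is a constant because, if dimension $j$ belongs to the subgraph $s_m$, then $I^j_{i,s_m} = 1$ and $I^j_{i,s_l} = 0 \;\; \forall s_l \in S_i \setminus \{s_m\}$ for item $i$.

$$
\begin{align}
\frac{\partial G_i(\theta)}{\partial \theta_j} &= \underbrace{\frac{a_{i,j}\, e^{g_{i,s_m}}}{(1 + e^{g_{i,s_m}})^2}}_{\text{Depends on } \theta_j} \; \underbrace{\prod_{s_l \in S_i \setminus \{s_m\}} \frac{e^{g_{i,s_l}}}{1 + e^{g_{i,s_l}}}}_{\text{Constant}} \tag{11} \\
&= \underbrace{\frac{a_{i,j}}{1 + e^{g_{i,s_m}}}}_{A_{ij}} \; \prod_{s_l \in S_i} \frac{e^{g_{i,s_l}}}{1 + e^{g_{i,s_l}}} \tag{12} \\
&= A_{ij}\, G_i(\theta). \tag{13}
\end{align}
$$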


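Then, the first derivative of the log-likelihood is given by (14):

$$\frac{\partial \log L(\theta \mid Y, \Delta)}{\partial \theta_j} = \sum_i \left[ u_i\, \frac{A_{ij}\,(1 - c_i)\, G_i(\theta)}{c_i + (1 - c_i)\, G_i(\theta)} - (1 - u_i)\, \frac{A_{ij}\, G_i(\theta)}{1 - G_i(\theta)} \right]. \tag{14}$$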
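The log-likelihood (8) and its gradient (14) can be checked against each other numerically. The sketch below is an illustration under our own assumptions: items are plain dictionaries with hypothetical keys (`a`, `b`, `c`, `subgraphs`), the two example items and the ability vector are made up, and the analytic gradient is compared with a central finite difference.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def g(theta, item, subgraph):
    """Linear combination g_{i,s_m}, restricted to one subgraph."""
    return sum(item["a"][k] * (theta[k] - item["b"][k]) for k in subgraph)


def G(theta, item):
    """Eq. (9): product of the per-subgraph logistic terms."""
    prod = 1.0
    for sg in item["subgraphs"]:
        prod *= sigmoid(g(theta, item, sg))
    return prod


def log_likelihood(theta, items, responses):
    """Eq. (8): sum over items of u_i log P_i + (1 - u_i) log Q_i."""
    total = 0.0
    for item, u in zip(items, responses):
        c, G_i = item["c"], G(theta, item)
        P_i = c + (1.0 - c) * G_i
        Q_i = (1.0 - c) * (1.0 - G_i)
        total += u * math.log(P_i) + (1 - u) * math.log(Q_i)
    return total


def gradient(theta, items, responses):
    """Eq. (14): partial derivatives; uncovered dimensions keep a zero entry."""
    grad = {j: 0.0 for j in theta}
    for item, u in zip(items, responses):
        c, G_i = item["c"], G(theta, item)
        for sg in item["subgraphs"]:
            one_minus_sig = sigmoid(-g(theta, item, sg))  # equals 1 / (1 + e^g)
            for j in sg:
                A_ij = item["a"][j] * one_minus_sig
                grad[j] += (u * A_ij * (1.0 - c) * G_i / (c + (1.0 - c) * G_i)
                            - (1 - u) * A_ij * G_i / (1.0 - G_i))
    return grad


# Hypothetical data: two items, one answered correctly and one incorrectly.
items = [
    {"a": {"B": 1.0, "D": 0.8, "E": 1.2}, "b": {"B": 0.0, "D": -0.5, "E": 0.5},
     "c": 0.2, "subgraphs": [{"B"}, {"D", "E"}]},
    {"a": {"C": 1.5}, "b": {"C": 1.0}, "c": 0.25, "subgraphs": [{"C"}]},
]
responses = [1, 0]
theta = {"B": 0.3, "C": -0.4, "D": 1.0, "E": -1.0}

analytic = gradient(theta, items, responses)
h = 1e-6
for j in theta:
    up, down = dict(theta), dict(theta)
    up[j] += h
    down[j] -= h
    numeric = (log_likelihood(up, items, responses)
               - log_likelihood(down, items, responses)) / (2 * h)
    print(j, round(analytic[j], 6), round(numeric, 6))
```

Agreement between the two columns is a quick sanity check on a gradient implementation before it is passed to an optimizer.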

The second derivative can take three forms for item $i$, $\theta_j$ and $\theta_h$:

– $\theta_j$ and/or $\theta_h$ are not covered by $i$;
– $\theta_j$ and $\theta_h$ are in different subgraphs;
– $\theta_j$ and $\theta_h$ are in the same subgraph.

$\nabla^2 \log L$ is derived using $I^k_{i,s_m}$ and the same decomposition as in (11). Due to lack of space, we do not include the exact formulation of $\nabla^2 \log L$. There exists no analytic solution for the optimum. Instead, we use a numerical estimation based on the Newton-Raphson method [15], a root-finding algorithm that finds a point $x$ such that $f(x) = 0$, applied to (14). Once we get an estimate $\hat{\theta}$ using (15), we use the trace of the inverse of the Fisher information (16) [10] to select the next best item to present to the student.

$$\hat{\theta}_{t+1} = \hat{\theta}_t - (\nabla^2 \log L)^{-1}\, \nabla \log L \tag{15}$$

$$I_i(\hat{\theta}) = -\nabla^2 \log L \ast P(u_i \mid \hat{\theta}, Y). \tag{16}$$

The overall process of MCMIRT is summarized in Algorithm (1).

Algorithm 1. MCMIRT Estimation using Newton-Raphson method
 1: procedure EstimationTheta($U$, $\{\hat{\theta}\}$, $Y$)
 2:   for each candidate $\hat{\theta} \in \{\hat{\theta}\}$ do
 3:     repeat
 4:       $\hat{\theta}_{t+1} = \hat{\theta}_t - (\nabla^2 \log L)^{-1} \nabla \log L$
 5:     until $|\hat{\theta}_{t+1} - \hat{\theta}_t| < \epsilon$
 6:     $i = \arg\min_{i \in U} \mathrm{Tr}(I_i(\hat{\theta}_{t+1})^{-1})$    ▷ Equation (16)
 7:   end for
 8:   $i^* = \arg\max_{\hat{\theta}} \log L(\hat{\theta} \mid \Delta, Y)$    ▷ Pick item selected by $\hat{\theta}^*$, the estimated candidate with highest log-likelihood
 9:   Re-draw candidates $\hat{\theta}$ that are not maximizing $\log L(\theta \mid \Delta, Y)$
10:   $MSE(\hat{\theta}^*) = \frac{1}{k} \sum_k (\theta_k - \hat{\theta}^*_k)^2$
11:   return $i^*$ and candidates $\{\hat{\theta}\}$
12: end procedure
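Since the Hessian of the MCMIRT log-likelihood is not given here, the inner loop of Algorithm (1) is illustrated below on a deliberately simplified case: a one-dimensional two-parameter logistic model ($c_i = 0$), for which the first and second derivatives have well-known closed forms. This is a sketch of the Newton-Raphson update and of information-based item selection, not the MCMIRT implementation; the function names, tolerance, and example items are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def newton_raphson_2pl(items, responses, theta0=0.0, tol=1e-8, max_iter=50):
    """Maximum-likelihood ability estimate for a 1-D 2PL model.

    items is a list of (a, b) pairs and responses a list of 0/1 answers.
    """
    theta = theta0
    for _ in range(max_iter):
        grad, hess = 0.0, 0.0
        for (a, b), u in zip(items, responses):
            p = sigmoid(a * (theta - b))
            grad += a * (u - p)             # first derivative of log L
            hess -= a * a * p * (1.0 - p)   # second derivative of log L
        theta_new = theta - grad / hess     # one-dimensional form of Eq. (15)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta


def next_item(candidates, theta):
    """Index of the candidate (a, b) with the largest Fisher information.

    In one dimension, minimizing the trace of the inverse information is
    the same as maximizing the information a^2 p (1 - p).
    """
    def information(item):
        a, b = item
        p = sigmoid(a * (theta - b))
        return a * a * p * (1.0 - p)

    return max(range(len(candidates)), key=lambda idx: information(candidates[idx]))


# Hypothetical answered items: two correct (easier), two incorrect (harder).
answered = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
responses = [1, 1, 0, 0]
theta_hat = newton_raphson_2pl(answered, responses)
print(round(theta_hat, 4))
print(next_item([(1.0, -3.0), (1.0, 0.4), (1.0, 3.0)], theta_hat))
```

The selected item is the one whose difficulty lies closest to the current estimate, which is where a logistic item is most informative. The sketch assumes mixed responses; with all-correct or all-incorrect answers the likelihood has no finite maximum and the iteration would diverge.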

4 Experimental Setting

We evaluated MCMIRT on synthetic data. We randomly generated a knowledge graph with eight knowledge dimensions. Each knowledge dimension had between 0 and 2 edges, randomly picked. We randomly generated a set $U$ of 50 items, the knowledge dimensions covered by each item, and their values for $\Delta$. We ran 20 statistical runs. The procedure of a statistical run is as follows:

(1) Draw a new student's true ability vector $\theta$ from Uniform(−5, 5);
(2) Draw $n$ candidate vectors $\{\hat{\theta}\}$ from Uniform(−5, 5);
(3) Draw $m$ items randomly from $U$ to create the starting set of answered items;
(4) Execute Algorithm (1);
(5) Add the student's response to $Y$ and remove the item from $U$;
(6) Repeat steps 4 and 5 until $U$ is empty.

At the end of each iteration, we measured the Mean Squared Error (MSE) of the $\hat{\theta}$ with the highest $\log L(\hat{\theta} \mid \Delta, Y)$ (see Algorithm (1), line 10). The Newton-Raphson estimation is very sensitive to the steep cliffs that commonly occur with logistic functions. We implemented gradient clipping [8] to constrain $\hat{\theta}$ within the realistic values [−5, 5]. We also used a large number of candidates ($n$ = 2,000) to search the space and minimize the risk of getting stuck in a suboptimal point.
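A synthetic setup of this shape could be generated along the following lines. Only the sizes (eight dimensions, at most two edges per dimension, 50 items) and the Uniform(−5, 5) ability range come from the description above; the edge probability, the ranges for $a$, $b$ and $c$, the coverage sampling, and the seed are assumptions made purely for illustration.

```python
import itertools
import random

K = 8           # knowledge dimensions (from the text)
N_ITEMS = 50    # items in the set U (from the text)
MAX_DEGREE = 2  # each dimension has between 0 and 2 edges (from the text)


def random_knowledge_graph(rng, k=K, max_degree=MAX_DEGREE, edge_prob=0.5):
    """Undirected graph as adjacency sets; edge_prob is an assumed setting."""
    graph = {d: set() for d in range(k)}
    pairs = list(itertools.combinations(range(k), 2))
    rng.shuffle(pairs)
    for u, v in pairs:
        has_room = len(graph[u]) < max_degree and len(graph[v]) < max_degree
        if has_room and rng.random() < edge_prob:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def random_item(rng, k=K):
    """One item with random coverage; parameter ranges are assumed."""
    covered = rng.sample(range(k), rng.randint(1, k))
    return {
        "covered": set(covered),
        "a": {d: rng.uniform(0.5, 2.0) for d in covered},
        "b": {d: rng.uniform(-3.0, 3.0) for d in covered},
        "c": rng.uniform(0.0, 0.25),
    }


def mse(theta_true, theta_hat):
    """Mean squared error between two ability vectors (Algorithm 1, line 10)."""
    return sum((t - h) ** 2 for t, h in zip(theta_true, theta_hat)) / len(theta_true)


rng = random.Random(0)  # fixed seed so that a run can be reproduced
graph = random_knowledge_graph(rng)
items = [random_item(rng) for _ in range(N_ITEMS)]
theta_true = [rng.uniform(-5.0, 5.0) for _ in range(K)]
print(graph)
print(len(items), [round(t, 2) for t in theta_true])
```

Visiting the candidate pairs in shuffled order and rejecting any edge that would exceed the degree cap gives every dimension between 0 and 2 edges without favoring low-numbered dimensions.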

Fig. 3. Mean Square Error of Fisher Information criteria vs. random item selection when estimating θ with MCMIRT. From left to right, top to bottom: M = {1, 5, 10, 15}, respectively; n = 2,000.

5 Results and Discussion

Performance of Fisher information-based item selection was benchmarked against random selection using the same number of items in the initial item set (see step 3 in the statistical run process above), and is presented in Fig. 3 using the MSE averaged over the statistical runs for each iteration. MCMIRT with Fisher information consistently outperforms MCMIRT with random selection. The size of the starting set of items M does not seem to affect the final performance of the algorithm. As shown in Fig. 4, the algorithm converges at a similar speed and with similar accuracy no matter the initial value of M. The highlights of Fig. 4 indicate that MCMIRT with Fisher information performs systematically better than random selection of items after 15 to 20 items.

Fig. 4. Mean Square Error of Fisher Information criteria vs. random item selection when estimating θ with MCMIRT. The size of the starting sample of items helps the algorithm at the beginning. The algorithm converges to a similar performance. Fisher seems to perform systematically better than random selection of items after 15 to 20 items.

6 Conclusion

IRT and MIRT are efficient adaptive testing models to estimate a student's level of competency. However, the traditional MIRT models have strong constraints on the item set that are not conducive to a smooth translation to a real-world application. Designing questions that cover only a specific set of knowledge dimensions in order to fit the constraint of compensation or noncompensation might not be possible, or might not produce questions that rank very high in Bloom's taxonomy [3]. In this paper, our goal was to create an MIRT model capable of using a natural set of questions, that is, a set of questions that a real-world teacher would have designed, as opposed to a set of questions designed to fit the constraints of an algorithm. We proposed a mixed-compensation multidimensional item response theory (MCMIRT) model that is able to consider items with two properties:

– An item may have compensatory and noncompensatory knowledge dimensions simultaneously;
– An item may depend only on a subset of knowledge dimensions (instead of all of them).


This generalization offers a more flexible modeling of the curriculum and the item set. We included the derivations and the process to estimate the student's ability using the log-likelihood of θ and the Newton-Raphson method. Our experiments benchmarked MCMIRT using Fisher information-based item selection against MCMIRT using random item selection. Our results indicate that Fisher information-based item selection consistently outperforms random selection.

References

1. Anderson, J.R.: ACT: a simple theory of complex cognition. Am. Psychol. 51(4), 355 (1996)
2. Ashburner, M., et al.: Gene ontology: tool for the unification of biology. Nat. Genet. 25(1), 25 (2000)
3. Bloom, B.S., Engelhart, M.D., Furst, E.J., Hill, W.H., Krathwohl, D.R.: Taxonomy of Educational Objectives: Handbook 1: Cognitive Domain. Longman Publishing Group (1984)
4. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250. ACM (2008)
5. Conejo, R., Guzmán, E., Pérez-de-la-Cruz, J.L., Millán, E.: Introducing adaptive assistance in adaptive testing. In: AIED, pp. 777–779 (2005)
6. Doignon, J.P., Falmagne, J.C.: Knowledge Spaces and Learning Spaces, November 2015
7. Galvez, J., Guzman, E., Conejo, R., Millan, E.: Student knowledge diagnosis using item response theory and constraint-based modeling. In: AIED, pp. 291–298 (2009)
8. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
9. Hernando, M., Guzmán, E., Conejo, R.: Measuring procedural knowledge in problem solving environments with item response theory. In: Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (eds.) AIED 2013. LNCS (LNAI), vol. 7926, pp. 653–656. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39112-5_81
10. Van der Linden, W.J., Pashley, P.J.: Item selection and ability estimation in adaptive testing. In: van der Linden, W., Glas, C. (eds.) Elements of Adaptive Testing, pp. 3–30. Springer, Heidelberg (2009). https://doi.org/10.1007/978-0-387-85461-8_1
11. Lord, F.: Application of Item Response Theory to Practical Testing Problems (1980)
12. Lord, F.M.: Applications of Item Response Theory to Practical Testing Problems. Routledge (2012)
13. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
14. Reckase, M.D.: Multidimensional Item Response Theory. Statistics for Social and Behavioral Sciences. Springer, New York (2009). https://doi.org/10.1007/978-0-387-89976-3
15. Ypma, T.J.: Historical development of the Newton-Raphson method. SIAM Rev. 37(4), 531–551 (1995)

Data-Driven Analysis of Engagement in Gamified Learning Environments: A Methodology for Real-Time Measurement of MOOCs

Khulood Alharbi¹ (✉), Laila Alrajhi¹, Alexandra I. Cristea¹, Ig Ibert Bittencourt², Seiji Isotani³, and Annie James⁴

¹ Computer Science, Durham University, Durham, UK
² Federal University of Alagoas, Alagoas, Brazil
³ University of Sao Paulo, Sao Paulo, Brazil
⁴ University of Warwick, Warwick, UK

Abstract. Welfare and economic development are directly dependent on the availability of highly skilled and educated individuals in society. In the UK, higher education is accessed by a large percentage of high school graduates (50% in 2017). In Brazil, however, only a limited number of pupils leaving high school continue their education (up to 20%). Initial pioneering efforts of universities and companies to help pupils from underprivileged backgrounds gain acceptance to universities include personalised learning solutions. However, initial findings show that typical distance learning problems occur with the pupil population: isolation, demotivation, and lack of engagement. Thus, researchers and companies proposed gamification. However, gamification design is traditionally based exclusively on theory-driven approaches and usually ignores the data itself. This paper takes a different approach, presenting a large-scale study that analysed, statistically and via machine learning (deep and shallow), the first batch of students trained with a Brazilian gamified intelligent learning software (called CamaleOn), to establish, via a grassroots method based on learning analytics, how gamification elements impact on student engagement. The exercise results in a novel proposal for real-time measurement on Massive Open Online Courses (MOOCs), potentially leading to iterative improvements of student support. It also specifically analyses the engagement patterns of an underserved community.

Keywords: Grassroots method · Data-driven approach · Gamification

1 Introduction

Education is a major part of society [1] and a fundamental human right. Nevertheless, access to it, especially to higher education (HE), varies. For example, in the UK, higher education is widely available (around 50% in 2015–16); however, in the Brazilian context, every year millions of students compete for the opportunity to study at a high-quality, public university (with only up to 20% succeeding). Although we believe that the long-term solution to this problem is to improve the quality of basic education for all, in countries like Brazil the current burden of balancing the discrepancy between public and private education is taken on, due to the distances involved, by organisations providing e-training for the subjects of the university entrance exam. However, most of the solutions follow the classical one-size-fits-all approach, which has been shown to be inadequate and which, especially in MOOCs, results in a massive dropout rate [2, 3]. Gamification [4] has been proposed as a potential solution. However, previous research proceeded by implementing gamification based, at best, on theoretical considerations [5], and among the few studies that proceeded to evaluation, the aim was to prove the validity of a specific gamification [6].

We believe that, whilst a theoretical basis, especially one rooted in pedagogy, is necessary, it is essential to ground the design process itself on lessons learned, in a cyclic manner, from the usage of the system and the learner behaviour. Thus, in this paper we propose to redesign e-learning systems from the data itself, and let data guide the recommendation of new features. Moreover, as large-scale studies are few and far between, this research offers an invaluable insight into the issues and opportunities inherent in scaling such systems. This paper therefore presents a large-scale study that analysed the first batch of students trained with a Brazilian gamified intelligent learning software (CamaleOn¹), to establish and describe how gamification elements and resources impact on student engagement, and thus inform a redesign of the system based on grassroots educational data mining.

From a technical point of view, the aim of this research is to move from a theory-driven to a data-driven approach, in order to redesign effective gamified intelligent learning systems. In terms of application, this work addresses a vital problem of our society: sustainable, personalised, inclusive, large-scale distance learning for boosting the chances of disadvantaged groups to pass the challenging public tests for admission into prestigious universities in Brazil. Thus, the main research questions are: Do gamification elements increase engagement in MOOCs? And, if so, how can we find out in real time which gamification elements impact on student engagement?

© Springer Nature Switzerland AG 2020. V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 142–151, 2020. https://doi.org/10.1007/978-3-030-49663-0_18

¹ https://plataformacamaleon.com.br/

2 Related Research

2.1 Gamification

With the use of e-learning systems like MOOCs, initial findings show that typical distance learning problems occur with the pupil population, such as isolation, demotivation, and lack of engagement [7]. For this reason, some researchers and companies propose the application of gamification to deal with such problems [8–10]. Gamification is defined as 'the use of game elements in a non-game context' [11]. Gamification has been widely disseminated not only in the context of research, but also in terms of business applications. However, the introduction of gamification in online learning is relatively new. Efforts are still being made to understand gamification and find the best use for game mechanics in learning environments [6]. In addition, whilst there are many known benefits of using gamification in education [12], the design of gamified learning systems is usually theory-driven. As a result, there is a lack of runtime feedback, non-gamified scaffolding, and under-exploitation of interaction data. Whilst the theoretical basis is very important in designing purpose-fit gamified systems, in the context of large-scale online learning it is not feasible to propose a one-size-fits-all design of gamification. For this reason, it is very important to take into account the data generated from the system, in order to better understand the users' interactions and refine the offering. The authors of [13] presented a taxonomy of gamification elements and their potential effect on student behaviour, such as engagement and motivation, which was evaluated by experts via surveys. In this paper we use a data-driven approach to analyse this effect using an online learning environment's log data. We analyse students' interactions with gamification elements and use machine learning classifiers to predict their effect on engagement from early interactions, thus simulating a real-time analysis.

2.2 Engagement

The process of learning involves many elements contributing to its success, and one of these is engagement. Learner engagement is an important factor in academic performance. Engagement is defined in [14] as being incorporated in behavior, emotions and thinking. In both physical and virtual classrooms, learner engagement is a major factor of learner achievement. The literature has explored engagement and provided several approaches to improve it. Supporting and improving learner engagement has been shown to have a highly positive effect on academic performance [15]. The authors of [16] studied participants' engagement in a MOOC environment by analysing their reflective comments with five machine learning classifiers, in order to understand what makes a MOOC engaging. Their findings highlight some practical solutions for instructors, including strategies for supporting student-tutor interactions, such as motivating students through an enthusiastic attitude and using humour to arouse students' attention. In this research, the e-training course does not involve instructor interaction; thus, we focus on other suitable and effective elements to improve and increase engagement.

2.3 Educational Data Mining and Learner Analytics

Educational data mining [17] and learner analytics [3, 18] allow for a grassroots view of actual interaction between students and e-learning systems. These areas have been growing in popularity recently [19]. They have been used for student modelling or student behaviour modelling, prediction of performance, increase in (self-) reflection and (self-) awareness, prediction of dropout, retention, improved assessment and feedback services, recommendation of resources [20], or scientific inquiry, personalisation, domain modelling, grouping, planning and scheduling, and parameter estimation [21]. However, they have not been used, to the best of our knowledge, for the cyclic (re-) design of increasingly adaptive and gamified e-learning systems. This means that the discovered relations and rules would, at best, be used based on somewhat rigid initial assumptions about the existing system, instead of inspiring a completely new re-design based purely on lessons learnt from real-time data. Our work also taps into the great potential of the developing world, and its specific landscape of educational needs.

Data-Driven Analysis of Engagement in Gamified Learning Environments


3 Methodology

3.1 Approach

To understand how to improve gamification, we take a completely different approach from related research. Instead of building a system from scratch, based on existing or expanding theories, we analyse user behaviour in a given e-learning system, and base our improvement suggestions on the existing user behaviour. In our case, this system is CamaleOn (see next section). However, the beauty of this approach is that it can be applied to any system. This is also a more realistic approach, as many educational online systems are available and in use, and it is costly and often problematic to change them completely. Instead, a more gradual approach to this change is proposed, based initially on available data, and subsequently informed by gamification theories.

3.2 CamaleOn

CamaleOn is a Brazilian Gamified Intelligent Tutoring System. Officially launched in 2012, its aim is to increase the accessibility of educational resources to Brazilian students. There is a particular focus on providing students from public schools the resources needed to attend a Brazilian university. To motivate the user to continue with the website, CamaleOn uses different aspects of gamification (e.g., elements such as experience points (XP), badges, etc. as methods for motivation). Figure 1 presents the design of CamaleOn’s webpage, where points (XP) are displayed on the top of the screen at all times, to provide a visualisation of the student’s advancement via a gamified progress bar. Trophies are greyed out until earned; each holding a label explaining how it can be earned. Additionally, a progress map at the bottom of the screenshot visualises the student progress through the subjects of the curriculum.

Fig. 1. CamaleOn: main pages.

K. Alharbi et al.

3.3 Data

Data collected from CamaleOn represents 8270 students, a sample well above the minimum required for statistical validity with respect to the population of Brazil: for a 95% confidence level and a ±5% confidence interval, the sample size calculator from Surveysystem.com gives a minimum of 384 people for a population of 211 million (see footnote 2). Students solved 307814 problems, watched 1131 videos, received 236345 badges, and logged in 67752 times. Data was collected on their behaviour (Logs) to build a Student Model [22]. Behaviour reflects interaction data between the students and the various elements of their online learning environment, such as problems, resources, etc.

3.4 Matching Data to Research Questions

The first step in the data-driven approach is to extract refined research questions from the data, based on the overall aims of the research. In Table 1, the bold words in the “Data Subset” column indicate which dataset the subset of data originated from. The attributes listed after each dataset are those selected from that subset. Analysing the attributes and data available from CamaleOn, we first need to extract the gamification elements used; here, they are badges, points and medals. For student engagement, we can use frequency of interaction (e.g., number of logins) and lack of dropout (thus involvement in the higher levels of the course).

Table 1. Matching data sets to research questions.

Data subset:
  Students: Number of Points, Number of Badges, Number of Medals, Number of Problems Solved, Number of Mistakes and Number of Correct Answers
  Logs: Log Type (equal to “Problem Solving”), Problem Correctly Done
Research items:
  Investigate performance of students versus engagement

The purpose is to find out if existent gamification features are useful, and if more gamification features need to be introduced, to address engagement. It is important to note here that further analysis is possible, and that this paper only illustrates how existent data may be used to improve the design of an extant system.

3.5 Definitions and Measures

For our research question, we chose to define engagement by both the number of logins and the total number of question attempts. Students’ academic performance is not necessarily an indication of engagement. Here, we set the threshold for the highly engaged group as consisting of the students $u \in St$ from the student cohort, where:

2 https://www.ibge.gov.br/apps/populacao/projecao/index.html.


$$G_{HE} = \{\, u \in St \mid \#login_u \geq \mathrm{avg}(\#login_u, \forall u \in St) \ \text{AND} \ \#questions_u \geq \mathrm{avg}(\#questions_u, \forall u \in St) \,\} \quad (1)$$

where #x refers to the number of x and avg(y) computes the average value of y. This corresponds to students who have logged into the system more than 8 times and attempted to answer at least 304 questions, which are the mean values for number of logins and question attempts, respectively. This resulted in 1058 highly engaged students, and 7212 less engaged students.

The gamification elements in the system are:

• Points: points are earned by answering low level questions.
• Medals: medals are earned by showing high skill in answering questions, such as answering all questions in a topic correctly or solving side assignments.
• Badges: earned by interacting with the system in a specific way, such as spending one hour in the system or learning a sub-assignment 3 days in a row.

We define the gamification elements in the system by the variable “Reward Count” $RC_u$, as the sum of Points $p_{qu}$, Badges $b_{qu}$ and Medals $m_{qu}$ earned by a student $u$:

$$RC_u = \sum_{q=0}^{\#no\_que} p_{qu} + \sum_{q=0}^{\#no\_que} m_{qu} + \sum_{i=0}^{\#no\_int} b_{qu} \quad (2)$$
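The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary layout, the field names `logins` and `questions`, and the toy cohort are assumptions made only for this example.

```python
from statistics import mean

def reward_count(points, medals, badges):
    """Eq. (2): total rewards of one student (points + medals + badges)."""
    return sum(points) + sum(medals) + sum(badges)

def highly_engaged(students):
    """Eq. (1): ids of students whose logins AND question attempts
    both reach the cohort averages."""
    avg_logins = mean(s["logins"] for s in students.values())
    avg_questions = mean(s["questions"] for s in students.values())
    return {
        uid
        for uid, s in students.items()
        if s["logins"] >= avg_logins and s["questions"] >= avg_questions
    }

# Hypothetical toy cohort (the real data set has 8270 students).
cohort = {
    "u1": {"logins": 12, "questions": 400},
    "u2": {"logins": 3, "questions": 50},
    "u3": {"logins": 9, "questions": 150},
}
engaged = highly_engaged(cohort)
```

In this toy cohort the averages are 8 logins and 200 question attempts, so a student who passes only one of the two thresholds (such as `u3`) is not placed in the highly engaged group.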

We first answer the research questions using correlation analysis, based on the Pearson coefficient. Next, we use both shallow and deep learning methods to answer the questions in more depth. For shallow methods, we use and compare a number of ML models for classification: Linear Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbours (KNN), Classification and Regression Trees (CART) and Naive Bayes (NB). Then we apply two deep learning algorithms, namely Multilayer Perceptron (MLP) and Convolutional Neural Network (CNN), which are recommended for numerical, non-sequential data, to compare the performance of Machine Learning (ML) against Deep Learning (DL) models for numerical data with a low number of predictors. Figure 2 provides a general view of our methodology.

Fig. 2. General view of methodology followed in answering the research question.
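The correlation step of the methodology relies on the Pearson coefficient. The sketch below computes it from first principles; the function name and the sample vectors are illustrative assumptions, not data from CamaleOn.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical example: binary engagement flag versus badges earned.
is_engaged = [1, 0, 0, 1, 0]
badges = [30, 5, 8, 25, 12]
r = pearson(is_engaged, badges)
```

Applying the coefficient to a binary indicator such as “Is Engaged” and a count such as the number of badges is equivalent to a point-biserial correlation.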

4 Results

4.1 Normality Test

For the normality test of the high and low engagement groups, we applied the Kolmogorov–Smirnov test, rather than Shapiro–Wilk, because the data size exceeded 5000 instances. Results indicate a non-normal distribution for each group (p ≈ .00).
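As an illustration of the test named above, the following sketch computes a one-sample Kolmogorov–Smirnov statistic against a normal distribution. It rests on assumptions that the paper does not state: the normal parameters are estimated from the sample itself (which makes the standard p-value conservative), and the p-value uses the asymptotic Kolmogorov series. The function names and the sample are hypothetical.

```python
from math import erf, exp, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def ks_normal(sample):
    """Return (D, approximate p-value) for a KS test against a fitted normal."""
    xs = sorted(sample)
    n = len(xs)
    mu = sum(xs) / n
    sigma = sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # Largest gap between the empirical and the theoretical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    t = sqrt(n) * d
    if t < 0.2:
        return d, 1.0  # the series below converges poorly near zero
    p = 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * k * k * t * t) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical, strongly right-skewed sample.
skewed = [i ** 3 for i in range(1, 201)]
d_stat, p_value = ks_normal(skewed)
```

A small p-value, as obtained for this skewed sample, leads to rejecting normality, which is the outcome reported for both engagement groups.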

4.2 Data Visualisation: Higher/Lower Engagement Versus Gamification Use

We next visualise the two groups, to analyse visual differences in gamification elements’ use, via the total number of earned rewards for each group (Fig. 3).

Fig. 3. Box plot higher and lower engagement groups versus earned rewards (points, badges, medals).

4.3 Data Correlation: Engagement Versus Gamification

Table 2 shows the correlation between engagement and gamification. For instance, it indicates a strong positive association between students’ number of logins and the number of rewards they earn. The highest correlation value is observed between Badges and engagement status. The lowest value is seen between the number of earned medals and that of logins, possibly due to the fact that medals are related to questions and curricula. However, the engagement variable “Is Engaged” shows a positive association with all of the gamification elements represented by Reward Count, $RC_u$.

Table 2. Correlation test results between engagement indicators and gamification elements.

                         Reward count   Points   Medals   Badges
High login               0.531          0.482    0.373    0.631
High question attempts   0.660          0.656    0.604    0.671
Is engaged               0.660          0.656    0.604    0.682

4.4 Engagement Prediction Based on Gamification

Following the correlation test results, we used the gamification elements and the additional aggregate parameter “Reward Count”, and their combination, as inputs of different dimensions, to classify high and low engagement with various classification models (Table 3). The output of the classifier is either that the learner is engaged (1) or not engaged (0). These results show that the CamaleOn gamification elements are a strong predictor of students’ engagement, with all accuracies ≥ 0.924. That is, the number of rewards students earn is strongly linked to the number of logins and general advancement through the system. The accuracy of CNN and MLP exceeds that of the traditional ML models, suggesting that DL classifiers perform slightly better than, but similarly to, ML classifiers for problems with a small number of features, such as this one. MLP was the clear overall winner in terms of prediction model comparison. The highest score is observed (mostly) with the combination of all elements. What is interesting is the similarity of the individual elements’ scores, despite the differences between them in functionality and purpose: Medals reward curricula advancement, while Badges reward defined system actions.

Table 3. Classifiers’ results for engagement level based on gamification elements.

                             Low-engagement (0)   High-engagement (1)
Model  Inputs         Acc    P    R    F1         P    R    F1
LR     Reward count   .951   .97  .98  .98        .87  .78  .82
LR     Points         .950   .97  .98  .98        .87  .77  .82
LR     Medals         .937   .95  .98  .97        .85  .65  .74
LR     Badges         .938   .96  .97  .97        .80  .72  .76
LR     All elements   .954   .97  .98  .98        .88  .80  .84
LDA    Reward count   .924   .93  .99  .96        .98  .46  .63
LDA    Points         .950   .93  .98  .96        .98  .45  .62
LDA    Medals         .937   .92  .98  .96        .96  .42  .59
LDA    Badges         .938   .92  .97  .97        .80  .72  .76
LDA    All elements   .954   .96  .99  .97        .91  .72  .80
KNN    Reward count   .947   .97  .97  .97        .81  .82  .81
KNN    Points         .950   .97  .98  .97        .83  .78  .81
KNN    Medals         .937   .96  .97  .96        .76  .75  .75
KNN    Badges         .938   .96  .95  .96        .71  .75  .73
KNN    All elements   .954   .98  .98  .98        .84  .84  .84
CART   Reward count   .944   .96  .98  .97        .81  .73  .77
CART   Points         .950   .96  .98  .97        .83  .69  .76
CART   Medals         .937   .95  .98  .97        .82  .67  .74
CART   Badges         .938   .96  .97  .97        .80  .72  .76
CART   All elements   .954   .97  .97  .97        .81  .81  .81
NB     Reward count   .954   .98  .97  .98        .83  .84  .83
NB     Points         .950   .98  .97  .98        .83  .69  .76
NB     Medals         .937   .96  .97  .97        .78  .74  .76
NB     Badges         .938   .96  .97  .97        .80  .72  .76
NB     All elements   .954   .98  .96  .97        .78  .86  .82
MLP    Reward count   .958   .98  .97  .98        .83  .84  .84
MLP    Points         .957   .98  .97  .98        .83  .86  .83
MLP    Medals         .942   .96  .98  .97        .82  .71  .76
MLP    Badges         .941   .96  .97  .97        .80  .72  .76
MLP    All elements   .964   .98  .98  .98        .87  .86  .86
CNN    Reward count   .957   .97  .98  .98        .85  .81  .83
CNN    Points         .956   .98  .97  .98        .81  .87  .84
CNN    Medals         .941   .96  .98  .97        .82  .69  .75
CNN    Badges         .931   .93  .99  .96        .92  .51  .66
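The P, R and F1 columns of Table 3 are per-class precision, recall and F1 score. The sketch below shows how the three values follow from prediction counts for one class; the counts in the example are hypothetical and were chosen only to reproduce the pattern of high precision with low recall that appears in some high-engagement rows.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class.

    tp: instances of the class predicted as the class
    fp: instances of the other class predicted as the class
    fn: instances of the class predicted as the other class
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical counts: almost every positive prediction is right,
# but more than half of the truly engaged students are missed.
p, r, f1 = per_class_metrics(tp=46, fp=1, fn=54)
```

Because the low-engagement group is much larger than the high-engagement group, overall accuracy can stay high even when recall for the high-engagement class is low, which is why the per-class columns are worth reading alongside Acc.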


5 Conclusion

The paper shows a grassroots approach to understanding the gamification needs of students, and analysing how gamification elements impact on student engagement. Specifically, we analyse how gamification can be linked to student engagement in CamaleOn, a Brazilian MOOC for high-school students being trained for higher education. This is a first step towards establishing how to design better gamified environments to support online education for underrepresented and underserved communities. This approach is best suited for MOOCs, which have a large amount of data, but do not necessarily obey any particular learning theory for gamification, and which need to be further improved to better serve their communities, whilst ‘running’. Thus, this approach means measuring student impact in real-time, to be able to intervene at finer granularity, e.g., in the design of the next run of a course. Further research we are already undertaking is analysing motivators for student participation, as well as how student behavior can be attributed to their knowledge. Future work could involve re-evaluating such research questions and hypotheses with data from future academic years, to see how consistent the years are and to reduce threats to external validity. This proposal involves cutting-edge technologies and techniques that are evolving constantly, such as (for the areas of computer science only) user analytics, information retrieval, ‘big data’ processing, user profiling, social web information elicitation and usage, recommendations, semantic web representation and processing, and various other technologies and techniques related to the emerging web science. Concretely, in the long term, we expect to report advances, for instance, in user analytics visualisation techniques, and in adaptation and personalisation techniques combining content-based personalisation with social interaction. Moreover, in such massive online environments, new types of behaviors are taking place, and new behavioral patterns emerge, and this is where the expertise of our behavioral experts is essential.

References

1. Soni, S., Dubey, S.: Towards systematic literature review of E-learning. Int. J. Sci. Res. Comput. Sci. Eng. Inf. 3(3), 1389–1396 (2018)
2. Belanger, Y., Thornton, J., Barr, R.C.: Bioelectricity: a quantitative approach–Duke University’s first MOOC. EducationXPress 2013(2) (2013)
3. Kizilcec, R.F., Piech, C., Schneider, E.: Deconstructing disengagement: analyzing learner subpopulations in massive open online courses. In: Proceedings of the Third International Conference on Learning Analytics and Knowledge. ACM (2013)
4. Hamari, J., Koivisto, J., Sarsa, H.: Does gamification work?-A literature review of empirical studies on gamification. In: HICSS (2014)
5. Gibson, D., Jakl, P.: Theoretical considerations for game-based e-Learning analytics. In: Reiners, T., Wood, L.C. (eds.) Gamification in Education and Business, pp. 403–416. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-10208-5_20
6. Antonaci, A., Klemke, R., Specht, M.: The effects of gamification in online learning environments: a systematic literature review. In: Informatics. Multidisciplinary Digital Publishing Institute (2019)
7. Khalil, H., Ebner, M.: MOOCs completion rates and possible methods to improve retention a literature review. In: World Conference on Educational Multimedia, Hypermedia and Telecommunications (2014)
8. Challco, G.C., et al.: Personalization of gamification in collaborative learning contexts using ontologies. IEEE Latin Am. Trans. 13(6), 1995–2002 (2015)
9. de Santana, S.J., et al.: A quantitative analysis of the most relevant gamification elements in an online learning environment. In: Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee (2016)
10. Tenório, T., et al.: A gamified peer assessment model for on-line learning environments in a competitive context. Comput. Hum. Behav. 64, 247–263 (2016)
11. Deterding, S., et al.: From game design elements to gamefulness: defining gamification. In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments. ACM (2011)
12. Muntean, C.I.: Raising engagement in e-learning through gamification. In: Proceeding of 6th International Conference on Virtual Learning ICVL (2011)
13. Toda, A., et al.: A taxonomy of game elements for gamification in educational contexts: proposal and evaluation. In: 2019 IEEE 19th International Conference on Advanced Learning Technologies (ICALT). IEEE (2019)
14. Fredricks, J.A., Blumenfeld, P.C., Paris, A.H.: School engagement: potential of the concept, state of the evidence. Rev. Educ. Res. 74(1), 59–109 (2004)
15. Lee, J.-S.: The relationship between student engagement and academic performance: is it a myth or reality? J. Educ. Res. 107(3), 177–185 (2014)
16. Hew, K.F., Qiao, C., Tang, Y.: Understanding student engagement in large-scale open online courses: a machine learning facilitated analysis of student’s reflections in 18 highly rated MOOCs. Int. Rev. Res. Open Distrib. Learn. 19(3), 69–93 (2018)
17. Dutt, A., Ismail, M.A., Herawan, T.: A systematic review on educational data mining. IEEE Access 5, 15991–16005 (2017)
18. Lei, S., et al.: Towards understanding learning behavior patterns in social adaptive personalized e-learning systems. AMCIS (2013)
19. Hew, K.F.: Promoting engagement in online courses: what strategies can we learn from three highly rated MOOCS. Br. J. Educ. Technol. 47(2), 320–341 (2016)
20. Papamitsiou, Z., Economides, A.A.: Learning analytics and educational data mining in practice: a systematic literature review of empirical evidence. J. Educ. Technol. Soc. 17(4), 49–64 (2014)
21. Romero, C., Ventura, S.: Data mining in education. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 3(1), 12–27 (2013)
22. Sani, S.M., Bichi, A.B., Ayuba, S.: Artificial intelligence approaches in student modeling: half decade review (2010–2015). IJCSN-Int. J. Comput. Sci. Netw. 5(5), 2277–5420 (2016)

Intelligent Predictive Analytics for Identifying Students at Risk of Failure in Moodle Courses

Theodoros Anagnostopoulos (1,2) (✉), Christos Kytagias (1), Theodoros Xanthopoulos (1), Ioannis Georgakopoulos (1,3), Ioannis Salmon (1), and Yannis Psaromiligkos (1)

1 Department of Business Administration, University of West Attica, Egaleo, Greece
{thanag,ckyt,xanthopoulos,igtei,isalmon,yannis.psaromiligkos}@uniwa.gr
2 Department of Infocommunication Technologies, ITMO University, St. Petersburg, Russia
3 Department of Primary Education, NKUA, Athens, Greece

Abstract. Investigating the factors affecting students’ academic failure in online and/or blended courses by analyzing students’ learning behavior data gathered from Learning Management Systems (LMS) is a challenging area in intelligent learning analytics and educational data mining. It has been argued that the actual course design and the instructor’s intentions are critical in determining which variables meaningfully represent student effort and should be included in, or excluded from, the list of predicting factors. In this paper we describe such an approach for identifying students at risk of failure in online courses. As a proof of concept we used the data of two cohorts of an online course implemented in Moodle LMS. Using the data of the first cohort we developed a prediction model by experimenting with certain base classifiers available in Weka. To improve on the observed performance of these base classifiers, we further enhanced our model with the Majority Voting ensemble classifier. The final model was applied to the next cohort of students in order to identify those at risk of failure before the final exam. The prediction accuracy of the model was high, which shows that the findings of such a process can be generalized.

Keywords: Intelligent predictive analytics · Students at risk · Learning design · Moodle

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 152–162, 2020. https://doi.org/10.1007/978-3-030-49663-0_19

1 Introduction

Today, Learning Management Systems (LMS) are widely used for educational and training purposes. Each year thousands of lessons are delivered through the web using LMSs, and this phenomenon is expected to escalate in the coming years owing to the continuous increase in demand for lifelong learning. To make e-learning courses more effective, instructional designers need to take decisions at various points, as well as at multiple levels, during the life cycle of the course. Moreover, as demand increases, so does the demand for intelligent tools capable of providing continuous and sophisticated feedback on the instructional process. LMSs hold large volumes of rich data that capture the way learners interact with the various activities and resources of the system.

Identifying students at risk is a major problem and today, various research groups investigate the factors affecting students’ academic failure in online and/or blended courses by analyzing students’ learning behavior data gathered from LMSs [1–7]. Thus, a variety of analytical methods to predict students at risk of failing, based on such data, have appeared in the literature, most of which have been strongly validated for a single course and less so across a set of courses [1]. However, education is not merely a craft of delivering packaged knowledge [8]. Teachers at all levels of education need to “design for learning”, and this realization transforms teaching, according to [9], into a design science, and the role of teachers into that of designers of learning. It has been argued that the actual course design and the instructor’s intentions are critical in determining which variables meaningfully represent student effort and should be included in, or excluded from, the list of predicting factors [1, 5, 10–13]. We agree with this view and we believe that researchers need to focus on the learning design field in order to discover semantically important factors affecting students’ academic failure.

In this paper we describe such an approach for identifying students at risk of failure in Moodle courses by investigating a set of engagement factors according to the design of the course. In Sect. 2 we describe in more detail the theoretical background of our approach. Section 3 describes our predictive analytics system as it was applied in a real case study at the University of West Attica. Experiments and results are discussed in Sect. 4. Finally, conclusions and future research directions are provided in Sect. 5.

2 Theoretical Background

LMSs hold a series of learning activities, such as studying some content, doing a self-assessment test, or completing a project assignment, according to the underlying learning design. Each learning activity is related to specific learning objectives and may be implemented in a number of ways, such as SCORM learning objects, multimedia documents (e.g. video lectures), on-line quizzes, collaboration fora, etc. Students engage in the planned activities by following a learning path (or learning workflow), which may be more or less guided or adapted. The way students interact with the underlying learning activities is an indicator of their engagement in the learning process, and it is crucial for their own success as well as for the quality of the underlying e-learning course [1, 4, 14, 15]. Several studies correlate performance with engagement, and more specifically with the way students feel, think and behave in terms of the e-learning course [1, 4, 16]. The authors of [17] investigate the feasibility of using electronic portfolio data as a means of predicting college retention.

Identifying students at risk is a major problem. It has been reported that the students’ dropout rate in e-learning fluctuates between 20 and 80% [18]. There are many attempts in the literature to identify the causes of students’ dropout in e-learning. These attempts centre on analyzing sociological factors [19, 20] or other factors such as academic performance, demographics, and engagement [1, 4, 17, 21, 22]. In parallel, there are quite a few efforts in the literature to control the risk of students’ failure in e-learning courses. The work of [5] centres on developing a warning system by applying statistical techniques to LMS tracking data captured at the end of the course. On the other hand, [23] developed a warning system by applying data mining techniques to LMS tracking data captured during the course. In more detail, [5] used students’ engagement LMS data, whereas [23] used learning portfolio LMS data.

3 Predictive Analytics System

3.1 Machine Learning Inference Model

Machine learning algorithms are used as the core of an intelligent inference model. Such algorithms are divided into two major classes according to the type of the predicted value. When the predicted value is numerical, the machine learning process is called regression. When the predicted value is categorical, the process is called classification. Within classification, two categorical values give binary classification, while more than two classes give multiclass classification.

Ensemble Classifiers

Classifiers are further categorized into base classifiers and ensemble classifiers. A base classifier is applied to the data and produces a classification model according to the parameters of its specific design. An ensemble classifier instead combines several base classifiers to maximize their strengths and offset their weaknesses. In this paper, we exploit the potential of binary base classifiers as well as an ensemble classifier, which proved to be more efficient than the examined base classifiers.

3.2 Evaluation Method and Metrics

10-Fold Cross Validation

We evaluate the examined models with the 10-fold cross validation method, which separates the initial dataset into 10 equal-sized parts and then, in a loop, uses 9 parts to train the classifier and the remaining part to test it. This process is repeated until every part has been used for testing.

Prediction Accuracy

We assess the effectiveness of the classifier with the prediction accuracy metric, $a \in [0, 1]$, which is defined as follows:

$$a = \frac{t_p + t_n}{t_p + f_p + t_n + f_n} \quad (1)$$

where $t_p$ are the instances correctly classified as positives and $t_n$ are the instances correctly classified as negatives. In addition, $f_p$ are the instances falsely classified as positives, and $f_n$ are the instances falsely classified as negatives. A low value of $a$ means a weak classifier, while a high value of $a$ indicates an efficient classifier.

Confusion Matrix

We also evaluated the adopted classification model with the confusion matrix. In the case of binary classification, the confusion matrix has the form shown in Table 1.

Table 1. Confusion matrix.

Actual class   Classified as class 0   Classified as class 1
class 0        A                       B
class 1        C                       D

Here “A” is the number of class 0 instances correctly classified as class 0, and “B” is the number of class 0 instances falsely classified as class 1. “C” is the number of class 1 instances falsely classified as class 0, while “D” is the number of class 1 instances correctly classified as class 1. A classification model is considered effective if it maximizes “A” and “D” while minimizing “B” and “C”.
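The evaluation procedure of this section can be sketched as follows. The sketch is illustrative only: the study itself used Weka, the helper names are hypothetical, and the interleaved assignment of instances to folds is an assumption (any partition into 10 near-equal parts would do).

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def confusion_matrix(actual, predicted):
    """Return [[A, B], [C, D]]: rows are actual classes, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(matrix):
    """Eq. (1), with class 1 as positive: t_p = D, t_n = A, f_p = B, f_n = C."""
    (a, b), (c, d) = matrix
    return (a + d) / (a + b + c + d)

# Hypothetical labels and predictions for five students.
m = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
acc = accuracy(m)
```

With 183 instances and 10 folds the parts cannot be exactly equal; in this sketch three folds hold 19 instances and seven hold 18.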

4 Experiments and Results

4.1 Experimental Setup

First Cohort

In the first cohort we use a training dataset, which is created through the progress of the lesson during the first cohort. Once the dataset is created, we use Weka [24] to examine which classifier to adopt for the prediction purpose.

Table 2. Final dataset structure.

Attribute                                  Type         Value
No of videos watched                       Predictive   Number
No of interactive presentation completed   Predictive   Number
No of self-assessment performed            Predictive   Number
No of laboratories completed               Predictive   Number
First test binary score                    Predictive   {0,1}
Second test binary score                   Predictive   {0,1}
Third test binary score                    Predictive   {0,1}
Final test binary score                    Class        {0,1}


First Cohort Dataset Structure

We create the dataset of the first cohort, which depicts in detail student behavior during the cohort towards final test performance. The initial data contained a plethora of information about each student’s interaction with the various learning objects that support the learning activities of the course. For example, it contained information such as the count of views/posts, time spent, number of activity attempts, completion status (complete/incomplete), success status (passed/failed), scores, etc. We applied a feature extraction process to this dataset for dimensionality reduction, in order to treat the experimental data more efficiently, and arrived at the final dataset structure presented in Table 2. Overall, the first cohort dataset has 183 instances, where each instance depicts all the available information on a unique student attending the lesson. We therefore perform learning and predictive analytics with the data produced by 183 students attending the cohort. Each instance of the working dataset has 8 attributes: 7 are predictive attributes, while the last one is the class attribute. Since the class attribute takes two values, we work with binary classification algorithms. The first 4 predictive attributes take numerical values, while the next 3 take categorical values, either 0 or 1. The attributes of the assessment activities included in the course, that is rows 5 to 8 of Table 2, exploit the SCORM “cmi.success_status” indicator [25], and take two values, 0 and 1, encoding whether the student passed or failed the activity. The “Number of Self-Assessment Performed” attribute sums the number of successful attempts of a student, utilizing the aforementioned indicator. The video activities of the course were encapsulated into SCORM learning objects in order to exploit a combination of the SCORM “cmi.total_time” and “cmi.completion_status” (complete/incomplete) indicators. The attribute “Number of Videos Watched” is the count of video views, where a video view is assigned the value 1 if the student has completed the activity and the total time spent in the activity was relatively close to the duration of the video. The “Number of Interactive Presentations Completed” and “Number of Laboratories Completed” attributes sum the values of 1, which correspond to the value “completed” in the SCORM “cmi.completion_status” indicator.
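The attribute derivation described above can be sketched as follows. The paper does not say how close is “relatively close”, so the `tolerance` parameter (20% of the video duration) is purely an assumption, as are the function names and the record layout.

```python
def count_video_views(records, tolerance=0.2):
    """Count video views from (completion_status, seconds_spent, video_seconds).

    A record counts as a view when the activity is completed and the time
    spent is within `tolerance` (a hypothetical fraction) of the video length.
    """
    views = 0
    for status, spent, length in records:
        close_enough = abs(spent - length) <= tolerance * length
        if status == "completed" and close_enough:
            views += 1
    return views

def count_completed(statuses):
    """Count activities whose cmi.completion_status is 'completed'."""
    return sum(1 for status in statuses if status == "completed")

# Hypothetical tracking data for one student.
videos = [
    ("completed", 590, 600),   # watched almost the full video
    ("completed", 100, 600),   # marked complete but skipped through
    ("incomplete", 600, 600),  # time spent, but not completed
]
n_views = count_video_views(videos)
n_labs = count_completed(["completed", "incomplete", "completed"])
```

Combining the completion flag with the time spent filters out students who merely open a video and skip to the end, which a completion flag alone would count as a view.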

Table 3. First cohort experimental parameters and values.

Parameter             Value
1st base classifier   Naïve Bayes
2nd base classifier   SVM
3rd base classifier   k-NN
4th base classifier   RIPPER
5th base classifier   C4.5
Ensemble classifier   Majority voting
Evaluation method     10-fold cross validation
Evaluation metrics    a, confusion matrix


Adopted Classifiers To define which classifiers to adopt in our study we experimented with certain base classifiers available in Weka [24]. Such base classifiers are the: (1) Naïve Bayes classifier, (2) Support Vector Machine (SVM) classifier, (3) k-Nearest Neighbors (kNN) classifier, (4) Repeated Incremental Pruning to Produce Error Reduction (RIPPER) classifier, and (5) C4.5 classifier. To achieve better performance than that observed with the experimented base classifiers, we further enhanced our model with Majority Voting ensemble classifier. This ensemble classifier combines the decisions of the adopted base classifiers about a certain class value (i.e., of the Final Test Binary Score) for a certain instance. A majority voting is performed to infer the class value based on the statistical mode of the votes of each base classifier. First Cohort Experimental Parameters The experimental parameters of the first cohort include the base and the ensemble classifier as well as the adopted evaluation method and metrics incorporated to asses the proposed classification model. See Table 3. Second Cohort In the second cohort we are using a test dataset, which is created through the progress of the lesson during the second cohort. Overall, the second cohort dataset has 183 instances, where each instance depicts all the available information of a unique new unseen student attending the lesson in the second cohort. When dataset has created we use Weka [24] Majority Voting Ensemble classifier to assess the efficiency of the adopted model with new unseen instances of the second cohort. Second Cohort Dataset Structure The dataset of the second cohort has the same structure as the dataset of the first cohort, since we are able to compare only equally structured classification quantities. 
Second Cohort Experimental Parameters. The experimental parameters of the second cohort include the adopted Majority Voting ensemble classifier as well as the evaluation method and metrics used to assess the proposed classification model. See Table 4.

Table 4. Second cohort experimental parameters and values.

Parameter              Value
Ensemble classifier    Majority voting
Evaluation metrics     a, confusion matrix
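The majority voting rule described above (the class value is the statistical mode of the base classifiers' votes) can be sketched as follows. This is a minimal illustration, not the Weka implementation used in the study; the function names, the example votes, and the tie-breaking rule are assumptions made for the sketch.

```python
from collections import Counter


def majority_vote(votes):
    """Return the statistical mode of the base classifiers' votes.

    Ties are broken in favour of the label that appears first in `votes`
    (an assumption; the text does not state a tie-breaking rule).
    """
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:  # keep input order so that ties are deterministic
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_predictions):
    """Combine predictions instance by instance, one vote per classifier."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]


# Hypothetical votes of five base classifiers for three students
# (0 and 1 stand for the two values of the Final Test Binary Score).
predictions = [
    [0, 1, 1],  # Naive Bayes
    [0, 1, 0],  # SVM
    [1, 1, 0],  # k-NN
    [0, 0, 0],  # RIPPER
    [0, 1, 1],  # C4.5
]
print(ensemble_predict(predictions))
```

With an odd number of base classifiers and a binary class, as in this study, ties cannot occur, so the tie-breaking assumption only matters for other configurations.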

4.2 Results and Discussion

First Cohort. We evaluated the base classifiers and the ensemble classifier of the proposed model on the first cohort. We used 10-fold cross validation to observe the overall prediction accuracy, which we then assessed with McNemar’s statistical significance test and a comparison of the confusion matrices.
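As a rough illustration of how 10-fold cross validation partitions a dataset, the sketch below generates the train/test index sets for each fold. It is a simplified stand-in, not the procedure run in Weka: it does no shuffling or stratification, and the function name is hypothetical. The value 183 is the number of instances implied by the confusion matrices reported for the first cohort.

```python
def k_fold_indices(n_instances, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_instances))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(183, k=10))
print([len(test) for _, test in folds])
```

Each instance appears in exactly one test fold, so every instance is used once for evaluation and k - 1 times for training.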


T. Anagnostopoulos et al.

Experimented Classifiers Prediction Accuracy. For the first cohort we applied 10-fold cross validation to the training dataset. For the base classifiers, we observe the following values of prediction accuracy: the Naïve Bayes classifier achieves a = 0.6939, the SVM classifier a = 0.7103, the k-NN classifier a = 0.6721, the RIPPER classifier a = 0.6994, and the C4.5 classifier a = 0.680. To enhance the proposed model’s prediction accuracy, we combined all five base classifiers into a single classification model, the Majority Voting ensemble classifier. We then trained the Majority Voting ensemble classifier and achieved a prediction accuracy of a = 0.7331. See Fig. 1. Before adopting the ensemble classifier as the proposed classification model, we applied McNemar’s statistical significance test to the classification results of all the adopted classifiers. McNemar’s test indicated that the adopted base classifiers and the ensemble classifier have statistically significant prediction accuracy results. Thus, the Majority Voting ensemble classifier is adopted as our optimal prediction model.
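For readers unfamiliar with McNemar’s test, the sketch below shows the standard continuity-corrected statistic for comparing two classifiers on the same instances. It only needs the two discordant counts. The text does not report these counts, so the numbers in the usage lines are purely hypothetical, and the function name is an assumption of this sketch.

```python
import math


def mcnemar_test(b, c):
    """McNemar's test with continuity correction.

    b: instances classifier A labelled correctly and classifier B incorrectly.
    c: instances classifier A labelled incorrectly and classifier B correctly.
    Returns (chi-square statistic, p-value) for one degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts, for illustration only.
statistic, p_value = mcnemar_test(25, 5)
print(round(statistic, 3), p_value < 0.05)
```

The test ignores instances on which both classifiers agree, which is why only the two discordant counts are required.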

Fig. 1. Classifiers Prediction Accuracy. [Bar chart; x-axis: Classifiers (Naive Bayes, SVM, k-NN, RIPPER, C4.5, Majority Voting); y-axis: Prediction Accuracy, 0 to 1.]

Experimented Classifiers Confusion Matrix. To further assess the efficiency of the adopted classification model, we examined the confusion matrices of all the classifiers. As Table 5 shows, the confusion matrix of the Majority Voting ensemble classifier indicates better prediction results than those of the base classifiers.

Second Cohort. We applied the adopted Majority Voting ensemble classification model to the test dataset provided by the second cohort. Recall that this dataset is composed of new, unseen instances of new students attending the lesson during the second cohort. The prediction accuracy of the adopted Majority Voting model is again assessed with McNemar’s statistical significance test and a comparison of the confusion matrices for both cohorts.


Table 5. Confusion matrices of the first cohort.

Classifier             Classified as   class 0   class 1
Naïve Bayes            class 0         127       0
                       class 1         56        0
SVM                    class 0         96        31
                       class 1         22        34
k-Nearest Neighbors    class 0         100       27
                       class 1         33        23
RIPPER                 class 0         93        34
                       class 1         21        35
C4.5                   class 0         109       18
                       class 1         40        16
Majority voting        class 0         108       19
                       class 1         30        26
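The prediction accuracies quoted in the text can be cross-checked against the counts in Table 5, since accuracy is the sum of the diagonal divided by the total number of instances (this holds whichever axis denotes the predicted class). The sketch below is only such a cross-check; the dictionary and function names are assumptions.

```python
# Counts transcribed from Table 5. Each matrix is [[n00, n01], [n10, n11]];
# the diagonal holds the correctly classified instances.
CONFUSION_MATRICES = {
    "Naive Bayes": [[127, 0], [56, 0]],
    "SVM": [[96, 31], [22, 34]],
    "k-NN": [[100, 27], [33, 23]],
    "RIPPER": [[93, 34], [21, 35]],
    "C4.5": [[109, 18], [40, 16]],
    "Majority voting": [[108, 19], [30, 26]],
}


def accuracy(matrix):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for name, matrix in CONFUSION_MATRICES.items():
    print(f"{name}: {accuracy(matrix):.4f}")
```

Every matrix sums to 183 instances, so the values printed here can be compared directly with those reported for the individual classifiers.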

Adopted Ensemble Classifier Prediction Accuracy. For the second cohort we observed a prediction accuracy of a = 0.7431 when applying the Majority Voting ensemble classifier. See Fig. 2. We compared this value with the first cohort value, a = 0.7322, obtained with the same ensemble classification model, using McNemar’s statistical significance test. The test indicates that the application of the Majority Voting classification model to the two separate cohorts yields statistically significant prediction accuracy values. This means that the adopted model can be generalized and used in consecutive cohorts of the lesson, under the assumption that the structure of the provided datasets remains the same, so that comparable classification quantities are compared.

Fig. 2. Comparison of Majority Voting Prediction Accuracy for both Cohorts. [Bar chart; x-axis: Majority Voting Classifier (First Cohort, Second Cohort); y-axis: Prediction Accuracy, 0 to 1.]


Table 6. Comparison of confusion matrices for majority voting in both cohorts.

Classifier                         Classified as   class 0   class 1
Majority voting (first cohort)     class 0         108       19
                                   class 1         30        26
Majority voting (second cohort)    class 0         108       20
                                   class 1         27        28

Adopted Ensemble Classifier Confusion Matrix. To evaluate the applicability of the adopted ensemble classification model, we examined the confusion matrices of both cohorts. As Table 6 shows, the confusion matrix of the Majority Voting ensemble classifier on the new, unseen student instances of the second cohort is similar to that on the learned instances of the first cohort, which indicates that the model can be applied to both.
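One way to make the comparison in Table 6 more concrete is to derive per-class precision and recall from each matrix. The sketch below does this under the assumption that rows are actual classes and columns are predicted classes (the layout Weka prints); the text does not state the orientation explicitly, so treat the per-class figures as illustrative. All names are hypothetical.

```python
def per_class_precision_recall(matrix):
    """Return {class_index: (precision, recall)} for a square confusion matrix.

    Assumes rows are actual classes and columns are predicted classes.
    """
    n = len(matrix)
    result = {}
    for c in range(n):
        true_positive = matrix[c][c]
        predicted_as_c = sum(matrix[r][c] for r in range(n))
        actually_c = sum(matrix[c])
        precision = true_positive / predicted_as_c if predicted_as_c else 0.0
        recall = true_positive / actually_c if actually_c else 0.0
        result[c] = (precision, recall)
    return result


# Counts transcribed from Table 6.
first_cohort = [[108, 19], [30, 26]]
second_cohort = [[108, 20], [27, 28]]

for name, matrix in [("first", first_cohort), ("second", second_cohort)]:
    for cls, (precision, recall) in per_class_precision_recall(matrix).items():
        print(f"{name} cohort, class {cls}: precision={precision:.3f}, recall={recall:.3f}")
```

Per-class figures complement overall accuracy because the two classes are unevenly represented in both cohorts.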

5 Conclusion and Future Directions

In this paper we described an approach for identifying students at risk of failure in online courses and the construction of a prediction model. Our approach was implemented in Moodle LMS using data from two cohorts of an online course. The classifier was trained on the data of the first cohort, while the second cohort data was used for testing the classifier. We used 10-fold cross validation to observe the overall prediction accuracy, which we assessed with McNemar’s statistical significance test and a comparison of the confusion matrices. We enhanced our final prediction model by combining all five base classifiers into a single classification model, the Majority Voting ensemble classifier. The results of the final classifier were assessed with McNemar’s statistical significance test. The test indicated that the application of the Majority Voting classification model to the two separate cohorts yields statistically significant prediction accuracy values. This means that the adopted model can be generalized and used in consecutive cohorts of the lesson.

On a conceptual level, our findings suggest predictor factors that are closely related to the learning design of the course. More specifically, the successful completion of a set of course activities is critical for the successful completion of the course. Successful completion in our case study means that there is actual student effort. Such effort may be watching a video, interacting with multimedia content, doing a laboratory exercise or a self-assessment quiz, and passing formative assessment tests. Such a focus can give us a more semantically meaningful set of predictor factors affecting students’ academic failure, as well as essential feedback for the redesign of the course.

Finally, we are in the process of implementing a plugin in the Moodle environment for the application of an Early Warning Activity Module. Our aim is to develop an Early Warning Intelligent System capable of providing suitable feedback for maximizing student achievement. Controlling the risk of students’ failure in relation to the learning design of the course is another dimension of our future work.


References

1. Conijn, R., Snijders, C., Kleingeld, A., Matzat, U.: Predicting student performance from LMS data: a comparison of 17 blended courses using moodle LMS. IEEE Trans. Learn. Technol. 10(1), 17–29 (2017)
2. Dawson, S., McWilliam, E., Tan, J.P.L.: Teaching smarter: how mining ICT data can inform and improve learning and teaching practice. In: Proceeding of the International Conference ASCILITE 2008, pp. 221–230, Melbourne, Australia (2008)
3. Romero, C., Ventura, S.: Educational data mining: a review of the state of the art. IEEE Trans. Syst. Man Cybern. Part C 40(6), 601–618 (2010)
4. Romero, C., Ventura, S.: Guest editorial: special issue on early prediction and supporting of learning performance. IEEE Trans. Learn. Technol. 12(2), 145–147 (2019)
5. Macfadyen, L.P., Dawson, S.: Mining LMS data to develop an “early warning system” for educators: a proof of concept. Comput. Educ. 54, 588–599 (2010)
6. Cerezo, R., Sanchez, M., Puerto, M.: Students’ LMS interaction patterns and their relationship with achievement: a case study in higher education. Comput. Educ. 96, 42–54 (2016)
7. Petropoulou, O., Kasimatis, K., Dimopoulos, I., Retalis, S.: LAe-R: a new learning analytics tool in Moodle for assessing students’ performance. Bull. IEEE Tech. Committee Learn. Technol. 16(1) (2014)
8. Mor, Y., Craft, B., Hernández-Leo, D.: Editorial: the art and science of learning design. Res. Learn. Technol. 21 (2013)
9. Laurillard, D.: Teaching as a Design Science: Building Pedagogical Patterns for Learning and Technology, vol. 7625, pp. 41–42. Routledge/Taylor & Francis Group. Empire Drive (2012)
10. Lockyer, L., Heathcote, E., Dawson, S.: Informing pedagogical action: aligning learning analytics with learning design. Am. Behav. Sci. 57(10), 1439–1459 (2013)
11. Nistor, N., Neubauer, K.: From participation to dropout: quantitative participation patterns in online university courses. Comput. Educ. 55(2), 663–672 (2010)
12. Rienties, B., Toetenel, L., Bryan, A.: “Scaling up” learning design: impact of learning design activities on LMS behavior and performance. In: Proceeding of the 5th International Conference on Learning Analytics Knowledge, Poughkeepsie, NY, USA (2015)
13. Rienties, B., Nguyen, Q., Holmes, W., Reedy, K.: A review of ten years of implementation and research in aligning learning design with learning analytics at the Open University UK. Interact. Des. Arch. 33, 134–154 (2017)
14. Jung, I.: The dimensions of e-learning quality: from the learner’s perspective. Educ. Technol. Res. Dev. 59, 445–464 (2011)
15. Dondi, C., Moretti, M., Nascimbeni, F.: Quality of e-learning: negotiating a strategy, implementing a policy. In: Ehlers, U.D., Pawlowski, J.M. (eds.) Handbook on Quality and Standardization in e-Learning, pp. 31–50. Springer, Berlin (2006). https://doi.org/10.1007/3540-32788-6_3
16. Fredericks, J.A., Blumenfeld, P.C., Paris, A.H.: School engagement: potential of the concept, state of the evidence. Rev. Educ. Res. 74, 59–109 (2004)
17. Aguiar, E., Chawla, N.V., Brockman, J., Ambrose, G.A., Goodrich, V.: Engagement vs performance: using electronic portfolios to predict first semester engineering student retention. In: Proceedings of the 4th International Conference on Learning Analytics And Knowledge, pp. 103–112, Indianapolis, Indiana, USA, 24–28 March (2014)
18. Rostaminezhad, M.A., Mozayani, N., Norozi, D., Iziy, M.: Factors related to e-learner dropout: case study of IUST elearning center. Procedia-Soc. Behav. Sci. 83, 522–527 (2013)
19. Seidman, A.: Minority student retention: resources for practitioners. New Dir. Inst. Res. 125, 7–24 (2005)
20. Tinto, V.: Completing College: Rethinking Institutional Action. University of Chicago Press (2012)
21. DeBerard, M.S., Spielmans, G.I., Julka, D.L.: Predictors of academic achievement and retention among college freshmen: a longitudinal study. Coll. Stud. J. 38(1), 66–81 (2004)
22. Zhang, G., Anderson, T.J., Ohland, M.W., Thorndyke, B.R.: Identifying factors influencing engineering student graduation: a longitudinal and cross-institutional study. J. Eng. Educ. 93(4), 313–320 (2004)
23. Hu, Y.H., Lo, C.L., Shih, S.P.: Developing early warning systems to predict students’ online learning performance. Comput. Hum. Behav. 36, 469–478 (2014)
24. Frank, E., Hall, M.A., Witten, I.H.: The Weka Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th edn. Morgan Kaufmann (2016)
25. Sharable Content Object Reference Model. https://adlnet.gov/projects/scorm/. Accessed 06 Feb 2020

Prediction of Users’ Professional Profile in MOOCs Only by Utilising Learners’ Written Texts

Tahani Aljohani1(✉), Filipe Dwan Pereira2, Alexandra I. Cristea1, and Elaine Oliveira2

1 Department of Computer Science, Durham University, Durham, UK
{tahani.aljohani,alexandra.i.cristea}@durham.ac.uk
2 Institute of Computing, Federal University of Roraima, Boa Vista, Brazil

Abstract. Identifying users’ demographic characteristics is called the Author Profiling (AP) task, which is useful for providing robust automatic prediction of different social user aspects, and subsequently for supporting decision making in massive information systems. For example, in MOOCs, it is used to provide personalised recommendation systems for learners. In this paper, we explore intelligent techniques and strategies for solving the task, focusing mainly on predicting the employment status of users on a MOOC platform. For this, we compare sequential with parallel ensemble deep learning (DL) architectures. Importantly, we show that our prediction model can achieve high accuracy even though few of the stylistic text features usually used for the AP task are employed (only word tokens are used). To address our highly unbalanced data, we compare a widely used oversampling method with a generative paraphrasing method. We obtained an average accuracy of 96.4% for our best method, involving sequential DL with paraphrasing, overall as well as per individual class (employment statuses of users).

Keywords: Imbalanced data · MOOCs · Deep Learning · Author Profiling

© Springer Nature Switzerland AG 2020
V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 163–173, 2020.
https://doi.org/10.1007/978-3-030-49663-0_20

1 Introduction and Related Works

The wave of innovation in the field of artificial intelligence has attracted great interest in many aspects of life, such as education systems, with one outcome being the so-called Massive Open Online Courses (MOOCs). These are educational systems providing free or affordable courses, which has attracted a huge number of individuals to learn in MOOCs. As a result, MOOC users vary widely in terms of age, gender, employment status, etc. This variety among users may in turn affect their needs and thus their involvement with MOOCs [8]. In order to improve this critical avenue of education, it is important to provide personalised recommendation systems and to deliver courses to users based on their demographic characteristics, such as a learner’s job status. According to Kellogg et al. [9] and Reich et al. [14], the most common type of users attracted to MOOCs are those aiming to enhance their work/professional skills. Whilst many MOOCs give their users the opportunity to specify demographic data about themselves, the actual percentage of users who fill in these data is extremely low (only 10%, overall) [1]. Hence, many MOOCs end up with many missing values. For instance, some learners have declared their age and gender, but not their job status. In the present study, we aim to predict the employment status of MOOC users automatically, to serve as a means of designing customised recommendations, so that pre-course questionnaires with high cognitive overhead for AP could become redundant.

Profiling an author is a text classification task based on supervised learning, which often relies on extracting a set of features from authored texts. This approach mainly depends on the fact that an author’s traits can be inferred from his/her writing. The textual features used for the AP task are called stylometric features, and they are normally categorised into five levels of data representation: lexical, semantic, syntactic, structural, and domain (or content) specific [12]. These features are used to help machine learning models with prediction. For example, very recent works in AP have applied bag of words to predict gender [15]. Others predict gender and bot (human or not) based on character and word n-gram features [16], semantic and syntactic features [10], or Twitter-specific features such as the ratio of the number of hashtags [6]. The models that have achieved competitive results for the AP task are those built on ensemble learning [3], according to work done in a ‘shared’ task called The Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN), a yearly competition for AP [13]. In addition, predicting user age, gender, and native language has been widely studied in the AP area. To the best of our knowledge, this is the first employment status prediction work for MOOCs; it is also the first implementation of AP on MOOCs, as well as the first use of text augmentation with the AP task.
We compare the two basic styles of ensemble learning (parallel and sequential architectures) to show their performance differences on employment status classification. Instead of many stylometric features, our work uses only simple word tokens from user comments. This reduces computational expense, especially given the relatively ‘heavy’ ensemble Deep Learning (DL) model. However, a DL model requires a large number of samples to learn from, and our work meets this requirement (see Sect. 2.2). One limitation of our data is the imbalanced distribution of class categories. We paid great attention to this limitation, as it makes training a deep architecture difficult; thus, a data augmentation technique is also explored in our work (see Sect. 2.3).

2 Methodology

2.1 Problem Definition

The problem is a text classification problem, which can be stated mathematically as follows. We seek a method that represents a function

$$f : D \to C \qquad (1)$$

where $D$ is a collection of documents and $C = \{c_1, c_2, \ldots, c_n\}$ is a fixed set of classes, namely the class categories that we aim to predict in our study, with:

• Input: a document $d \in D$, which is a sample (a comment written by a user).
• Output: a predicted class $c \in C$, where $f(d) = c$.
• Target: a target (correct) class $c' \in C$; if $c = c'$, we say our prediction is correct.

The accuracy of the predicted class is measured by a simple distance function, as follows:

$$\text{Distance} = \begin{cases} 1 & \text{if } c = c' \\ 0 & \text{if } c \neq c' \end{cases} \qquad (2)$$

2.2 Data Description and Pre-processing

For generalisability of our study, we collected the comments of learners (who declared their work status) from 7 courses in different domains: social sciences, literature, and computer science. These courses are delivered repeatedly by the University of Warwick on FutureLearn, a European MOOC platform. Each delivery is called a run, resulting in a total of 27 runs; each run covers several weeks, each week has weekly learning units (e.g. videos, texts, quizzes, etc.), and students can comment to discuss these units. We thus gathered 381298 comments from 9538 users. Several types of work status were recorded, which we further grouped into more generic types as follows: retired, working, and not working. This was done because some of the original fine-grained statuses were hard to differentiate and slightly ambiguous (such as ‘looking for work’ versus ‘unemployed’), as well as to aim at a higher prediction accuracy. Additionally, we systematically avoid user-related bias (e.g., learning about the user instead of the type of user) in our training, by making sure that no comments written by the same user are included in both the training and the validation set. This supports generalisability and independence of samples. Thus, we collected the comments from only one run of each course for the validation dataset, because each run has a new group of learners. This also provided us with enough samples for the validation set. Moreover, to obtain the same class proportions in the training, validation and testing sets, we used stratified sampling, which separates the observations into homogeneous groups (by label) before sampling. In total, 60815 comments from 2569 users (15.95% of the original data set) were used for validating the model. The remaining 84.05% of the dataset, equivalent to 320483 comments from 6969 users, was further divided into training (80%) and testing (20%) sets.
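The stratified 80/20 division described above can be sketched as follows. This is a simplified illustration with hypothetical names and toy data: it groups samples by label and splits each group in the same proportion, but it does not reproduce the user-disjoint, run-based selection of the validation set.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.2, seed=0):
    """Split samples into train/test parts while preserving label proportions."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)  # sample within each homogeneous group
        n_test = round(len(group) * test_fraction)
        test += [(s, label) for s in group[:n_test]]
        train += [(s, label) for s in group[n_test:]]
    return train, test


# Toy data: 100 comment ids with an imbalanced label distribution.
labels = ["retired"] * 50 + ["working"] * 30 + ["not working"] * 20
samples = list(range(100))
train, test = stratified_split(samples, labels)
print(len(train), len(test))
```

Because each label group is split separately, the class proportions of the full set carry over to both parts, up to rounding.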
Then, we created a text normalisation pipeline, used in every model in our experiments, to pre-process all comments. We expanded contractions and standardised URLs, punctuation, and special characters. Although these cleaning steps are well known and widely applied in Natural Language Processing (NLP), we also experimented with not applying them. However, the prediction results were worse, which suggests that these steps are not job-specific.
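A normalisation pipeline of this kind might look like the sketch below. The exact rules used in the study are not given, so the contraction list, the `<url>` placeholder, and the choice to replace punctuation with spaces are all assumptions made for illustration.

```python
import re

# Illustrative contraction rules only; a real pipeline would use a fuller list.
# Order matters: specific forms come before the generic "n't" rule.
CONTRACTIONS = [
    (re.compile(r"\bcan't\b", re.IGNORECASE), "cannot"),
    (re.compile(r"\bwon't\b", re.IGNORECASE), "will not"),
    (re.compile(r"n't\b", re.IGNORECASE), " not"),
    (re.compile(r"'re\b", re.IGNORECASE), " are"),
]
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def normalise(text):
    """Standardise URLs, expand contractions, and strip punctuation."""
    text = URL_PATTERN.sub("<url>", text)      # every URL becomes one token
    for pattern, expansion in CONTRACTIONS:
        text = pattern.sub(expansion, text)
    text = re.sub(r"[^\w\s<>]", " ", text)     # punctuation and special characters
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


print(normalise("I can't open https://example.com/page, sorry!"))
```

Applying one shared function to every comment keeps the pre-processing identical across all compared models.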

2.3 Balancing Data per Class: Samples Generating

To achieve better performance in supervised learning problems, samples should not be biased toward one class, in order to reduce the tendency to predict the majority class. However, the classes in our original dataset were unbalanced (retirees: 154527 samples, workers: 117138, and non-workers: 48818). So, to avoid options that are expensive in terms of time and money, we generated more samples in-house by using text augmentation for oversampling. Here, we compare two methods: the popular synthetic minority oversampling technique (SMOTE) [2], and a modern NLP technique, Paraphrasing [7]. We use the latter to simulate a human level of text understanding: just as rephrasing texts helps human understanding, it could help machines to understand texts by ‘seeing’ different forms of text with the same meaning. Specifically, we paraphrase sentences from the smaller categories to train the model. To do so, we split long comments from those minority groups into sentences, using ‘.’ as the delimiter, and paraphrase each sentence until we reach the same number of instances as in the majority class. We replace words by their synonyms and expressions by their paraphrases to generate new comments. We use the paraphrase database PPDB [7], with over a billion paraphrase pairs, covering several languages. The idea behind it is that if two strings S1 and S2 in a language A have the same translation f in another language B, then the pair has the same meaning, being a pair of paraphrases.
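The paraphrase-based oversampling loop can be sketched as below. A tiny hand-written synonym table stands in for PPDB, and the function names, the random choice of synonyms, and the example comment are all hypothetical; the sketch only illustrates the control flow of splitting on ‘.’ and generating paraphrases until a target class size is reached.

```python
import random

# Hypothetical stand-in for a paraphrase resource such as PPDB.
SYNONYMS = {
    "course": ["class", "module"],
    "enjoyed": ["liked", "appreciated"],
    "job": ["work", "position"],
}


def paraphrase(sentence, rng):
    """Replace every word that has a known synonym with a randomly chosen one."""
    words = sentence.split()
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words)


def augment_minority(comments, target_size, seed=0):
    """Add paraphrased sentences until the class reaches `target_size` samples."""
    rng = random.Random(seed)
    # Split comments into sentences on '.', as described in the text.
    sentences = [s.strip() for c in comments for s in c.split(".") if s.strip()]
    augmented = list(comments)
    i = 0
    while len(augmented) < target_size and sentences:
        augmented.append(paraphrase(sentences[i % len(sentences)], rng))
        i += 1
    return augmented


minority = ["I enjoyed the course. It helps my job."]
balanced = augment_minority(minority, target_size=5)
print(len(balanced))
```

In contrast to SMOTE, which interpolates between numeric feature vectors, this approach generates new samples at the text level, before any feature extraction.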

2.4 Classification Based on Textual Features

To capture different patterns among users’ texts, we started by measuring basic linguistic features. This is usually an initial step in many AP studies, as it can provide an overall understanding of textual features. Here, however, this step did not reveal any difference among the three categories. Instead, we notice that they are very similar from a statistical textual pattern point of view, as can be seen in Fig. 1 and Table 1. The distributions refer to the numbers of characters, words and sentences, respectively.

Fig. 1. Boxplot based on basic statistical features for different text levels per class.


Table 1. Basic statistical features for different text levels per class

                      Not working             Retired                 Working
                      Statistic  Std. error   Statistic  Std. error   Statistic  Std. error

Character level (statistical features based on characters in each comment)
Mean number           298.56     1.201        289.6      0.657        318.04     0.819
Std. deviation        265.305                 258.24                  280.44
Minimum               1                       1                       1
Maximum               1299                    1319                    1311
Interquartile range   295                     285                     325
Skewness              1.522      0.011        1.562      0.006        1.389      0.007
Kurtosis              2.026      0.022        2.291      0.012        1.454      0.014

Word level (statistical features based on words in each comment)
Mean number           52.58      0.213        51.23      0.116        55.92      0.143
Std. deviation        47.142                  45.607                  48.906
Minimum               1                       1                       1
Maximum               258                     263                     247
Interquartile range   51                      51                      58
Skewness              1.541      0.011        1.561      0.006        1.36       0.007
Kurtosis              2.118      0.022        2.336      0.012        1.396      0.014

Sentence level (statistical features based on sentences in each comment)
Mean number           3.16       0.011        3.11       0.006        3.23       0.007
Std. deviation        2.429                   2.299                   2.448
Minimum               1                       1                       1
Maximum               32                      37                      36
Interquartile range   3                       3                       3
Skewness              2.054      0.011        1.988      0.006        1.964      0.007
Kurtosis              7.037      0.022        6.52       0.012        6.642      0.014

As the distributions at the three text levels are not normal, we applied the pairwise Mann-Whitney test to compare the comments by class at each text level, and calculated the effect size [17]. Although in most of the hypothesis tests (see Table 2) we found p < 0.05, p-values are affected by sample size [5]. The effect sizes at all text levels, however, are close to zero, which shows that the probability of one group having a higher rank than another is also close to zero; hence, we can conclude that there is no significant difference in the length and number parameters of the writing style across the different groups (not working, working, and retired; see Table 2).

Table 2. Pairwise Mann-Whitney test

Comparison               Statistic       Character     Word          Sentence
Not working vs Retired   Z               −5.97         −3.861        −0.387
                         Effect size     0.00017527    7.3311E−05    7.3653E−07
                         Asy. Sig. (2)   0.000         0.000         0.699
Not working vs Working   Z               −10.821       −12.171       −5.878
                         Effect size     0.00070558    0.00089261    0.00020819
                         Asy. Sig. (2)   0.000         0.000         0.000
Retired vs Working       Z               −22.944       −22.173       −8.88
                         Effect size     0.00317211    0.0029625     0.00047516
                         Asy. Sig. (2)   0.000         0.000         0.000
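The text does not state which effect-size formula was used. One common choice for a Mann-Whitney Z statistic is Z² / N, where N is the combined size of the two groups. Assuming that formula and the class sizes quoted in Sect. 2.3, the sketch below reproduces the "Not working vs Retired" row of Table 2 to the reported precision (the "Not working vs Working" row also agrees closely; we did not find the same agreement for the third comparison, so the formula should be read as an assumption rather than the authors' stated method).

```python
def effect_size(z, n_total):
    """Effect size for a Mann-Whitney Z statistic, assumed here to be Z^2 / N."""
    return z * z / n_total


# Class sizes quoted in Sect. 2.3.
CLASS_SIZES = {"retired": 154527, "working": 117138, "not working": 48818}

n = CLASS_SIZES["not working"] + CLASS_SIZES["retired"]
for level, z in [("character", -5.97), ("word", -3.861), ("sentence", -0.387)]:
    print(level, f"{effect_size(z, n):.4e}")
```

With group sizes in the hundreds of thousands, even large Z values translate into very small effect sizes, which is the point the text makes about p-values and sample size.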

2.5 Classification Based on Text Weighting Schemes

The initial step in Sect. 2.4 did not provide us with any interpretable differences in textual features, so we applied machine learning classifiers. First, we applied different weighting schemes that are used for feature extraction. These vector-based models extract features from texts and convert them into matrices, based on different views, and then represent the texts in vector space models. The extracted features can then be used as inputs to a classifier for prediction. Here we used Term Frequency-Inverse Document Frequency (TF-IDF), a baseline weighting scheme in NLP. We applied three forms of TF-IDF, in addition to a simple weighting scheme (Word Count), explained next:

Word Count. It generates vectors based on word occurrence in a corpus.

TF-IDF. It generates vectors based on word frequency. This model calculates two scores, tf and idf, combined into the tfidf formula, as follows:

$$tf(t, d) = \log\big(1 + freq(t, d)\big)$$
$$idf(t, D) = \log\left(\frac{N}{count(d \in D : t \in d)}\right) \qquad (3)$$
$$tfidf(t, d, D) = tf(t, d) \cdot idf(t, D)$$

where t is a term in document d, D is the entire text corpus, and N is the number of documents in D.

TF-IDF Based on n-gram. This generates TF-IDF vectors for each gram. It is well known in the NLP domain that n-gram models boost classification accuracy, as they take sequences of words into account. For a deeper representation of texts, we also examined n-gram TF-IDF based on characters (n = 2, 3, for both words and characters).

Principal Component Analysis. The above-mentioned schemes provide different features, which is typically good for training a classifier; however, more features bring more variance (noise). Thus, we applied Principal Component Analysis (PCA), a popular dimension reduction technique. Mathematically speaking, it transforms each data point, represented by a vector x with n features, into a vector z with fewer dimensions, m. This can be done through a linear transformation, $z = W^{T} x$. Figure 2 is a simple visualisation of the PCA vectors obtained for the three classes: working, not working, retired. These final feature representations are fed to the classifiers (Naïve Bayes, Logistic Regression, XGBoost) based on the two balancing methods: SMOTE and Paraphrasing (see Table 3).
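To make the weighting formulas concrete, the sketch below computes TF-IDF weights for word n-grams exactly as written in Eq. (3), using natural logarithms. The tokenisation by whitespace, the function names, and the toy documents are assumptions; library implementations typically add smoothing and normalisation, so their values will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of word n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(documents, n=1):
    """Return one {term: weight} dict per document, following Eq. (3)."""
    term_lists = [ngrams(doc.split(), n) for doc in documents]
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    vectors = []
    for terms in term_lists:
        freq = Counter(terms)
        vectors.append({
            t: math.log(1 + f) * math.log(n_docs / doc_freq[t])
            for t, f in freq.items()
        })
    return vectors


docs = ["good course", "good job", "hard course work"]
for vector in tfidf_vectors(docs):
    print({term: round(weight, 3) for term, weight in vector.items()})
```

A term that occurs in every document receives an idf of log(1) = 0 under this formula, so it contributes nothing to the vector, which is the intended down-weighting of uninformative words.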

Fig. 2. Dimension Reduction (PCA) in the space for our Weighting Schemas. [Panels: Word Count; TF-IDF; TF-IDF Word n-gram; TF-IDF Character n-gram.]

2.6 Classification Based on Deep Learning Architecture

The results based on simple classifiers (Table 3) are not promising. Thus, we used deep learning models, specifically ensemble learning, as these are recommended as effective for classification problems [3]. In our experiment, we aimed to compare the performance of the two basic categories of ensemble learning. In parallel (Bagging) ensemble learning, the base-learners work in parallel, whereas in sequential (Boosting) ensemble learning the base-learners work in a chain. Stacking is another option for sequential learning; however, we chose boosting over stacking for a fair comparison, because both Bagging and Boosting use a kind of deterministic strategy when combining results. The key difference between these two styles is the dependency between the base-learners. In the parallel ensemble, the base-learners are independent, and this can reduce errors considerably, by averaging or voting [11]. In the sequential fashion, the base-learners are dependent, which means that any mislabelled sample in a previous step will influence the chain, which affects the overall performance [11].

It is worth mentioning here that Convolutional Neural Network (CNN) models perform well on online text data, as they are good at handling independent features, like new words in a language. Recurrent Neural Network (RNN) models are considered effective for sequence modelling, such as analysing a sequence of words; they handle sequential aspects of the data, based on the position/time index in a sentence [18], as well as word semantics. However, they are still not effective enough at handling small parts of texts, compared to CNNs. As a result, a combination of CNN and RNN in an ensemble technique could provide complementary information about the author’s writing features, by modelling the semantic information of a text globally and locally [3]. We experimented with different models based on CNN and RNN, and the best performance came from the model that used both CNN and RNN, by means of an ensemble deep learning architecture in a sequential manner (see Sect. 3). The model architecture (see Fig. 3) is described in the following.

Embedding Layer. The role of this layer is to map each comment sequence (tokens) onto a real vector domain (Individual Sequences). Thus, an entire comment representation (X) is mapped to a matrix of size $s \times d$: $X \in \mathbb{R}^{s \times d}$, where s is the maximum number of words in the longest comment, and d is the embedding space dimension.

CNN Layer. It is a hidden layer, containing the convolutional model. A convolution $c_i$ applies a non-linear function f as follows:

$$c_i = f\Big(\sum_{j,k} w_{j,k}\,\big(x[i : i+h-1]\big)_{j,k} + b\Big) \qquad (4)$$

where i is the current input vector, j a position in the convolution kernel/filter k, h is the number of words in spans (size of the convolution), b a bias term, w a weight and x the current word embedding; $x[i : i+h-1]$ represents a sub-matrix of x [4]. The final merged output matrix of the CNN model is fed as input to the RNN model, as part of the sequence in the ensemble learning. This transfer is simply done by sharing the internal weights of neurons through the input sequence.

Recurrent Layer. We used Long Short Term Memory (LSTM), an efficient model from the RNN family for text classification [4]. The following formulas briefly describe the memories/gates inside a hidden unit of an LSTM that help it to remember term information:

$$f_t = \sigma(W_f \cdot x_t + U_f \cdot h_{t-1} + b_f)$$
$$i_t = \sigma(W_i \cdot x_t + U_i \cdot h_{t-1} + b_i)$$
$$o_t = \sigma(W_o \cdot x_t + U_o \cdot h_{t-1} + b_o) \qquad (5)$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \tanh(W_c \cdot x_t + U_c \cdot h_{t-1} + b_c)$$
$$h_t = o_t \circ \tanh(c_t)$$

where t is the timestep, h is a hidden state, $f_t$ is the ‘forget gate’, $i_t$ is the input gate, $o_t$ is the output gate, $c_t$ is the cell state, W and U are weight matrices, b is a bias term, $\sigma$ is the sigmoid function, and $\circ$ is the Hadamard product [4]. To further enhance the LSTM structure and make it able to take word information from both directions into account, we use the bidirectional strategy, which means deploying two LSTMs that read our data inputs in two different directions. The outputs of the two LSTMs are then stacked together, for a better understanding of word sequences.

Classification Layer. Then, we use a flattening layer to prepare the output data to be fed into a final classification layer. The softmax function is used in the last layer to predict learners’ employment status, as it represents the probability distribution over the categories as a set of numbers between 0 and 1, whose sum is 1.
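The gate equations in Eq. (5) and the softmax output can be traced with a small pure-Python sketch. It runs a single LSTM time step on plain lists and is meant only to make the data flow explicit; the parameter layout (`params` as a dictionary of `(W, U, b)` tuples), the helper names, and the toy dimensions are assumptions, and a real model would use a deep learning framework.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vectors):
    return [sum(vals) for vals in zip(*vectors)]


def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eq. (5).

    `params` maps each gate name in {"f", "i", "o", "c"} to a tuple (W, U, b).
    """
    def pre_activation(gate):
        W, U, b = params[gate]
        return add(matvec(W, x_t), matvec(U, h_prev), b)

    f_t = sigmoid(pre_activation("f"))  # forget gate
    i_t = sigmoid(pre_activation("i"))  # input gate
    o_t = sigmoid(pre_activation("o"))  # output gate
    candidate = [math.tanh(a) for a in pre_activation("c")]
    c_t = add(hadamard(f_t, c_prev), hadamard(i_t, candidate))
    h_t = hadamard(o_t, [math.tanh(a) for a in c_t])
    return h_t, c_t


def softmax(scores):
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy dimensions: 3-dimensional input, 2-dimensional hidden state, zero weights.
zero_gate = ([[0.0] * 3 for _ in range(2)], [[0.0] * 2 for _ in range(2)], [0.0, 0.0])
params = {gate: zero_gate for gate in "fioc"}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
print(h_t, c_t)
print(softmax([1.0, 2.0, 3.0]))  # three scores, e.g. one per employment status
```

With all weights set to zero every gate equals 0.5, so the cell state is simply halved; this makes the step easy to verify by hand before substituting trained parameters.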

Prediction of Users’ Professional Profile in MOOCs


3 Results and Discussion

We present the prediction performance of all models tested in this study in Table 3, comparing, for each model, the two balancing methods mentioned in Sect. 2.3 (SMOTE and Paraphrasing). Performance is measured simply as the ratio of correctly predicted samples to the total number of samples in our data (Accuracy). Generally speaking, the deep learning classifiers achieved higher accuracies for our task than the traditional classifiers. In addition, all models recorded higher accuracy with the paraphrasing balancing strategy, with the exception of Naïve Bayes (with TF-IDF) and CNN. For the deep learning classifiers, we also compared against a CNN model and an RNN model, to confirm our intuition that an ensemble method is more appropriate for this task than a ‘simple’ deep model. For fairness in comparisons, all these models use the exact same parameters. With the sequential ensemble model, we obtained 96.4% overall accuracy on the validation samples. CNN performance is superior to RNN, even though both were trained on an identical embedding layer. On the other hand, sequential ensemble learning achieved better results than parallel ensemble learning, even though its computation time on our Graphics Processing Units (GPUs) was lower than that of parallel ensemble learning.

Overall, we designed our model with awareness of computational issues, as follows. Whilst our model, as an ensemble model, can introduce a level of complexity, we have strived to reduce this by considering only simple inputs (tokens of words, instead of complex stylometric features) to represent the data, which can reduce computation time and storage.

Fig. 3. Sequential ensemble learning model

To assess the ensemble models in a comprehensive and realistic way, we used several popular performance measurements: F1-score, precision, and recall. They provide a full picture of the range of performance for each model. Table 4 shows details about the parallel model’s performance for each category of the employment status, while Table 5 displays the same information for the sequential model. It is important not to report only average results, which could be strongly biased. As can be seen in Table 4 and Table 5, the models also perform well at the detailed category level: the per-class F1-score is at least 87% for the parallel model and at least 95% for the sequential model.

T. Aljohani et al.

Table 3. Prediction accuracy results of all models (in %).

Model                 Weighting scheme          SMOTE   Paraphrasing
Naïve Bayes           Word vectors              75.0    75.9
                      TF-IDF                    64.2    63.5
                      N-Gram TF-IDF             59.3    63.1
                      N-Gram character TF-IDF   60.5    63.3
Logistic Regression   Word vectors              85.2    88.8
                      TF-IDF                    68.8    71.5
                      N-Gram TF-IDF             64.3    69.2
                      N-Gram character TF-IDF   71.8    72.4
XGBoost               Word vectors              52.3    59.4
                      TF-IDF                    53.3    59.9
                      N-Gram TF-IDF             49.5    59.8
                      N-Gram character TF-IDF   60.3    65.3
CNN                   –                         93.4    92.2
RNN                   –                         76.4    78.5
Parallel ensemble     –                         87.2    90.3
Sequential ensemble   –                         81.3    96.4

Table 4. Results of Parallel model (Paraphrasing) on each class target (in %).

Class         Precision   Recall   F1
Working       90          91       90
Not working   83          92       87
Retired       93          89       91

Table 5. Results of Sequential model (Paraphrasing) on each class target (in %).

Class         Precision   Recall   F1
Working       98          95       97
Not working   98          97       97
Retired       93          97       95
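For readers who want to reproduce per-class figures of the kind reported in Tables 4 and 5, the sketch below computes precision, recall and F1 for each class, plus overall accuracy, from a list of true and predicted labels. It uses only the standard definitions of these measures; the function name and the six toy labels are invented for illustration and are not data from the study.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)}, computed one class at a time."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # missed an instance of the true class
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy labels, invented for illustration only (not data from the study).
y_true = ["Working", "Working", "Not working", "Retired", "Retired", "Working"]
y_pred = ["Working", "Retired", "Not working", "Retired", "Retired", "Working"]

scores = per_class_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

In this toy case one "Working" learner is misclassified as "Retired", which lowers recall for "Working" and precision for "Retired" while accuracy stays at 5/6. This is the kind of per-class detail that a single averaged figure hides.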

A potential way to use these results is shown in the IF-THEN rules below:

IF Learner = Working THEN recommend them summaries of essential information/short courses (assuming they have less time)

IF Learner = Not Working THEN recommend them advanced and specialised courses (assuming they need to improve their knowledge and skills)
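The two rules above could be wired to the classifier output roughly as follows. This is a minimal sketch under assumptions: the label strings, the dictionary, and the `recommend` function are hypothetical names, and the text gives no rule for the third class (Retired), so the sketch returns no recommendation in that case rather than inventing one.

```python
from typing import Optional

# Recommendation texts paraphrase the two IF-THEN rules from the text.
# The dictionary keys are assumed label strings, not taken from the paper.
RULES = {
    "Working": "summaries of essential information / short courses",
    "Not Working": "advanced and specialised courses",
}


def recommend(predicted_status: str) -> Optional[str]:
    """Return the recommendation for a predicted status, or None if no rule applies."""
    return RULES.get(predicted_status)


for status in ("Working", "Not Working", "Retired"):
    print(status, "->", recommend(status))
```

Keeping the rules in a lookup table rather than in nested conditionals makes it straightforward to add further rules later without touching the function.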


4 Concluding Remarks and Future Works

Here, we combine NLP and DL models to predict a new target, the learners’ current job situation, from limited, easily available data: only their comments. Our experimental results indicate that a sequential architecture in an ensemble model, used to learn the data representation for our task and combined with the paraphrasing strategy to balance the data, achieves high accuracy for this task, establishing a new state of the art. Next, we will apply this model to predict the job based on comments collected per user. Then, we will develop a personalised interface, adapted to learners’ jobs, which could increase engagement among learners in MOOCs.

Acknowledgement. This work was funded by the Ministry of Education of Saudi Arabia.

References

1. Almatrafi, O., Johri, A.: Systematic review of discussion forums in massive open online courses (MOOCs). IEEE Trans. Learn. Technol. PP, 1 (2018)
2. Chawla, N.V., et al.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
3. Chen, G., et al.: Ensemble application of convolutional and recurrent neural networks for multi-label text categorization. In: IJCNN Proceedings (2017)
4. Cliche, M.: BB_twtr at SemEval-2017 task 4: Twitter sentiment analysis with CNNs and LSTMs. In: ACL Proceedings, pp. 573–580 (2017)
5. Cohen, J.: Statistical Power Analysis for the Behavioural Sciences. Routledge, New York (2013)
6. Gamallo, P., Almatarneh, S.: Naive-Bayesian classification for bot detection in twitter notebook for PAN at CLEF 2019. In: CEUR Proceedings (2019)
7. Ganitkevitch, J., Callison-Burch, C.: The multilingual paraphrase database. In: LREC (2014)
8. Gardner, J., Brooks, C.: Student success prediction in MOOCs. User Model. User-Adapt. Interact. 28, 127–203 (2017)
9. Kellogg, S., et al.: A social network perspective on peer supported learning in MOOCs for educators. Int. Rev. Res. Open Distance Learn. 15, 263–289 (2014)
10. Kovács, G., et al.: Author profiling using semantic and syntactic features notebook for PAN at CLEF 2019. In: CEUR Proceedings (2019)
11. Liu, H., et al.: Ensemble learning approaches. In: Rule Based Systems for Big Data, pp. 63–73 (2016)
12. Raghunadha Reddy, T., et al.: A survey on Authorship Profiling techniques. Int. J. Appl. Eng. Res. 11(5), 3092–3102 (2016)
13. Rangel, F., Rosso, P.: Overview of the 7th author profiling task at PAN 2019: bots and gender profiling. In: CEUR Proceedings (2019)
14. Reich, J., Tingley, D., Leder-Luis, J., Roberts, M.E., Stewart, B.M.: Computer-assisted reading and discovery for student generated text in massive open online courses. J. Learn. Anal. 2, 156–184 (2015)
15. Sezerer, E., et al.: A Turkish dataset for gender identification of Twitter users. In: ACL, LAW XII, pp. 203–207 (2019)
16. Vogel, I., Jiang, P.: Bot and gender identification in Twitter using word and character NGrams notebook for PAN at CLEF 2019. In: CEUR Proceedings (2019)
17. Wassertheil, S., Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. Biometrics (1970)
18. Yin, W., et al.: Comparative study of CNN and RNN for natural language processing. CoRR (2017)

Cohesion Network Analysis: Predicting Course Grades and Generating Sociograms for a Romanian Moodle Course

Maria-Dorinela Dascalu¹, Mihai Dascalu¹,²(✉), Stefan Ruseti¹, Mihai Carabas¹, Stefan Trausan-Matu¹,², and Danielle S. McNamara³

¹ University Politehnica of Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania
[email protected], {mihai.dascalu, stefan.ruseti, mihai.carabas, stefan.trausan}@cs.pub.ro
² Academy of Romanian Scientists, 54 Splaiul Independenţei, 050094 Bucharest, Romania
³ Department of Psychology, Arizona State University, PO Box 871104, Tempe, AZ 85287, USA
[email protected]

Abstract. Online collaborative learning environments open new research opportunities, for example, the analysis of learning outcomes, the identification of learning patterns, the prediction of students’ behaviors, and the modeling and visualization of social relations and trends among students. Moodle is an online educational platform which supports both students and teachers, and can be effectively employed to encourage collaborative learning. Moodle is often used to make inquiries about homework and exams, to request clarifications, and to make announcements. Our goal is to predict student success based on Cohesion Network Analysis (CNA) and to identify interaction patterns between students (n = 71 who had a sufficient level of participation on the forum) and 4 tutors together with 19 teaching assistants in a Romanian Moodle course. CNA visualizations consider a hierarchical clustering that classifies members into central, active, and peripheral groups. Weekly snapshots are generated to better understand students’ evolution throughout the course, while correlating their activities with specific course events (e.g., homework deadlines, tests, holidays, and the exam). Several regression models were trained based on the generated CNA indices, and the best model achieves a mean absolute error below 0.5 points when predicting partial course grades, prior to the final exam, on a 6-point scale.

Keywords: Moodle · Student success · Cohesion Network Analysis · Interaction patterns · Sociograms

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 174–183, 2020. https://doi.org/10.1007/978-3-030-49663-0_21

1 Introduction

Online collaborative learning environments are increasingly used by both learners and teachers. They facilitate rapid access to information and learning resources, open discussions, and the sharing of opinions and experiences with peers. Many learners search for answers in online forum discussions and share their experiences and opinions with others. Beyond these benefits for users, online collaborative learning environments open new research areas, allowing researchers to analyze students’ behaviors, to identify interaction patterns, or to model students’ participation.

Moodle is a popular Course Management System (CMS) capable of supporting both students and teachers in the learning process, which can be used to supplement face-to-face courses [1]. Dougiamas et al. [2] consider Moodle as a tool for improving processes within communities of reflective analysis. Many studies have demonstrated that learning via Moodle increases productivity and helps students reinforce their abilities and knowledge [3]. Costa et al. [4] investigated the use of Moodle in a Portuguese university and found that students recognize the importance of the Moodle platform when it is used mainly for posting course materials. Moodle is often used to make inquiries about homework or exams, to request clarifications, as well as to make announcements. Moodle can be effectively employed to stimulate collaborative learning by encouraging students to express their ideas, ask for opinions, and learn from others. Messages from the course forums are the core learning traces used in this study.

The research presented in this paper has two main purposes. First, our aim is to predict course grades based on students’ participation in a Moodle instance at our university, by considering their forum posts in the Romanian language. Second, we evaluate the interactions between participants and the impact of homework deadlines and tests on their online contributions through interactive sociograms.

This study builds on the analyses performed by Sirbu and colleagues [5, 6] and introduces an adapted pipeline for the Romanian language, a first integration with Moodle via a dedicated export function which facilitates follow-up forum discussion analyses, and views in which roles (i.e., lecturer, teaching assistant and student) are differentiated in order to provide additional insights on the course dynamics. The ReaderBench framework (http://www.readerbench.com) [7] was used to analyze the discussion threads exported from Moodle for a course centered on the Use of Operating Systems. The indices generated by ReaderBench were used to predict the course grades and to observe the interactions between participants in a user-friendly manner.

Although researchers aiming to predict student performance have commonly used various machine learning algorithms [8–10], little to no work has been conducted to integrate such techniques into Moodle. To our knowledge, only one Moodle plugin so far integrates Social Network Analysis (SNA) techniques: the SNA Tool available online at https://moodle.org/plugins/mod_sna. However, the tool has an extremely limited set of functionalities, offers only basic diagrams based on post counts, and has low user traction, judging by its download count. We viewed this gap as an opportunity to build a comprehension analysis model designed to support both students and tutors.

This paper is structured as follows. The next section introduces our integrated approach, which contains information about the corpus, the data extraction process and customization procedures, together with our interactive visualizations. The following section presents the regression results when predicting partial course grades which denote the students’ activity throughout the semester. The final section comprises conclusions and suggestions for future work.


2 Method

2.1 Corpus

Our corpus consists of data collected from a Moodle course centered on the Usage of Operating Systems for freshman students in the Computer Science Department of the University Politehnica of Bucharest. The data included forum posts in which usernames were anonymized, together with each contribution’s timestamp, as well as the reply-to links from each forum discussion thread. A total of 557 students were enrolled in the course, which was carried out in parallel in 4 student cohorts, by 4 lecturers and 19 teaching assistants. Overall, 164 students posted on the Moodle platform; the forum contained 134 discussion threads and 782 contributions. Of the 164 students who wrote posts on the forum, only 71 had a sufficient level of participation given the threshold, which was empirically set at a minimum of 1 contribution and 20 content words. The students included in this study achieved course grades that ranged from 2.23 to 6 (M = 5.39, SD = 0.72), out of a total of 10 for the entire course. The course lasted for 16 weeks (14 weeks of classes and 2 weeks of holidays during and after Christmas), specifically, between September 25, 2017 and January 26, 2018, and it was held in the Romanian language.

2.2 Moodle Data Extraction

Data were exported from a relational database (MariaDB) and converted into XML files tailored for the ReaderBench framework [7]. Moodle data are housed in multiple tables, and we relied solely on the table containing the actual posts. Thus, all the posts were extracted from the database and converted as follows. First, all rows were iterated, and discussion threads were identified. Second, relevant data were extracted (i.e., usernames, date, posts, replies, parent ids) and anonymized, followed by the generation of separate XML files, with a specific hierarchical structure, for each discussion thread. All generated XML files were subsequently processed by the ReaderBench framework.

2.3 Integrated Processing Approach

The ReaderBench framework was used to generate multiple CNA and textual complexity indices, which were further used to model the discourse of the Romanian texts extracted from Moodle. ReaderBench relies on CNA, which automatically evaluates students’ participation and collaboration by examining cohesive links throughout the discourse. The framework uses spaCy (https://spacy.io/), a Natural Language Processing (NLP) library written in Python, for text pre-processing. In ReaderBench, the CNA processing pipeline has several steps, as depicted in Fig. 1. The baseline method considering only message counts was shown to be less predictive than the CNA indices [7], which take into account discourse and the manner in which ideas are expanded collaboratively and cohesively.


Fig. 1. CNA automated processing pipeline.

First, the discussion threads are sent to the ReaderBench processing pipeline. Second, the NLP pre-processing stage includes tokenization, part-of-speech tagging, syntactic dependency parsing, stop word elimination, and lemmatization, all tailored for the Romanian language. Third, a cohesion score is computed as an aggregated semantic distance relying on Latent Semantic Analysis (LSA) [11] and word2vec semantic models [12] trained on a Romanian text corpus of approximately 1 billion words. Fourth, a cohesion graph is built, which serves as a proxy for the underlying semantic content of each discussion thread from the available forum instances. Fifth, the CNA scoring mechanism is applied to all contributions, followed by the generation of specific CNA indices which are related to Social Network Analysis metrics [13]. Finally, course grades are predicted using various statistical and machine learning models starting from the CNA indices. In addition, different sociograms are generated to depict the course evolution in terms of participation, together with the interactions between participants.

Course participants are grouped into three clusters (central, active, and peripheral members) based on the CNA indegree (predicting collaboration inside the community) and outdegree (predicting participation inside the community) indices [14]. A hierarchical clustering algorithm based on the Ward criterion [15] was applied in order to group the participants based on similar degrees of activity. Besides the clustering algorithm, participants are also grouped based on their roles, namely students, teaching assistants, and lecturers. Using the CNA indices generated by the ReaderBench framework, the clustering algorithm, and the grouping by role, multiple views are designed to highlight the interactions between participants, including weekly sociograms that depict the evolution of the community from week to week.

2.4 Interactive Visualizations and Course Insights

To model the interaction between participants and to observe the evolution of the entire community from one week to the next, we used three types of views from the d3.js library (https://d3js.org/): force-directed graph, clustered force layout, and hierarchical edge bundling. The web application was built using the Angular 6 framework (https://angular.io/). The project was generated using Angular CLI (https://cli.angular.io/), while the d3.js library was imported in order to generate the views.

The views are generated from the CNA indices computed by the ReaderBench framework. Because processing the XML files takes some time (about 15 min, depending on the number of XML files, i.e., discussion threads, on the number of conversations within a discussion thread, and on their length), we stored the results in Elasticsearch (https://www.elastic.co/). Four Elasticsearch indices were used to store the data for the three types of views and the participants.

The first view, the force-directed graph (https://observablehq.com/@d3/force-directed-graph), is a graph in which the nodes represent the participants, while the edges are the messages exchanged between participants. Second, the clustered force layout view (https://bl.ocks.org/mbostock/1747543) is based on two main forces applied on the nodes, namely cluster and collide. Third, the hierarchical edge bundling view (https://www.d3-graph-gallery.com/bundle) shows the adjacent connections between members from the clustered community. In addition, weekly sociograms were generated in order to analyze how the community evolves from one week to the next.

Besides these three types of views, which illustrate the interaction patterns between participants, the cluster membership of each participant, and the role of each participant, we performed a longitudinal analysis based on overall CNA participation in order to evaluate involvement during the course. We also identified the most frequently discussed topics in the Moodle forum and represented them using a concept map.
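As an illustration of the data behind the force-directed graph view, the sketch below turns reply-to pairs into a nodes/links structure of the shape commonly used in d3.js force-graph examples. It is a sketch under assumptions, not the authors' export code: the function name, the field names (`id`, `source`, `target`, `value`) and the sample participant labels are hypothetical, and the plain message count stands in for the CNA-based scores used in the paper.

```python
import json
from collections import Counter


def build_graph(messages):
    """Turn (sender, recipient) reply pairs into a nodes/links dictionary."""
    link_counts = Counter(messages)
    participants = sorted({name for pair in messages for name in pair})
    nodes = [{"id": name} for name in participants]
    links = [
        {"source": src, "target": dst, "value": count}
        for (src, dst), count in sorted(link_counts.items())
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical, anonymized reply pairs (who replied to whom).
replies = [
    ("STUD1", "TA2"),
    ("TA2", "STUD1"),
    ("STUD1", "TA2"),
    ("STUD3", "LECT1"),
]
graph = build_graph(replies)
print(json.dumps(graph, indent=2))
```

Repeated replies between the same pair are merged into one link whose `value` records how many messages were exchanged in that direction, which keeps the graph readable as the number of posts grows.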

3 Results

3.1 Predicting Course Grades

CNA features and general textual complexity indices were extracted from the merged contributions of the 71 selected students. For the experiments, 20 randomly chosen students were left out to form a test set. A 5-fold cross-validation was performed on the remaining 51 students to select the best hyper-parameters for the models using a grid search. The predicted scores are on a 6-point scale reflecting students’ activity throughout the semester before taking the final exam.

However, the features needed to be filtered before applying a regression algorithm because the number of features for each student (247) is significantly larger than the number of students. The first step was to remove all features that had a very small variance, i.e., with more than 80% of values being identical. For the remaining features, a Principal Component Analysis (PCA) is applied, with the number of output dimensions chosen by grid search.

Table 1 presents the mean absolute error (MAE) and mean squared error (MSE) on the test partition for the three selected regression algorithms from Scikit-Learn (https://scikit-learn.org/stable/supervised_learning.html#supervised-learning). In the case of Support Vector Regression (SVR), the kernel function was varied during grid search, while for the Multi-Layer Perceptron (MLP), we varied the hidden layer size, activation function, optimizer, and regularization weight. CNA indices were grouped using PCA and the components that explained the most variance (32 with at least 1%, and top 8) were used in follow-up regression models. The best model was obtained using SVR, with a MAE of 0.44 and MSE of 0.27. Given that the partial course grades were on a 6-point scale, this corresponds to an average error of less than half a point, so the model can predict grades reasonably accurately for students who have sufficient posts in Moodle. MLP has lower accuracy and a substantially higher MSE. Similarly, the Bayesian regression exhibited a larger MAE and a drastic increase in MSE.

Table 1. Regression results for predicting partial grades.

Model      # PCA components   MAE      MSE
SVR-RBF    32                 0.4438   0.2714
MLP-1      8                  0.4828   0.3931
Bayesian   32                 0.5358   0.5749
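Two steps of this procedure are simple enough to sketch in plain Python: the filter that drops features whose most frequent value covers more than 80% of the students, and the MAE and MSE measures reported in Table 1. The sketch below is illustrative only. The function names and the toy numbers are assumptions, and the PCA, grid search and regression models (which the study took from Scikit-Learn) are deliberately left out.

```python
from collections import Counter


def drop_near_constant(rows, max_share=0.8):
    """Return indices of columns whose most frequent value covers
    at most max_share of the rows (other columns are near-constant)."""
    kept = []
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) <= max_share:
            kept.append(col)
    return kept


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical feature matrix: 5 students x 3 features (not study data).
features = [
    [0.0, 1.2, 3.0],
    [0.0, 0.7, 3.0],
    [0.0, 2.5, 3.0],
    [0.0, 1.9, 3.0],
    [0.0, 0.3, 4.0],
]
kept_columns = drop_near_constant(features)

# Hypothetical grades and predictions on the 6-point scale.
grades = [5.0, 4.0, 6.0, 3.0]
predicted = [4.5, 4.0, 5.0, 3.5]
print(kept_columns, mae(grades, predicted), mse(grades, predicted))
```

In the toy matrix the first column is constant and is dropped, while the third column sits exactly at the 80% boundary and is kept, since the rule removes only features with more than 80% identical values. Because MSE squares each error, a single prediction that is off by a full point raises MSE much more than MAE, which is consistent with the larger MSE gaps between the models in Table 1.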

3.2 Interactive Visualizations

Three types of views are designed to model the interactions between students, teaching assistants, and lecturers.

The first view, the force-directed graph, is a graph in which the nodes represent the participants (students, teaching assistants, and lecturers), while the edges are the messages exchanged between participants. The size of each node is directly proportional to the average of its indegree and outdegree scores. Figure 2a presents the community during the 4th week of classes (just before the first homework deadline) displayed as a force-directed graph, where the nodes are filled with a color depending on their role (dark grey – lecturer, light grey – teaching assistant, white – student). Each node’s border is colored by the cluster to which it belongs (blue – central cluster, green – active cluster, orange – peripheral cluster). The teaching assistants are the most highly involved in the conversations and are central members (nodes with the largest radius) because they are responsible for lab assignments and homework; TA2 (i.e., teaching assistant number 2) has the most contributions.

The second view, the clustered force layout, is based on two main forces applied on the nodes, which represent course participants: cluster, which pushes the nodes towards the largest node of the same color, and collide, which prevents the overlapping of circles by detecting collisions. Figure 2b presents the community during the 12th week (last assignment), where the nodes are colored depending on their role (dark grey – lecturer, light grey – teaching assistant, white – student), the border color represents the cluster to which each participant is assigned (blue – central cluster, green – active cluster, orange – peripheral cluster), and the size of the node is directly proportional to the number of contributions. Besides two teaching assistants, lecturer 1 also had many contributions and was a central member in the community, while some students were actively involved in the conversations and were central/active members (i.e., STUD 16, 29 and 46, depicted as white nodes with the largest diameter). The clustered force layout view arranges the nodes depending on their group: central nodes are in the middle, surrounded by the active nodes, which are in turn surrounded by peripheral members. From this view, it is noticeable that lecturers and teaching assistants are in the middle of the community, together with the most active students.

The third view, hierarchical edge bundling, enables the visualization of adjacent connections between members from the clustered community. This type of view is based on bundling the adjacent edges to decrease the clutter usually present in complex networks. The dependencies between participants are displayed in a radial manner and are grouped into spline bundles, while the participants are grouped in their corresponding cluster (see Fig. 2c, which depicts the community throughout the entire course timespan). The participants’ names are colored based on their cluster (blue – central, green – active, orange – peripheral), and the styling depends on their role (lecturer – bold and greater font-size, teaching assistant – italic and bold, student – normal font). On a mouseover event over a participant’s name, the corresponding links are highlighted: incoming links or reply-to messages in dark blue, and outgoing links in red. From this view we can also observe different involvement patterns per role (e.g., Lecturer1 is by far the most active tutor; Lecturer2 is active; Lecturer3 is peripheral, whereas Lecturer4 did not produce enough text contributions in the forum to be considered within our automated analyses).
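The node-sizing rule used in the force-directed view (size proportional to the average of indegree and outdegree) can be sketched as follows. This is a minimal illustration under assumptions: in the study these scores are CNA indices computed by ReaderBench, whereas here each edge simply carries a hypothetical numeric weight, and the function names and the `scale` parameter are invented for the example.

```python
from collections import defaultdict


def weighted_degrees(edges):
    """Sum edge weights leaving (outdegree) and entering (indegree) each participant."""
    indegree = defaultdict(float)
    outdegree = defaultdict(float)
    for source, target, weight in edges:
        outdegree[source] += weight
        indegree[target] += weight
    return indegree, outdegree


def node_radius(name, indegree, outdegree, scale=10.0):
    """Radius directly proportional to the average of indegree and outdegree."""
    return scale * (indegree[name] + outdegree[name]) / 2


# Hypothetical weighted edges: (sender, recipient, weight). The weights
# stand in for cohesion-based scores and are not values from the study.
edges = [
    ("STUD1", "TA2", 0.5),
    ("TA2", "STUD1", 1.5),
    ("STUD3", "TA2", 1.0),
]
indeg, outdeg = weighted_degrees(edges)
radii = {name: node_radius(name, indeg, outdeg) for name in ("STUD1", "STUD3", "TA2")}
print(radii)
```

In this toy network the teaching assistant both receives and sends the most weight and therefore gets the largest radius, mirroring the pattern described for the teaching assistants in Fig. 2a.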

Fig. 2. (a) Force-directed graph view depicting the interactions from the entire course. (b) Clustered force layout view. (c) Hierarchical edge bundling view. (Color figure online)


Weekly sociograms were also generated to observe the evolution of participants from one week to another, as well as the impact of homework deadlines and tests on the involvement of participants in the forum discussions. Throughout the course, students received three tests. An increase in students’ participation was observed in the weeks with tests and homework deadlines. In addition, we generated line charts for one or more participants in order to examine the students’ evolution individually. The charts show that more students post on Moodle and more conversation threads are created during the weeks with homework deadlines or tests, followed by a decrease in the weeks immediately after. Also, only a few students post on Moodle consistently.

Additionally, a Radar Chart (https://www.d3-graph-gallery.com/spider) was designed to illustrate the number of contributions made by a participant in each week of the course. Figure 3 shows the evolution of three different participants over the entire course based on their contributions. In Fig. 3a, the three most interactive participants were selected. Figure 3b presents the number of contributions for the lecturer, teaching assistant, and student with the most contributions. These visualizations can help instructors observe the variation between students’ contributions in order to compare their activity on Moodle. Based on this analysis, instructors could take action to stimulate students to become more engaged and participate more in the discussions.
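The weekly counts behind such charts can be derived from post timestamps alone. The sketch below assigns each post to a 1-based course week, counting from the course start date given in Sect. 2.1 (September 25, 2017), and tallies contributions per participant and week. The function names and the sample posts are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter
from datetime import date

COURSE_START = date(2017, 9, 25)  # first day of the course, as stated in Sect. 2.1


def course_week(day: date) -> int:
    """1-based course week that contains the given day."""
    return (day - COURSE_START).days // 7 + 1


def weekly_counts(posts):
    """Count contributions per (participant, week) from (participant, date) pairs."""
    return Counter((who, course_week(day)) for who, day in posts)


# Hypothetical posts, invented for illustration (not data from the study).
posts = [
    ("STUD16", date(2017, 9, 25)),
    ("STUD16", date(2017, 10, 1)),
    ("STUD16", date(2017, 10, 2)),
    ("TA2", date(2017, 10, 18)),
]
counts = weekly_counts(posts)
for (who, week), n in sorted(counts.items()):
    print(f"{who} week {week}: {n} contribution(s)")
```

Each axis of a radar chart would then correspond to one course week, with the plotted value taken from these per-week counts; weeks without posts are simply absent from the counter and read as zero.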

Fig. 3. Radar Chart of weekly participation. (a) Three most active students. (b) Lecturer, Student and Teaching Assistant.

4 Conclusions

Online collaborative learning environments are increasingly used by both students and teachers. Moodle is a popular online education platform which can be effectively employed to encourage collaborative learning as it is frequently used to make announcements, to request clarifications or to make inquiries on homework and exams.


Our aim was to introduce an integrated method that predicts student success based on CNA indices and to identify interaction patterns between students, teaching assistants, and lecturers in a Romanian Moodle course. The ReaderBench framework integrates various NLP techniques for the Romanian language and was used to generate CNA and textual complexity indices. Three types of views were generated to illustrate the interaction patterns. CNA transcends SNA by accounting for discourse structure and cohesion among posts, not only message counts as considered in SNA. A hierarchical clustering algorithm grouped the members into central, active, and peripheral clusters, wherein each member had one of the three possible roles: lecturer, teaching assistant, or student. The specific role and the identified cluster for each participant are highlighted correspondingly within each view. Weekly snapshots were presented to illustrate participants’ evolution and to correlate different events, such as homework deadlines, with students’ participation. The results showed that teaching assistants and lecturers are central members in the community, while students who are more involved in the conversations (i.e., are assigned to the central cluster) were more likely to earn high grades for their activities throughout the semester.

One important consideration regards an intrinsic limitation of our approach. Specifically, this study only examines forum discussions, which limits the analyses to those students who participate within the forums, and there are good students who do not make any posts. Thus, in future work, we plan to integrate additional features derived from click-stream data stored in the Moodle logs. Moreover, we strive to integrate these learning analytics tools in Moodle as a plugin, and to provide timely feedback to lecturers about students’ involvement in the course, together with potential actions to maximize their chances of passing or obtaining a better grade, e.g., generating warnings like “Your forum activity is quite low, you should be more involved in the ongoing discussions and you should interact more with your peers”. Specific elements (e.g., type, size, or subject of the course, language formality) are important when it comes to integrating our CNA method as a Moodle plugin; thus, custom configurations and specific semantic models will be key components.

To our knowledge, this is the first experiment on a Romanian MOOC, and this study introduces our endeavors for adapting the ReaderBench framework to accommodate a new language and a new data source (i.e., the Moodle export), together with novel visualizations, namely the radar charts. This is also the only initiative of which we are aware to create a Moodle plugin that considers student discussions and processes them using advanced Natural Language Processing techniques to predict course outcomes. Finally, this study extends the approach beyond the English language. The ReaderBench framework supports multiple languages (e.g., English, French, Romanian, Dutch, Spanish, and partially Italian). Hence, this method can be easily adapted to support multiple languages, thus ensuring a multilingual approach.

Acknowledgments. The work was funded by the Operational Programme Human Capital of the Ministry of European Funds through the Financial Agreement 51675/09.07.2019, SMIS code 125125, by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI project number PN-III-P1-1.2-PCCDI-2017-0689/“Revitalizing Libraries and Cultural Heritage through Advanced Technologies”. This research was also supported in part by the Office of Naval Research (Grants: N00014-17-1-2300 and N00014-19-1-2424). Opinions, conclusions, or recommendations do not necessarily reflect the view of the Office of Naval Research.

References
1. Cole, J., Foster, H.: Using Moodle: Teaching with the Popular Open Source Course Management System. O’Reilly Media Inc, Sebastopol (2007)
2. Dougiamas, M., Taylor, P.: Moodle: using learning communities to create an open source course management system. In: EdMedia+Innovate Learning, pp. 171–178. Association for the Advancement of Computing in Education (AACE) (2003)
3. Escobar-Rodriguez, T., Monge-Lozano, P.: The acceptance of Moodle technology by business administration students. Comput. Educ. 58(4), 1085–1093 (2012)
4. Costa, C., Alvelos, H., Teixeira, L.: The use of Moodle e-learning platform: a study in a Portuguese University. Proc. Technol. 5, 334–343 (2012)
5. Sirbu, M.-D., et al.: Exploring online course sociograms using cohesion network analysis. In: Penstein Rosé, C., et al. (eds.) AIED 2018. LNCS (LNAI), vol. 10948, pp. 337–342. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93846-2_63
6. Sirbu, M.-D., Dascalu, M., Crossley, S., McNamara, D.S., Trausan-Matu, S.: Longitudinal analysis of participation in online courses powered by cohesion network analysis. In: 13th International Conference on Computer-Supported Collaborative Learning (CSCL 2019), pp. 640–643. ISLS, Lyon, France (2019)
7. Dascalu, M., McNamara, D.S., Trausan-Matu, S., Allen, L.K.: Cohesion network analysis of CSCL participation. Behav. Res. Methods 50(2), 604–619 (2018)
8. Jiang, S., Williams, A., Schenke, K., Warschauer, M., O’dowd, D.: Predicting MOOC performance with week 1 behavior. In: 7th International Conference on Educational Data Mining, pp. 273–275. International Educational Data Mining Society, London, UK (2014)
9. Romero, C., López, M.-I., Luna, J.-M., Ventura, S.: Predicting students’ final performance from participation in on-line discussion forums. Comput. Educ. 68, 458–472 (2013)
10. Shahiri, A.M., Husain, W.: A review on predicting student’s performance using data mining techniques. Proc. Comput. Sci. 72, 414–422 (2015)
11. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychol. Rev. 104(2), 211–240 (1997)
12. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representation in vector space. In: Workshop at ICLR, Scottsdale, AZ (2013)
13. Scott, J.: Social Network Analysis. SAGE Publications Ltd., Los Angeles, London and Thousand Oaks (2012)
14. Nistor, N., Panaite, M., Dascalu, M., Trausan-Matu, S.: Identifying socio-cognitive structures in online knowledge communities (OKCs) using cohesion network analysis. In: 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2017). IEEE, Timisoara, Romania (2017)
15. Murtagh, F., Legendre, P.: Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J. Classif. 31(3), 274–295 (2014)

A Study on the Factors Influencing the Participation of Face-to-Face Discussion and Online Synchronous Discussion in Class

Lixin Zhao1(✉), Xiaoxia Shen2, Wu-Yuin Hwang1, and Timothy K. Shih1

1 National Central University, Zhongli, Taoyuan, Taiwan [email protected], [email protected], [email protected]
2 City University of Hong Kong, Kowloon, Hong Kong [email protected]

Abstract. Discussion is an effective way of improving teaching quality. Face-to-face discussion and online discussion each have their own advantages and disadvantages, but both face the challenge of how to enhance students’ participation. Many studies have examined the factors affecting students’ participation; most focus on external factors, while studies on internal factors are fewer. Spiral of Silence (SOS) theory can explain the factors that affect people’s engagement in public discussion, and this study applies an extended SOS framework to analyze the psychological factors that affect students’ participation willingness and outspoken willingness in face-to-face discussion and online synchronous discussion. The results show that communication apprehension can significantly hinder participation, and fear of isolation can significantly reduce outspoken willingness. Online discussion causes less communication apprehension and more participation willingness and outspoken willingness. Contemporary Chinese university students have the courage to face competition and are not afraid of expressing minority opinions, but they may fear the authority of the government and their teachers.

Keywords: Class discussion · Face to face discussion · Online discussion · Participation willingness · Outspoken willingness · Spiral of Silence



© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 184–195, 2020. https://doi.org/10.1007/978-3-030-49663-0_22

1 Class Discussion

Discussion has been shown to be an effective method for improving the quality of teaching [1]. Classroom discussion can improve a range of student abilities, such as critical thinking, self-awareness, tolerance of multiple perspectives, the ability to act, and high-level cognitive skills [2]. There are three main forms of discussion learning: face-to-face discussion in the classroom; online asynchronous discussion, mainly used outside the classroom; and online synchronous discussion, mainly used in the classroom [3]. One challenge for face-to-face discussion is that only a few students participate [4]. In contrast, technology-based online discussion not only can dramatically increase participation and make students participate in the discussion more equally [5], but can also improve critical analysis ability and reflective competence and promote the construction of social knowledge [6]. However, face-to-face discussion still has some advantages that online discussion cannot replace. Leasure, Davis and Thievon [7] found that face-to-face discussion provides learners with direct interaction and feedback, so it can cultivate a more meaningful learning experience. Some researchers indicated that face-to-face discussion creates a more comfortable environment for learners [8]. Many students prefer to participate in face-to-face discussions [9].

Whatever the form of discussion, the challenge is how to enhance students’ participation and improve discussion quality. A large number of studies have attempted to analyze the factors that affect participation and discussion quality in order to put forward better implementation strategies. Aziz, Quraishi and Kazi [10] divided the influencing factors into internal and external factors through a literature review. There is much research on external factors, such as classroom environment, curriculum, age, and the characteristics and behaviors of teachers and students [10]. By contrast, there is less research on the internal factors that affect participation in class discussion, especially those related to students’ psychological factors. Aziz et al. [10] summed up the internal factors as academic motivation, interest, physical and learning disabilities, inclusion, capabilities, and previous knowledge, and then analyzed the influence of three factors (academic motivation, fear, and self-esteem) on the participation of students of different genders. The study by Weiser et al. [11] on the influence of students’ personality traits on class participation shows that extroverted students participate more than introverted students, while emotional stability (neuroticism) has no significant influence on class participation.

2 Spiral of Silence

Noelle-Neumann [12, 13] proposed the Spiral of Silence (SOS) theory in 1974. It explains how public opinion forms under the combined action of individuals’ perception of the majority opinion and their fear of isolation. That is, individuals evaluate the majority opinion on a controversial topic. If the majority opinion is consistent with their own, they tend to express it frankly; if it is inconsistent, they fear isolation, being avoided, and being treated differently, so they tend to remain silent or change their opinions to seek consensus. Consequently, one opinion comes to dominate the mainstream and minority opinions are expressed less. When explaining why online discussion showed higher participation and more speeches than face-to-face discussion, Chen and Looi [14] suggested that SOS theory and cultural differences may be an explanation, but no research has tested this hypothesis. Previous studies have also found that psychological factors such as fear and introversion lead to students’ tendency not to participate in class discussion [10, 11].

Salmon and Kline [15] criticized the conceptual and empirical grounds of the SOS theory. One of the main drawbacks of SOS theory is that fear of isolation and perception of opinion congruence explain only a small share of the factors that affect the willingness to express [16]. Some studies have shown that many factors other than fear of isolation affect the public’s willingness to express [17, 18]. In subsequent studies, researchers expanded the framework of SOS theory and integrated other factors that affect expression willingness, such as communication apprehension, culture, fear of authority, issue salience [19], knowledge [18], interest [20], hard core [21], government surveillance [22], and public types [23].

3 Methods

3.1 Research Objective

This study used an extended framework of Spiral of Silence theory to analyze the factors that interfere with classroom discussion participation. The purposes of the study are to:
1. compare the behaviors and quality of the two discussion modes;
2. understand the influence of a series of psychological factors, including fear of isolation, communication apprehension, fear of authority, and self-concept caused by cultural factors, on classroom discussion participation and outspoken willingness;
3. analyze whether the influence of these factors differs between face-to-face discussion and online synchronous discussion using IM software (WeChat Group) in class.

3.2 Experimental Design

The studies of Johnson [24] and Cress and Hesse [25] showed that students who participate in synchronous discussion are more active than those who participate in asynchronous discussion. Online synchronous discussion is also more suitable for combining with face-to-face teaching activities in blended teaching [26]. In this study, online discussion therefore takes the form of synchronous discussion in the classroom based on an instant messenger (WeChat Group). Kanuka, Rourke and Laflamme [27] indicated that discussion quality was rated higher in two activity modes, WebQuest and debate. Therefore, the teaching activities of this study take the form of debate, and all students are encouraged to use smartphones to retrieve relevant materials to support their views during the debate.

A quasi-experimental, single-group pre-test post-test design was used. The subjects were freshmen (two classes, 58 in class A and 56 in class B, 114 in total) and senior students (one class of 70) majoring in communication at a university in China. The teaching experiment was divided into two stages. In the first stage, in the sixth teaching week, all students participated in a face-to-face oral class discussion and then completed the first-stage questionnaire. In the second stage, in the seventh teaching week, all students used a WeChat group to participate in a synchronous online classroom discussion and then completed the second-stage questionnaire. Each stage lasted two class periods, 90 min in total. The discussion topics of both stages were politically related sociological issues that are somewhat controversial in Chinese society. The first discussion topic was “Do you support boycotting the NBA due to Morey’s inappropriate comments on Twitter?”. The second discussion topic was “Do you support sovereignty over human rights?”. The teacher acted as host to promote the discussion but did not reveal his position. Across both stages, 154 valid questionnaires were collected: 98 from freshmen (23 from boys and 75 from girls), an effective rate of 86%, and 56 from seniors (14 from boys and 42 from girls), an effective rate of 80%.

3.3 Measures

Participation Willingness and Outspoken Willingness. Students’ willingness to participate in the discussion and to express themselves straightforwardly in class was measured on a nine-point scale (from 1 = strongly disagree to 9 = strongly agree); all Likert scales in this study used nine points. The items for willingness to participate are “actively participate in the discussion” and “not express any opinions” (reversed) (for face to face: M = 5.594, SD = 1.918, Cronbach’s α = .717; for online: M = 6.552, SD = 1.287, Cronbach’s α = .777). The items for outspokenness are “express their true opinions”, “disagree with the majority but express the opinions that are consistent with the majority” (reversed), and “express neutral opinions without expressing their true opinions” (reversed) (for face to face: M = 6.883, SD = 1.327, Cronbach’s α = .675; for online: M = 5.857, SD = 1.699, Cronbach’s α = .742).

Perception of Opinion Congruence. According to SOS theory, individuals’ perception of opinion congruence affects their willingness to express [12]. However, Noelle-Neumann [13] indicated that publics of different sizes have different effects on the willingness to express. Therefore, in analyzing the influence of perception of opinion congruence, this study takes into account the perceived consensus of the whole society, the classmates, the friends among classmates, and the teacher of the class. The students were asked “What’s your attitude towards Topic X?” and “What do you think of the attitudes of Chinese society (your classmates/your friends/your lecturers) towards Topic X?”. Students’ perception of opinion congruence is calculated as the absolute value of the difference between their personal opinion and their perception of other people’s opinion [28]; 0 means the attitudes are completely consistent, and the larger the value, the greater the perceived gap (for face to face: with the society: M = 1.961, SD = 1.800, Max. = 8, Min. = 0; with the classmates: M = 1.714, SD = 1.583, Max. = 8, Min. = 0; with the friends in class: M = 1.500, SD = 1.556, Max. = 8, Min. = 0; with the teacher: M = 1.487, SD = 1.320, Max. = 6, Min. = 0; for online: with the society: M = 2.052, SD = 1.906, Max. = 8, Min. = 0; with the classmates: M = 1.922, SD = 1.690, Max. = 8, Min. = 0; with the friends in class: M = 1.662, SD = 1.573, Max. = 8, Min. = 0; with the teacher: M = 1.714, SD = 1.385, Max. = 7, Min. = 0).

Fear of Isolation. Fear of isolation measures the degree to which people worry about being isolated, avoided, and treated differently when expressing different opinions. The three-item scale comes from Lee, Oshita and Hove [23] (for face to face: M = 2.991, SD = 1.448, Cronbach’s α = .778; for online: M = 2.900, SD = 1.360, Cronbach’s α = .869).
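The perception-of-opinion-congruence score described above is an absolute difference on the nine-point scale, so it ranges from 0 to 8. A minimal sketch of that calculation follows; the function name, the group labels, and the sample ratings are hypothetical and are not taken from the study’s data.

```python
def opinion_gaps(own_attitude, perceived):
    """Absolute gap between a student's own rating and each perceived rating.

    own_attitude: the student's rating of the topic on the 1-9 scale.
    perceived: mapping of reference group -> rating the student attributes to it.
    A gap of 0 means perceived agreement; 8 is the largest possible gap.
    """
    return {group: abs(own_attitude - rating) for group, rating in perceived.items()}


# Hypothetical respondent: own attitude 7, with four perceived attitudes.
gaps = opinion_gaps(7, {"society": 3, "classmates": 7, "friends": 8, "teacher": 1})
print(gaps)
```

Computing one gap per reference group mirrors the four levels (society, classmates, friends, teacher) that the study enters as separate predictors.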


Communication Apprehension. Some studies have shown that communication apprehension is also a psychological factor that significantly affects expression willingness [19]. Psychological research shows that oral expression is not easy for many people, and some may avoid expressing their opinions simply because they are afraid of speaking in public, which has nothing to do with fear of isolation [29]. Similarly, Reda [30] pointed out that speaking up in learning settings is not simple for many students. The items measuring communication apprehension come from the PRCA scale developed by McCroskey and used by Willnat et al. [19] and Neuwirth et al. [31]. Only the seven items related to public discussion were selected, with appropriate modifications for class discussion (for face to face: M = 5.085, SD = 1.553, Cronbach’s α = .899; for online: M = 3.994, SD = 1.577, Cronbach’s α = .941).

Independent and Interdependent Self-concept. McCroskey and Richmond [29] indicated that in some countries speaking out is considered competent, friendly, and intelligent, while in others outspokenness is considered impolite. Individualism and collectivism are the terms most often used to define this cultural difference [32]. When these cultural concepts are applied to individual self-concepts, they are interpreted as independence and interdependence [32, 33], which are considered important factors affecting the willingness to express [34]. Therefore, measurements of independent and interdependent self-concept can be used to explain differences in straightforward expression across individuals, contexts, and cultures [19]. The measurement scales come from Gudykunst et al. [35] and Oetzel and Bolton-Oetzel [36]. Seven items measure independent self-concept (M = 7.091, SD = 1.364, Cronbach’s α = .863), and six items measure interdependent self-concept (M = 6.068, SD = 1.226, Cronbach’s α = .737).

Fear of Authority and Fear of Teacher. People may also be afraid to express their opinions for fear of authority. This study takes into account not only the participants’ fear of authority (the government and the university officials), but also the students’ worries about the teacher. The questionnaires refer to the self-made questionnaires of Willnat et al. [19]. The items for fear of authority are “Different views from the government or the university may get me into trouble”, “I will worry about being monitored by the government or the university”, and “the government and the university may make my living conditions worse” (for face to face: M = 5.349, SD = 1.702, Cronbach’s α = .769; for online: M = 4.799, SD = 1.749, Cronbach’s α = .840). The items for fear of teachers are “I will worry that my opinion is not consistent with that of the teacher”, “I will worry about leaving a bad impression on the teacher”, and “I am worried that my speech may affect the teacher’s evaluation of me” (for face to face: M = 4.080, SD = 1.724, Cronbach’s α = .789; for online: M = 3.831, SD = 1.779, Cronbach’s α = .915).
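The scales above are summarized by Cronbach’s α, and several items are reverse-scored before aggregation. The sketch below shows the standard α formula and nine-point reverse scoring in plain Python. It is an illustration under assumptions, not the authors’ procedure: the function names and the toy responses are hypothetical.

```python
from statistics import variance

SCALE_MAX = 9  # nine-point Likert scale, as used throughout the study


def reverse_score(score, scale_max=SCALE_MAX):
    """Flip a reversed item so that higher always means more of the construct."""
    return scale_max + 1 - score


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    Sample variances (n - 1 denominator) are used for items and totals alike.
    """
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical two-item scale whose second item is reversed in the questionnaire.
raw = [[2, 8], [5, 5], [8, 3], [6, 4]]
scored = [[first, reverse_score(second)] for first, second in raw]
print(round(cronbach_alpha(scored), 3))
```

Reverse scoring has to happen before α is computed; otherwise a reversed item correlates negatively with the rest of the scale and depresses the estimate.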


4 Results

4.1 Learners’ Participation in Class Discussion

In the class discussion, the teacher recorded the number of students who participated, the number of speeches, and their contents. The statistical results are shown in Table 1. “Number of participants” refers to the number of students who stood up to speak at least once (face-to-face discussion) or posted at least one message (online discussion); “number of speeches” refers to the total number of stand-up speeches (face-to-face discussion) or the total number of messages posted (online discussion). Valid speeches are statements that include a complete view. Invalid speeches are those that simply express “good” or “objection” (face-to-face discussion) or consist of emoticons, exclamation marks, “good”, etc. (online discussion).

Table 1. The performance in class discussion

              Face-to-face discussion                  Online discussion
              Number of     Number of  Number of       Number of     Number of  Number of
              participants  speeches   valid speeches  participants  speeches   valid speeches
Freshmen A    20            37         36              48            167        85
Freshmen B    18            38         34              49            201        88
Senior        16            24         24              58            90         70
Total         54            99         94              155           458        243

In terms of quantity, 54 students participated in the face-to-face discussion (a participation rate of 54/184 = 29%), giving 99 speeches in total (94/99 = 95% of them valid). In the online discussion, 155 students participated (155/184 = 84%), posting 458 speeches in total (243/458 = 53% of them valid). These findings show that most students participated in the online discussion, while only a small number participated in the face-to-face discussion. In terms of content, the overwhelming majority of face-to-face speeches were valid, and students expressed support, opposition, doubt, and other attitudes by sighing, clapping, or gestures. In the online discussion, students expressed their attitudes towards other participants by posting emoticons with different meanings (support, opposition, irony, doubt, etc.).
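The percentages in the paragraph above follow directly from the totals in Table 1 and the enrolment of 184 students (114 freshmen plus 70 seniors). As a quick check, the arithmetic can be reproduced as follows; the dictionary layout and function name are hypothetical, while the counts are those reported in Table 1.

```python
ENROLLED = 184  # 114 freshmen + 70 seniors, as reported in the experimental design

# Totals row of Table 1.
TABLE_1_TOTALS = {
    "face_to_face": {"participants": 54, "speeches": 99, "valid_speeches": 94},
    "online": {"participants": 155, "speeches": 458, "valid_speeches": 243},
}


def discussion_rates(counts, enrolled=ENROLLED):
    """Return (participation rate, valid-speech rate) as whole percentages."""
    participation = round(100 * counts["participants"] / enrolled)
    valid = round(100 * counts["valid_speeches"] / counts["speeches"])
    return participation, valid


rates = {mode: discussion_rates(counts) for mode, counts in TABLE_1_TOTALS.items()}
for mode, (participation, valid) in rates.items():
    print(f"{mode}: {participation}% participated, {valid}% of speeches valid")
```

The two rates move in opposite directions: participation is far higher online, while the share of valid speeches is far higher face to face.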

4.2 Spiral of Silence and Influence Factors of Participation Willingness and Outspoken Willingness

Taking the students’ participation willingness and outspoken willingness in discussion as dependent variables, four separate hierarchical regression analyses were made for the two class discussion modes. The results are shown in Table 2.

Table 2. The influence factors of participation willingness and outspoken willingness in classroom discussion (standardized coefficients beta).

Predictors                    Face-to-face discussion        Online discussion
                              Participation  Outspoken       Participation  Outspoken
                              willingness    willingness     willingness    willingness
Block 1: Controls
  Gender                      −.061          −.110           −.039          −.117
  Grade                       −.158*         −.117           −.070          −.043
  Incremental R2              .184           .131            .145           .118
Block 2:
  Independent                 .024           .129            .059           .054
  Interdependent              −.079          −.051           −.051          −.115
  Incremental R2              .020           .052            .011           .036
Block 3:
  Fear of Isolation           .007           −.386***        −.038          −.397***
  Communication Apprehension  .565***        .156*           −.636***       −.197*
  Fear of Authority           −.096          .031            −.167*         −.175*
  Fear of Teachers            .045           .039            −.102          −.191*
  Incremental R2              .280           .185            .348           .317
Block 4:
  POC (Society)               .048           .047            −.054          −.115
  POC (Classmates)            −.076          −.004           .020           .172
  POC (Friends)               .164*          .144            .015           .017
  POC (Teachers)              .043           .054            .127           .123
  Incremental R2              .022           .027            .016           .043
Total R2                      .507           .394            .520           .514
Adjusted R2                   .465           .343            .480           .473

Note. POC = Perception of Opinion Congruence. *p < .05, **p < .01, ***p < .001
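The block-wise structure of a hierarchical regression, where predictors enter in stages and each stage is credited with the R2 it adds, can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the authors’ analysis: the data are synthetic, every function and variable name is hypothetical, and the sketch reports only incremental R2, not standardized betas or significance tests.

```python
import random


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def r_squared(rows, y):
    """R^2 of an ordinary least squares fit of y on the given rows plus an intercept."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def incremental_r2(blocks, y):
    """Enter blocks of predictor columns in order; return the R^2 gained per block."""
    gains, columns, previous = [], [], 0.0
    for block in blocks:
        columns.extend(block)
        current = r_squared(list(zip(*columns)), y)
        gains.append(current - previous)
        previous = current
    return gains


# Synthetic respondents (NOT the study's data): controls first, then two fear variables.
random.seed(0)
n = 60
gender = [random.choice([0, 1]) for _ in range(n)]
grade = [random.choice([1, 4]) for _ in range(n)]
apprehension = [random.uniform(1, 9) for _ in range(n)]
isolation = [random.uniform(1, 9) for _ in range(n)]
willingness = [7 - 0.5 * a + 0.2 * g + random.gauss(0, 1)
               for a, g in zip(apprehension, grade)]

steps = incremental_r2([[gender, grade], [apprehension, isolation]], willingness)
print([round(step, 3) for step in steps], "total:", round(sum(steps), 3))
```

Because each model nests the previous one, the per-block gains are non-negative and sum to the R2 of the full model, which is the relationship between the incremental and total R2 rows of Table 2.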

Regression analysis shows that fear of isolation has no effect on students’ participation willingness in either mode of classroom discussion, but it significantly reduces their outspoken willingness. The analysis of perception of opinion congruence indicates that perceived congruence at none of the four levels affects outspoken willingness. On the contrary, disagreement with friends slightly promotes students’ participation willingness in face-to-face discussion, while perceived congruence at the other levels has no effect on participation willingness. Similarly, the two variables representing cultural factors, independence and interdependence, have no significant influence on students’ participation willingness or outspoken willingness in either discussion mode.

Whether in face-to-face or online discussion, communication apprehension is the main factor that significantly reduces participation willingness. At the same time, among the factors that hinder outspoken willingness, communication apprehension is second only to fear of isolation. This shows that students who are afraid of speaking in public are reluctant to express their opinions publicly in class discussion, and even if they participate, they have scruples about expressing their true views directly.

In face-to-face discussion, grade is also a factor affecting participation willingness: participation willingness among seniors is lower than among freshmen, which is consistent with the statistics in Table 1. From the teacher’s observation in the classroom, the seniors are about to graduate and are not enthusiastic about participating in the course, while the freshmen have just entered the university and are eager to participate. The discussion through WeChat does not show the same difference.

In online discussion, concerns about authority reduce students’ willingness to express opinions in the WeChat group, and both fear of authority and fear of teachers reduce outspoken willingness. In face-to-face discussion, however, fear of authority and fear of teachers have no effect on participation willingness or outspoken willingness.

4.3 Differences Between Face-to-Face Discussion and Online Synchronous Discussion

A paired-samples t test was run on participation willingness, outspoken willingness, and the anxiety-predisposition variables under the two modes of class discussion. The willingness to participate in online discussion is significantly higher than in face-to-face discussion (for online: M = 5.857, SD = 1.287; for face to face: M = 5.594, SD = 1.327; t = 2.315, *p < .05), and outspoken willingness in online discussion is also significantly higher than in face-to-face discussion (for online: M = 6.883, SD = 1.918; for face to face: M = 6.552, SD = 1.699; t = 3.772, ***p < .001), which shows that online synchronous discussion can markedly improve the participation and quality of classroom discussion compared with face-to-face discussion.

Students’ communication apprehension in online discussion is significantly lower than in face-to-face discussion (for online: M = 3.994, SD = 1.577; for face to face: M = 5.085, SD = 1.553; t = 8.436, ***p < .001), indicating that students in online discussion are less afraid to speak publicly in class. The regression analysis above demonstrates that communication apprehension has a significant negative impact on participation willingness, so online discussion clearly helps reduce students’ fear of participating that stems from being afraid of expressing themselves in public. In addition, according to the variance analysis of participation willingness and the statistics on the number of participants and speeches, students are more eager to participate in online discussion and show more participation behaviors there, which means that online discussion can provide students with more equal opportunities to participate.

For fear of isolation, there is no significant difference between the two discussion modes (for online: M = 2.900, SD = 1.360; for face to face: M = 2.991, SD = 1.448; t = .798, p > .05), which means that fear of isolation is unrelated to the difference between face-to-face oral expression and posting text messages in WeChat. However, outspoken willingness in face-to-face discussion is lower than in online discussion. This indicates that although students tend to reduce their outspoken willingness for fear of isolation in both modes, the factors that lead to lower outspoken willingness in face-to-face discussion are not fear of isolation, but communication apprehension, fear of authority, fear of teachers, or other factors.

Both fear of authority (for online: M = 4.799, SD = 1.749; for face to face: M = 5.349, SD = 1.702; t = 4.83, ***p < .001) and fear of teachers (for online: M = 3.831, SD = 1.780; for face to face: M = 4.080, SD = 1.724; t = 2.19, *p < .05) are significantly lower in online discussion than in face-to-face discussion. But according to the regression analysis, both factors affect outspoken willingness in online discussion. Specifically, the more students fear authority and the teacher, the less willing they are to express themselves frankly in online discussion, and fear of authority even reduces the willingness to participate in online discussion. A possible reason is that some students feel they leave fixed written evidence in the WeChat group discussion, and they worry that government online monitoring or the teacher’s later review will have a negative impact on them. In contrast, face-to-face discussion is confined to the classroom and is not subject to official monitoring, and oral statements are fleeting, so there is no need to worry about punishment from the teacher; hence the two factors do not affect students’ participation willingness and outspoken willingness in that mode.
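The comparisons in this section rest on the paired-samples t statistic, which is computed from each student’s difference between the two modes rather than from the two group means separately. A minimal sketch follows; the function name and the toy scores are hypothetical, and the sketch returns only t and the degrees of freedom (a p-value would additionally need the t distribution, for example from a statistics library).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(after, before):
    """Paired-samples t statistic and degrees of freedom.

    t = mean(d) / (sd(d) / sqrt(n)), where d holds the per-student differences
    and sd is the sample standard deviation of those differences.
    """
    if len(after) != len(before):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1


# Hypothetical ratings from three students in the online and face-to-face modes.
online = [2, 4, 6]
face_to_face = [1, 2, 3]
t_value, df = paired_t(online, face_to_face)
print(round(t_value, 3), df)
```

Pairing matters in a single-group design like this one, because the same students answered both questionnaires and their two sets of ratings are not independent.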

5 Conclusion

Based on the extended framework of Spiral of Silence theory and a quasi-experimental method, this study explores internal psychological factors that affect students’ participation willingness and outspoken willingness in two classroom discussion modes: face-to-face discussion and online synchronous discussion. The experimental results show that students tend not to express their real views in either mode for fear of isolation, but this is unrelated to perception of opinion congruence. The students are not afraid of expressing minority opinions in public. On the contrary, in face-to-face discussion, the more they disagree with their friends, the more willing they are to participate in the discussion. Some scholars believe that the traditional Confucian culture of East Asia leads to collectivism, weakens individual autonomy, and makes individuals cautious in expressing their personal opinions and emotions [34]. But this study shows that contemporary Chinese college students have higher independence, and the characteristics of collectivist culture have been greatly weakened. They do not care much about the opinions of society and of unfamiliar classmates. Instead, they keep in close contact with a small number of classmates who are their friends. And contrary to the “hard core” theory [15], they do not necessarily need their friends to support them in public opinion but are willing to argue with their friends. This shows that today’s Chinese college students are not afraid of challenges and competition, nor will they reject their friends because of differences of opinion. This indicates that debate is well suited to class discussion and conducive to enhancing participation.

Communication apprehension has been found to be a very important factor hindering classroom discussion participation and straightforward expression. Students tend not to participate in class discussion because they are afraid of speaking in public, and even if they do, they tend not to express themselves bluntly, whether face to face or online. Nevertheless, online synchronous discussion based on a WeChat group can significantly reduce communication apprehension. Therefore, online discussion is significantly helpful in overcoming communication apprehension and promoting participation and straightforward expression. As for how to overcome communication apprehension and improve participation in class discussion, some researchers have adopted cold calling (i.e., calling on students whose hands are not raised), which has been shown to be effective [2], but whether this enforced approach can help students overcome communication apprehension is still open to discussion.

Concerns about officials not only reduce outspoken willingness in online discussion but also hinder participation in it, indicating that topics unrelated to politics may attract more students to participate. In addition, the teacher’s opinion affects students’ outspoken willingness in online discussion. Therefore, teachers should not only encourage participation, be friendly, and give positive feedback [37], but also remain neutral in debates so as not to put pressure on students.

There are some limitations in this study. Firstly, the results show a significant negative correlation between fear of isolation and outspoken willingness, but the perception of consensus does not affect outspoken willingness, which is not completely consistent with the hypothesis of Spiral of Silence theory. This study cannot explain what causes the fear of isolation that then affects outspoken willingness; this needs further research. Secondly, the purpose of this study is to analyze internal psychological factors, so some external factors that affect participation and outspokenness are not taken into account. Thirdly, limited by teaching requirements, this study adopted a quasi-experimental, single-group pre-test post-test design, which might have affected the results, since students can be shy in the first discussion but less shy in the next. A design with randomly assigned experimental and control groups can ameliorate this problem. Finally, in terms of internal factors, the experimental results show that the seniors are less willing to participate than the freshmen, indicating that learning motivation may be a key factor. These limitations can be addressed in future research.


References

1. Ramsden, P.: Learning to Teach in Higher Education. Routledge, Abingdon (2003)
2. Dallimore, E.J., Hertenstein, J.H., Platt, M.B.: Classroom participation and discussion effectiveness: student-generated strategies. Commun. Educ. 53(1) (2004)
3. Hou, H.T., Wu, S.Y.: Analyzing the social knowledge construction behavioral patterns of an online synchronous collaborative discussion instructional activity using an instant messaging tool: a case study. Comput. Educ. 57(2), 1459–1468 (2011)
4. Caspi, A., Chajut, E., Saporta, K., Beyth-Marom, R.: The influence of personality on social participation in learning environments. Learn. Individ. Differ. 16(2), 129–144 (2006)
5. Fjermestad, J., Hiltz, S.R., Zhang, Y.: Effectiveness for Students: Comparisons of “in-seat” and ALN Courses. Learning Together Online: Research on Asynchronous Learning Networks, pp. 39–80 (2005)
6. Warschauer, M.: Comparing face-to-face and electronic discussion in the second language classroom. CALICO J. 13, 7–26 (1995)
7. Leasure, A.R., Davis, L., Thievon, A.L.: Comparison of student outcomes and preferences in a traditional vs. world wide web-based baccalaureate nursing research course. J. Nurs. Educ. 29(4), 149–154 (2000)
8. Kim, J.B., Derry, S.J., Steinkuehler, C.A., Street, J.P., Watson, J.: Web-based online collaborative learning. In: Paper at the American Educational Research Association Annual Meeting, New Orleans, Louisiana (2000). http://wwww.wcer.wisc.edu/step/documents/olc3/olc3abstract.html
9. Tiene, D.: Online discussions: a survey of advantages and disadvantages compared to face-to-face discussions. J. Educ. Multimed. Hypermedia 9(4), 369–382 (2000)
10. Aziz, F., Quraishi, U., Kazi, A.S.: Factors behind classroom participation of secondary school students (A Gender Based Analysis). Univ. J. Educ. Res. 6(2), 211–217 (2018)
11. Weiser, O., Blau, I., Eshet-Alkalai, Y.: How do medium naturalness, teaching-learning interactions and Students’ personality traits affect participation in synchronous e-learning? Internet High. Educ. 37, 40–51 (2018)
12. Noelle-Neumann, E.: The spiral of silence a theory of public opinion. J. Commun. 24(2), 43–51 (1974)
13. Noelle-Neumann, E.: Are we asking the right questions? Developing measurement from theory: the influence of the spiral of silence on media effects research. Mass Communication Research: On problems and policies, pp. 97–120 (1994)
14. Chen, W., Looi, C.K.: Incorporating online discussion in face to face classroom learning: a new blended learning approach. Australas. J. Educ. Technol. 23(3) (2007)
15. Salmon, C.T., Kline, F.G.: The Spiral of Silence Ten Years Later: An Examination and Evaluation (1983)
16. Glynn, C.J., Park, E.: Reference groups, opinion intensity, and public opinion expression. Int. J. Publ. Opinion Res. 9(3), 213–232 (1997)
17. Lasorsa, D.L.: Political outspokenness: factors working against the spiral of silence. J. Q. 68(1–2), 131–140 (1991)
18. Salmon, C.T., Neuwirth, K.: Perceptions of opinion “climates” and willingness to discuss the issue of abortion. J. Q. 67(3), 567–577 (1990)
19. Willnat, L., Lee, W., Detenber, B.H.: Individual-level predictors of public outspokenness: a test of the spiral of silence theory in Singapore. Int. J. Publ. Opinion Res. 14(4), 391–412 (2002)
20. Shamir, J.: Speaking up and silencing out in face of a changing climate of opinion. J. Mass Commun. Q. 74(3), 602–614 (1997)


21. Miyata, K., Yamamoto, H., Ogawa, Y.: What affects the spiral of silence and the hard core on Twitter? An analysis of the nuclear power issue in Japan. Am. Behav. Sci. 59(9), 1129–1141 (2015)
22. Stoycheff, E.: Under surveillance: examining Facebook’s spiral of silence effects in the wake of NSA internet monitoring. J. Mass Commun. Q. 93(2), 296–311 (2016)
23. Lee, H., Oshita, T., Oh, H.J., Hove, T.: When do people speak out? Integrating the spiral of silence and the situational theory of problem solving. J. Publ. Relat. Res. 26(3), 185–199 (2014)
24. Johnson, G.: The relative learning benefits of synchronous and asynchronous text-based discussion. Br. J. Educ. Technol. 39(1), 166–169 (2008)
25. Cress, U., Kimmerle, J., Hesse, F.W.: Impact of temporal extension, synchronicity, and group size on computer-supported information exchange. Comput. Hum. Behav. 25(3), 731–737 (2009)
26. Asterhan, C.S., Eisenmann, T.: Introducing synchronous e-discussion tools in co-located classrooms: a study on the experiences of ‘active’ and ‘silent’ secondary school students. Comput. Hum. Behav. 27(6), 2169–2177 (2011)
27. Kanuka, H., Rourke, L., Laflamme, E.: The influence of instructional methods on the quality of online discussion. Br. J. Educ. Technol. 38(2), 260–271 (2007)
28. Liu, X., Fahmy, S.: Exploring the spiral of silence in the virtual world: Individuals’ willingness to express personal opinions in online versus offline settings. J. Media Commun. Stud. 3(2), 45 (2011)
29. McCroskey, J.C., Richmond, V.P.: Willingness to communicate: differing cultural perspectives. South. J. Commun. 56(1), 72–77 (1990)
30. Reda, M.M.: Between Speaking and Silence: A Study of Quiet Students. SUNY Press, Albany (2009)
31. Neuwirth, K., Frederick, E., Mayo, C.: The spiral of silence and fear of isolation. J. Commun. 57(3), 450–468 (2007)
32. Triandis, H.C.: Individualism and Collectivism. Routledge, Abingdon (2018)
33. Kim, M.S., Hunter, J.E., Miyahara, A., Horvath, A.M., Bresnahan, M., Yoon, H.J.: Individual- vs. culture-level dimensions of individualism and collectivism: effects on preferred conversational styles. Commun. Monogr. 63(1), 29–49 (1996)
34. Scheufele, D.A., Moy, P.: Twenty-five years of the spiral of silence: a conceptual review and empirical outlook. Int. J. Publ. Opinion Res. 12(1), 3–28 (2000)
35. Gudykunst, W.B., Matsumoto, Y., Ting-Toomey, S., Nishida, T., Kim, K., Heyman, S.: The influence of cultural individualism-collectivism, self construals, and individual values on communication styles across cultures. Hum. Commun. Res. 22(4), 510–543 (1996)
36. Oetzel, J.G., Bolton-Oetzel, K.: Exploring the relationship between self-construal and dimensions of group effectiveness. Manage. Commun. Q. 10(3), 289–315 (1997)
37. Mustapha, S.M., Rahman, N.S.N.A., Yunus, M.M.: Factors influencing classroom participation: a case study of Malaysian undergraduate students. Proc.-Soc. Behav. Sci. 9, 1079–1084 (2010)

Applying Genetic Algorithms for Recommending Adequate Competitors in Mobile Game-Based Learning Environments

Akrivi Krouska, Christos Troussas(✉), and Cleo Sgouropoulou

Department of Informatics and Computer Engineering, University of West Attica, Egaleo, Greece
{akrouska,ctrouss,csgouro}@uniwa.gr

Abstract. Mobile game-based learning (MGBL) exploits an entertaining environment for providing digital education. Such an approach involves forming groups of students who play games in order to advance their knowledge. Building adequate groups has important pedagogical implications, since the recommendation of appropriate collaborators can further enhance students’ cognitive abilities. In this direction, this paper presents an MGBL application for the tutoring of computer programming. In this application, the system uses a genetic algorithm to recommend to each student four peers to play against as competitors. The genetic algorithm finds the most adequate peers for each student by taking into consideration the students’ learning modality, previous knowledge, current knowledge and misconceptions. The student can then select one of the proposed peers, all of whom share common characteristics with him or her. Homogeneous groups are formed for two main reasons: to promote fair competition and to provide adaptive game content based on the players’ characteristics in order to improve their learning outcomes. Our MGBL application was evaluated using Student’s t-test, with promising results.

Keywords: Game-based learning · Genetic algorithm · Group formation · Mobile learning

© Springer Nature Switzerland AG 2020
V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 196–204, 2020. https://doi.org/10.1007/978-3-030-49663-0_23

1 Introduction

The field of education has been enriched with digital methods so that it can be offered to a larger pool of learners through different approaches. A new research area that has emerged from these technological advances is mobile game-based learning (MGBL). MGBL amalgamates mobile learning with game-based learning and provides a context where playing and learning happen simultaneously through handheld devices, leading to effective knowledge acquisition by students.

Given that digital education, and specifically MGBL, is offered to a heterogeneous group of learners, there is an emerging need for systems to adapt to students’ individual preferences and needs. By incorporating intelligent methods, the systems can provide individualized presentation of the learning content, adaptive navigation in the learning environment, personalized assessment, etc. However, the personalized recommendation of peers for collaboration is indispensable for enriching MGBL environments [1], where students can form groups and play either as allies, i.e., they have the same goal and help each other reach a more advanced knowledge level, or as competitors, i.e., they make a great effort to advance their knowledge and attain their learning outcomes. Competition in MGBL has great pedagogical affordance; however, recommending the most appropriate peer to a student has not been researched adequately, and thus there is considerable scope for improvement in this direction.

Furthermore, the studies in [2] and [3] noted the pedagogical potential of group formation in MGBL, taking into consideration the characteristics of students and preserving the self-learning dimension. The works in [4] and [5] explored MGBL in terms of applying intelligence to improve learning effectiveness; the intelligent techniques involved were decision making and uncertainty handling. Other intelligent techniques include genetic algorithms, which have been used for group formation [6, 7]. The authors of [8] and [9] do not focus on MGBL, but they use genetic algorithms to form homogeneous groups of learners for collaboration based on their learning modality, knowledge level and communicative skills. Concerning MGBL environments, the studies in [10, 11] and [12] noted that adaptivity and personalization have not yet been provided adequately by MGBL systems.

This paper presents a novel approach for recommending adequate competitors who share common learning characteristics in an MGBL environment using a genetic algorithm [13]. The algorithm generates homogeneous groups based on students’ learning preferences, previous knowledge, current knowledge and misconceptions. The system then proposes to each player the other members of the group as well-matched competitors. The purpose of this functionality is to provide fair competition in the game and personalized quizzes tailored to the players’ characteristics in order to improve their learning outcomes. A genetic algorithm is a suitable approach, since it can produce high-quality results in a short time [14].

2 System Description

In this work, an MGBL system for assessing learners’ knowledge of computer programming through an adaptive quiz game was developed. The novelty of this system is that it recommends suitable competitors to students using a genetic algorithm. The genetic algorithm generates adequate groups based on the similarity of the group members’ characteristics. Each student then chooses as a competitor one of the proposed peers with whom he or she has been grouped by the algorithm. The characteristics used in the group formation are the following:

1. Students’ learning modality, determined by a questionnaire based on the VARK model (Visual, Auditory, Read/Write and Kinesthetic) and given during registration with the system.
2. Previous knowledge, derived from a pre-test given before the first interaction with the game.
3. Current knowledge, derived from the scores achieved in the quizzes.
4. Misconceptions, derived from the wrong answers in quizzes. In essence, this is a degree of misconception ranging from 0 to 1, where low values indicate minor misconceptions caused by syntactic mistakes, whereas high values indicate serious misconceptions related to logical mistakes. Syntactic and logical mistakes are the two main kinds of misconception in computer programming examined by the system.

The purpose of recommending competitors with common characteristics is to adjust the quiz content, such as the type of question items and their level of difficulty, to the players’ profile, helping them to improve their learning outcomes. Moreover, fair competition is promoted in this way, since none of the players has an unfair advantage. To accomplish this goal, a genetic algorithm is a suitable approach, as it can handle large amounts of data effectively and produce high-quality solutions in a short time. Figure 1 illustrates the architecture of the system. This study focuses on the functionality of recommending adequate competitors using a genetic algorithm; therefore, the construction of the student profile and the adaptation of quiz content are outside the scope of this paper.

Fig. 1. The architecture of the developed MGBL system.

3 Recommending Adequate Competitors Using Genetic Algorithm

The novelty of this study is that, using a genetic algorithm [15], the MGBL system recommends to each player suitable competitors with common characteristics, in order to promote fair competition and provide adaptive quizzes based on their needs. In particular, the genetic algorithm is used to find the most suitable co-players for each student by forming homogeneous groups of five members. As such, the system proposes four competitors to each player, none of whom has any advantage over the others.

The general operation flow of the genetic algorithm is presented in Fig. 2. First, an initial population is generated. The population is composed of a set of chromosomes, which are the feasible solutions. Afterwards, these chromosomes are evaluated using a predefined fitness function, and several genetic operators are applied to produce a new population, namely the next generation, based on the principles of natural selection. This process aims to preserve the best chromosomes and generate new ones by breeding from the genes of the most suitable chromosomes. These steps are repeated until a termination criterion is reached.

Fig. 2. General operation flow of genetic algorithm.

3.1 Student Representation

A set of n students S = {s_1, s_2, …, s_n}, each with l characteristics s_i = {c_i1, c_i2, …, c_il}, is to be divided into k groups of m members, G = {g_1, g_2, …, g_k}. The criteria for this group formation are that each student is assigned to exactly one group and that the groups are homogeneous, i.e., all members have common characteristics. In particular, the number of students is n = 40 and the number of characteristics is l = 4, namely learning modality, previous knowledge, current knowledge and degree of misconception; each group has m = 5 members and hence k = 8 groups are formed.

In the student representation, each characteristic must be a numerical value on the same scale, to avoid distortions when calculating the fitness function. In essence, all data range between 0 and 1. Thus, the learning modality is discretized into the following values: Visual = 0, Auditory = 1/3, Read/Write = 2/3 and Kinesthetic = 1. Previous and current knowledge are converted from a 100-point score to the 0–1 range, whereas the degree of misconception is already computed on a 0–1 scale.
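The scaling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `VARK_CODES` and `encode_student` are hypothetical and do not come from the paper, and the example values for Mary are taken from Table 1.

```python
# Illustrative sketch; all names here are assumptions, not the authors' code.
VARK_CODES = {
    "Visual": 0.0,
    "Auditory": 1 / 3,
    "Read/Write": 2 / 3,
    "Kinesthetic": 1.0,
}


def encode_student(modality, previous_score, current_score, misconception):
    """Return the four characteristics as a vector with every entry in [0, 1]."""
    if not 0.0 <= misconception <= 1.0:
        raise ValueError("misconception degree must lie in [0, 1]")
    return [
        VARK_CODES[modality],     # learning modality, discretized as in the text
        previous_score / 100.0,   # pre-test score, 0-100 -> 0-1
        current_score / 100.0,    # quiz score, 0-100 -> 0-1
        misconception,            # already on a 0-1 scale
    ]


mary = encode_student("Visual", 72, 77, 0.241)
print(mary)
```

Because every entry lies in [0, 1], no single characteristic dominates the distance that the fitness function later computes.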

3.2 Chromosome Encoding

The feasible solutions of the problem are represented by a chromosome-like data structure. In this paper, a matrix approach is used for encoding the chromosome, where the number of rows equals the number of groups and the number of columns equals the group size. Each gene, i.e., each item of the matrix, contains a student id. Therefore, the position of the gene in the matrix defines the group to which the student belongs.
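A minimal sketch of this matrix encoding, assuming student ids are the integers 1 to 40 and that a chromosome is stored as a list of rows; `to_matrix` and `group_of` are hypothetical helper names introduced only for illustration.

```python
# Illustrative sketch; helper names and the id scheme are assumptions.
N_STUDENTS, GROUP_SIZE = 40, 5          # n and m from Sect. 3.1
N_GROUPS = N_STUDENTS // GROUP_SIZE     # k = 8


def to_matrix(student_ids, group_size):
    """Cut a sequence of student ids into rows; each row is one group."""
    ids = list(student_ids)
    return [ids[i:i + group_size] for i in range(0, len(ids), group_size)]


def group_of(chromosome, student_id):
    """Row index of the gene holding student_id, i.e. the student's group."""
    for row_index, row in enumerate(chromosome):
        if student_id in row:
            return row_index
    raise KeyError(student_id)


chromosome = to_matrix(range(1, N_STUDENTS + 1), GROUP_SIZE)
print(len(chromosome), "groups of", len(chromosome[0]))
print("student 23 is in group row", group_of(chromosome, 23))
```

Reordering the ids before cutting them into rows yields a different feasible solution, which is what the genetic operators exploit.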

3.3 Fitness Function

This study aims to form homogeneous groups of students, meaning that the group members are similar to each other with respect to their characteristics. Therefore, it is necessary to define a measure of such homogeneity. In our case, the Manhattan distance metric is used. Thus, the similarity between two students s_i and s_j over their l characteristics is formulated as:

$$\mathrm{Similarity}(s_i, s_j) = \sum_{p=1}^{l} \left| c_{ip} - c_{jp} \right|$$

Then, the fitness of the group g_x and of the chromosome are computed as follows:

$$\mathrm{GroupFitness}(g_x) = \sum_{v=1}^{m-1} \sum_{z=v+1}^{m} \mathrm{Similarity}(s_v, s_z)$$

$$\mathrm{ChromosomeFitness} = \sum_{x=1}^{k} \mathrm{GroupFitness}(g_x)$$

Here s_v and s_z denote the v-th and z-th members of group g_x, and k is the number of groups. Because Similarity is a distance, smaller values mean more similar students; hence, to form the most homogeneous groups, the value of ChromosomeFitness should be minimized.
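The three formulas translate almost directly into code. The sketch below is an assumption-laden illustration: the function names mirror the formulas, while the `profiles` dictionary and its four toy students are hypothetical data made up for this example.

```python
from itertools import combinations


def similarity(a, b):
    """Manhattan distance between two characteristic vectors (lower = more alike)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def group_fitness(group, profiles):
    """Sum of distances over all unordered pairs of members of one group."""
    return sum(similarity(profiles[v], profiles[z]) for v, z in combinations(group, 2))


def chromosome_fitness(chromosome, profiles):
    """Sum of the group fitness values over all groups; to be minimized."""
    return sum(group_fitness(group, profiles) for group in chromosome)


# Hypothetical toy data: four students, each already scaled to [0, 1].
profiles = {
    "a": [0.0, 0.70, 0.75, 0.25],
    "b": [0.0, 0.72, 0.78, 0.24],
    "c": [1.0, 0.40, 0.50, 0.60],
    "d": [1.0, 0.42, 0.49, 0.65],
}
homogeneous = [["a", "b"], ["c", "d"]]
mixed = [["a", "c"], ["b", "d"]]
print(chromosome_fitness(homogeneous, profiles))
print(chromosome_fitness(mixed, profiles))
```

The grouping that pairs similar students has the much smaller fitness value, which is why the algorithm searches for the minimum.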

3.4 Initial Population and Evolution

The genetic algorithm starts by generating an initial population randomly, to ensure the diversity of the population. In this study, the population size is 80. According to Fig. 2, once the initial population is built, the evolution process starts generating new populations by applying the genetic operators. The algorithm terminates once 100 generations have been produced.
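As a rough sketch of this set-up, the code below builds 80 random chromosomes for 40 students and runs a fixed number of generations. The function names, the integer student ids and the `step` callback (standing in for one round of genetic operators) are assumptions made for illustration.

```python
import random

POPULATION_SIZE = 80    # population size stated in the text
MAX_GENERATIONS = 100   # termination criterion stated in the text


def random_chromosome(student_ids, group_size, rng):
    """One random partition: shuffle the ids, then cut them into rows (groups)."""
    ids = list(student_ids)
    rng.shuffle(ids)
    return [ids[i:i + group_size] for i in range(0, len(ids), group_size)]


def initial_population(student_ids, group_size, size, rng):
    """A list of independently generated random chromosomes."""
    return [random_chromosome(student_ids, group_size, rng) for _ in range(size)]


def evolve(population, step, generations=MAX_GENERATIONS):
    """Apply `step` (one round of genetic operators) a fixed number of times."""
    for _ in range(generations):
        population = step(population)
    return population


rng = random.Random(42)   # fixed seed only so that the sketch is reproducible
population = initial_population(range(1, 41), 5, POPULATION_SIZE, rng)
print(len(population), "chromosomes,", len(population[0]), "groups each")
```

Every chromosome is a valid partition by construction, so each student appears in exactly one group from the first generation onwards.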

3.5 Genetic Operators

To generate a new population with fitter chromosomes than the previous one, four basic operators are applied in the following order: (a) elitism, (b) selection, (c) crossover and (d) mutation.

First, an elitism strategy is used, in which the best chromosomes are passed unchanged to the next generation so that good information is not lost, since the best chromosomes might otherwise disappear during breeding. The elitism rate is set to 0.1.

The second operator applied is selection. This process employs the fitness function to select two chromosomes, called parents, from the current population; crossover is then applied to them to generate offspring for the next generation. The mechanism used is roulette wheel selection.

Once the parents are selected, the crossover operator is applied to them to produce new offspring. This operator is important for introducing new genetic information into the next population. In this study, the 2-point crossover technique is used, with the crossover probability set to 0.7.

The last operator, applied once the new population has been produced, is mutation, which aims to increase population diversity by introducing random variations into the chromosomes. The type of mutation used in this work is swap mutation, performed with a mutation probability of 0.05.
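One possible reading of the four operators is sketched below. Several points are assumptions rather than details given in the paper: chromosomes are handled as flat lists (the matrix read row by row), the elitism rate is treated as a fraction of the population, roulette weights are taken as the inverse of the fitness because the fitness is minimized, and the 2-point crossover fills the positions outside the cut segment from the second parent so that no student id is duplicated.

```python
import random

ELITISM_RATE, CROSSOVER_PROB, MUTATION_PROB = 0.1, 0.7, 0.05  # values from the text


def roulette_select(population, costs, rng):
    """Pick two parents with probability proportional to 1 / cost (assumption).
    Sampling is with replacement, so the same parent may be drawn twice."""
    weights = [1.0 / (c + 1e-9) for c in costs]
    return rng.choices(population, weights=weights, k=2)


def two_point_crossover(p1, p2, rng):
    """Keep p1[a:b] in place and fill the other positions with the missing ids
    in the order they occur in p2, so the child is still a permutation."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[a:b]
    kept = set(middle)
    rest = [gene for gene in p2 if gene not in kept]
    return rest[:a] + middle + rest[a:]


def swap_mutation(chromosome, rng):
    """Exchange the genes at two distinct random positions."""
    i, j = rng.sample(range(len(chromosome)), 2)
    mutated = list(chromosome)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def next_generation(population, cost, rng):
    """One generation: elitism, selection, crossover, mutation (in that order)."""
    ranked = sorted(population, key=cost)            # lowest cost first
    costs = [cost(c) for c in ranked]
    new_population = ranked[:int(ELITISM_RATE * len(ranked))]   # (a) elitism
    while len(new_population) < len(population):
        p1, p2 = roulette_select(ranked, costs, rng)            # (b) selection
        if rng.random() < CROSSOVER_PROB:                       # (c) crossover
            child = two_point_crossover(p1, p2, rng)
        else:
            child = list(p1)
        if rng.random() < MUTATION_PROB:                        # (d) mutation
            child = swap_mutation(child, rng)
        new_population.append(child)
    return new_population
```

Passing a function such as the chromosome fitness of Sect. 3.3 as `cost` and calling `next_generation` repeatedly gives the evolution loop; because the elite chromosomes are copied unchanged, the best fitness in the population never gets worse from one generation to the next.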

3.6 Examples of Operation

The genetic algorithm developed in this work generates homogeneous student groups in order to provide personalized recommendations of co-players and quizzes adapted to the players’ profile, namely their learning modalities and cognitive skills. Thus, from an educational point of view, learning outcomes can be improved, while from a gaming point of view, fair competition is promoted. As shown in Table 1, the members of each group have common characteristics; hence, the system can adapt the quiz to them. For instance, the system recommends Peter, Helen, Tony and Kate to Mary as co-players, and she chooses to play with Helen. Since both of them are visual learners, the quiz questions delivered to them are enriched with images. Moreover, the difficulty level of the questions is proportional to their knowledge level, which is moderate, whereas the content of the questions is adapted to their degree of misconception, which indicates that they mainly make syntactic mistakes.

Table 1. Examples of operations

Group  Student  Learning modality  Previous knowledge  Current knowledge  Degree of misconceptions
1      Mary     Visual             72%                 77%                0.241
1      Peter    Visual             69%                 75%                0.253
1      Helen    Visual             75%                 81%                0.237
1      Tony     Visual             73%                 79%                0.239
1      Kate     Visual             70%                 76%                0.244
2      Nick     Audio              85%                 89%                0.159
2      Harris   Audio              82%                 86%                0.167
2      George   Read/Write         86%                 88%                0.161
2      Sofia    Read/Write         80%                 85%                0.172
2      Irene    Read/Write         81%                 84%                0.174
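As a worked illustration of the similarity measure on the data of Table 1, the values can be scaled as in Sect. 3.1 (Visual = 0, Audio taken as the Auditory code 1/3 ≈ 0.333, percentages divided by 100). Under these assumptions, the distance between Mary and her chosen co-player Helen is far smaller than the distance between Mary and Nick from Group 2:

$$\mathrm{Similarity}(\text{Mary}, \text{Helen}) = |0 - 0| + |0.72 - 0.75| + |0.77 - 0.81| + |0.241 - 0.237| = 0.074$$

$$\mathrm{Similarity}(\text{Mary}, \text{Nick}) = |0 - 0.333| + |0.72 - 0.85| + |0.77 - 0.89| + |0.241 - 0.159| \approx 0.665$$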


4 Evaluation Results and Discussion

Evaluation plays a crucial role in verifying the usefulness and efficacy of the functionalities of a software system. Therefore, Student’s t-test was used to measure the quality and adequacy of the recommendation of peers to students for playing as competitors in the MGBL environment. In total, 80 students participated in the experiment. All of them are second-year computer science students at a public university. The students used the presented system in the context of the course “The Java language”, as an additional tool supporting in-class education, for 3 months.

For the experiment, we created two groups of 40 students each, Group 1 and Group 2. The split was made by the students’ instructors (2 faculty members of the university) and the evaluators, so that the students in each group share common characteristics, such as age and gender. Group 1 used the system in which the competitors were proposed by the presented genetic algorithm. Group 2 used a similar system without the intelligent component, in which the proposed competitors were 4 random students whose current knowledge level was close to the student’s own (±5 points).

After the experimental period of 3 months, the students of both groups were asked to rate the system’s recommendation of peers to select as competitors. It should be underlined that this process is dynamic and that the students of both groups received many recommendations; as such, their answer reflects the overall impact of all the recommendations made by the systems. For the experiment, the alpha value was 0.05 and we analyzed the p-values. For the null hypothesis “There is no difference between the two groups of students”, the t-test rejects the hypothesis for the question “Rate the recommendation of peers by the system”. That means that the difference is statistically significant and that the recommendation given to Group 1 is more effective and of higher quality than the one given to Group 2 (Table 2).

Table 2. t-Test results on recommendation rating (t-Test: two-sample assuming equal variances)

                               Variable 1     Variable 2
Mean                           7.075          5.925
Variance                       1.353205128    0.686538462
Observations                   40             40
Hypothesized mean difference   0
df                             70
t Stat                         5.09260565
P(T<=t) one-tail               1.42934E−06
t critical one-tail            1.666914479
P(T<=t) two-tail               2.85869E−06
t critical two-tail            1.994437112
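The test statistic in Table 2 can be recomputed from the reported means, variances and group sizes. The sketch below uses only the Python standard library; `two_sample_t` is a hypothetical helper name. With equal group sizes the pooled and Welch statistics coincide, so the function returns a single t value together with both degrees-of-freedom figures. Under these formulas, the reported df of 70 agrees with the Welch-Satterthwaite approximation (about 70.5), whereas a pooled test has n1 + n2 − 2 = 78 degrees of freedom; the conclusion at alpha = 0.05 is the same either way.

```python
import math


def two_sample_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, pooled df, Welch df) from summary statistics."""
    pooled_df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / pooled_df
    t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    a, b = var1 / n1, var2 / n2
    # Welch-Satterthwaite approximation of the degrees of freedom
    welch_df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, pooled_df, welch_df


# Summary statistics as reported in Table 2.
t_stat, pooled_df, welch_df = two_sample_t(7.075, 1.353205128, 40,
                                           5.925, 0.686538462, 40)
print(round(t_stat, 4), pooled_df, round(welch_df, 1))
```

The recomputed statistic matches the reported t Stat to four decimals and lies well above both critical values listed in the table.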


The aforementioned results were anticipated. Our approach, which employs genetic algorithms for recommending peers to students to play with as competitors, is a novel tool that exploits the ability of genetic algorithms to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection. This approach is shown to have significant potential for student grouping, which in turn has great pedagogical affordance. In contrast, the recommendation given to Group 2 lacked intelligence and, as such, was less effective in placing students in adequate teams.

The second part of the experiment involved the instructors. As mentioned above, the instructors are two faculty members in the field of Informatics at a public university. The instructors were asked to propose, for each student of Group 1, the 4 peers who according to their experience would be the most appropriate competitors for him/her to play against using the MGBL application. Our approach recommends 4 peers to each student, so there are 160 recommendations in total, namely 4 recommendations for each of the 40 students of Group 1 (40 × 4 = 160). We then computed the matching rate between the instructors’ proposals and the recommendations of the system. The first instructor gave 152 recommendations that were identical to those of the system, a success rate of 95%. The second instructor gave 148 recommendations that were the same as those of the system, and the percentage of successful matching is 90.6%. These results further support the adequacy of the recommendations of our system using genetic algorithms.
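The matching rate is simply the number of identical recommendations divided by the 160 recommendations made by the system. The short sketch below (with the hypothetical helper `matching_rate`) reproduces the first instructor's 95%. For the second instructor, 148 matches evaluate to 92.5%, whereas the reported 90.6% would correspond to 145 matches; the source does not indicate which of the two figures is the intended one.

```python
TOTAL_RECOMMENDATIONS = 40 * 4   # 4 peers for each of the 40 students of Group 1


def matching_rate(matches, total=TOTAL_RECOMMENDATIONS):
    """Percentage of instructor proposals identical to the system's recommendations."""
    return 100.0 * matches / total


print(matching_rate(152))   # first instructor
print(matching_rate(148))   # second instructor, count as reported
print(matching_rate(145))   # count that would give the reported 90.6%
```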

5 Conclusions and Future Work

This paper presents a novel approach for recommending adequate competitors to learners using a genetic algorithm in mobile game-based learning environments. According to the related scientific literature, forming adequate groups has important pedagogical implications, since it can further enhance students’ cognitive abilities. Hence, our genetic algorithm takes as input the students’ learning modality, their prior and current knowledge level, and the misconceptions revealed in quizzes, and proposes four competitors with common characteristics. Our system was evaluated by both students and instructors, and the results were very encouraging regarding the effectiveness of our approach.

Our future work includes a broader evaluation, using more instruments as well as more students, in order to further assess the efficacy of our system in terms of its technological and pedagogical affordance. Moreover, we plan to explore combining the technique used here (i.e., genetic algorithms) with other machine learning techniques, such as artificial neural networks or clustering analysis, in order to construct a hybrid model. Such a model will then be thoroughly evaluated in order to assess its robustness and effectiveness.


References

1. Chen, C.M., Kuo, C.H.: An optimized group formation scheme to promote collaborative problem-based learning. Comput. Educ. 133, 94–115 (2019)
2. Cahyana, U., Paristiowati, M., Savitri, D.A., Hasyrin, S.N.: Developing and application of mobile game based learning (M-GBL) for high school students performance in chemistry. Eurasia J. Math. Sci. Technol. Educ. 13(10), 7037–7047 (2017)
3. Chen, C.P.: Understanding mobile English-learning gaming adopters in the self-learning market: the uses and gratification expectancy model. Comput. Educ. 126, 217–230 (2018)
4. Melero, J., Hernández-Leo, D.: Design and implementation of location-based learning games: four case studies with “QuesTInSitu: the game”. IEEE Trans. Emerg. Top. Comput. 5(1), 84–94 (2016)
5. Yoon, D.M., Kim, K.J.: Challenges and opportunities in game artificial intelligence education using angry birds. IEEE Access 3, 793–804 (2015)
6. Lescano, G., Costaguta, R., Amandi, A.: Genetic algorithm for automatic group formation considering student’s learning styles. In: 2016 8th Euro American Conference on Telematics and Information Systems (EATIS), pp. 1–8, April 2016
7. Moreno, J., Ovalle, D.A., Vicari, R.M.: A genetic algorithm approach for group formation in collaborative learning considering multiple student characteristics. Comput. Educ. 58(1), 560–569 (2012)
8. Tien, H.W., Lin, Y.S., Chang, Y.C., Chu, C.P.: A genetic algorithm-based multiple characteristics grouping strategy for collaborative learning. In: International Conference on Web-Based Learning, pp. 11–22, October 2013
9. Liu, L., Chen, L., Shi, C., Chen, H.: The study of collaborative learning grouping strategy in Intelligent Tutoring System. In: The 2010 14th International Conference on Computer Supported Cooperative Work in Design, pp. 642–646, April 2010
10. Chittaro, L.: Designing serious games for safety education: “Learn to Brace” versus traditional pictorials for aircraft passengers. IEEE Trans. Vis. Comput. Graph. 22(5), 1527–1539 (2015)
11. Hamari, J., Shernoff, D.J., Rowe, E., Coller, B., Asbell-Clarke, J., Edwards, T.: Challenging games help students learn: an empirical study on engagement, flow and immersion in game-based learning. Comput. Hum. Behav. 54, 170–179 (2016)
12. Hwang, G.J., Chang, S.C.: Effects of a peer competition-based mobile learning approach on students’ affective domain exhibition in social studies courses. Br. J. Educ. Technol. 47(6), 1217–1231 (2016)
13. Krouska, A., Troussas, C., Virvou, M.: Applying genetic algorithms for student grouping in collaborative learning: a synthetic literature review. Intell. Decis. Technol. 13(4), 395–406 (2019)
14. Krouska, A., Virvou, M.: An enhanced genetic algorithm for heterogeneous group formation based on multi-characteristics in social networking-based learning. IEEE Trans. Learn. Technol. (2019)
15. Falkenauer, E.: The grouping genetic algorithms-widening the scope of the GAs. Belg. J. Oper. Res. Stat. Comput. Sci. 33(1), 2 (1992)

Dynamic Detection of Learning Modalities Using Fuzzy Logic in Students’ Interaction Activities

Christos Troussas(✉), Akrivi Krouska, and Cleo Sgouropoulou

Department of Informatics and Computer Engineering, University of West Attica, Egaleo, Greece
{ctrouss,akrouska,csgouro}@uniwa.gr

Abstract. E-learning software is oriented to a heterogeneous group of learners. Thus, such systems need to be personalized to students’ needs and preferences so that their knowledge acquisition can become more effective. One personalization mechanism is adaptation to the students’ learning modalities. However, this process is time-consuming and error-prone when performed manually. In view of the above, this paper presents a novel technique for detecting learning modalities. Our approach utilizes the Honey and Mumford model, which classifies students as activists, reflectors, theorists and pragmatists. The automatic detection uses the fuzzy logic technique, taking as input the students’ interaction with the learning environment, namely the kind of learning units visited, their type of media, the comments made by students on learning units and their participation in discussions. Our novel technique was incorporated in a tutoring system for learning computer programming and was evaluated with very promising results.

Keywords: Automatic detection · Fuzzy logic · Honey and Mumford model · Intelligent tutoring system · Learning modalities



© Springer Nature Switzerland AG 2020
V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 205–213, 2020. https://doi.org/10.1007/978-3-030-49663-0_24

1 Introduction

E-learning can bring considerable changes to the field of education, since the learning material becomes available to a very large pool of people. However, as this pool grows, it comes to include heterogeneous groups of learners, who may differ in learning modality, learning pace, needs and preferences. To bridge this heterogeneity, e-learning should be enriched with adaptivity and personalization techniques as a means of providing a student-centric learning experience [1]. This effort triggered the creation of the area of Intelligent Tutoring Systems (ITSs), which enable learning in a meaningful and effective manner by using a variety of computing technologies [2, 3].

The adaptivity and personalization of ITSs include the delivery of personalized learning units to students, the diagnosis of students’ strengths and weaknesses and the adaptation of content to them, the provision of effective assessment units for better assessing students’ knowledge level, etc. For all these functionalities, the identification of students’ learning modality is a core aspect, since knowledge acquisition can take place more effectively if the way in which each student prefers to learn is respected [4–6].

In the related scientific literature, many approaches for automatically identifying students’ learning modalities have been presented. The review in [7] notes that the majority of systems identifying learning modalities use the Felder-Silverman model [8] and employ rule-based algorithms and Bayesian networks. Other research efforts employed fuzzy logic and used the Felder-Silverman model to predict learning modalities [9–11]. In [12], the researchers employed fuzzy logic and used the VARK model [13] to automatically detect learning modalities. In summary, the Felder-Silverman model is the dominant model used for identifying students’ learning modality based on different characteristics, such as students’ personal and cognitive traits, learning goals and motivation activity. Furthermore, many techniques have been used, including fuzzy logic.

In view of the above, this paper presents a novel way of detecting learning modality. Our approach applies the fuzzy logic technique to identify the dominant learning modality based on the Honey and Mumford model [14], which categorizes learners as activists, reflectors, theorists and pragmatists. These learning modalities are preferred because they fit better with the teaching strategy adopted in the developed system. In particular, the fuzzy logic controller takes as input the students’ interaction with the platform, namely the kind of learning units visited, the type of media they consist of, the comments made by students on learning units and their participation in discussion rooms, and classifies each student into one of the aforementioned learning modalities. The reasoning behind this module is that each learning modality represents certain learner behaviors that can be inferred from specific interactions and preferences. However, this inference has a degree of truth, in contrast to Boolean logic, which makes fuzzy logic a suitable approach for predicting students’ learning modality. The presented model was incorporated in an adaptive hypermedia educational system for tutoring computer programming, and its evaluation results show the effectiveness of the system in terms of adaptivity and accuracy of learning modality detection.

2 System Description

In this work, an adaptive hypermedia educational system was developed that incorporates intelligence to provide individualized instruction based on students’ learning preferences. The novelty of this paper is the automatic detection of students’ learning modalities through their interaction with the system using fuzzy logic. The system categorizes learners, based on the Honey and Mumford model [14], into activists who learn by doing, reflectors who learn by observing, theorists who learn by analyzing concepts, and pragmatists who learn by applying knowledge.

Dynamic Detection of Learning Modalities Using Fuzzy Logic


To predict a student’s learning modality, the system analyzes what kind of learning units the student usually reads, what media type they are made of, how many comments s/he usually makes and his/her participation in discussions, since all this information indicates the student’s learning preferences. Using these as inputs to a fuzzy logic controller, the system then diagnoses the student’s dominant learning modality and can therefore provide individualized instruction. The personalized tutoring strategies are outside the scope of this paper. Figure 1 illustrates the architecture of the adaptive hypermedia educational system developed.

Fig. 1. Architecture of the adaptive hypermedia educational system developed.

3 Fuzzy Detection of Learning Modality

The automatic identification of learning modality addresses the problems that occur with the traditional approach of using tests or questionnaires, such as students’ low motivation to answer a questionnaire and their lack of self-awareness of their learning preferences. Students may use more than one learning modality to some degree. This uncertainty and ambiguity makes fuzzy logic a suitable approach for measuring the degree to which a student belongs to each modality and hence for determining the dominant one. Figure 2 shows the proposed fuzzy logic controller for automatically detecting students’ learning modality.


Fig. 2. Fuzzy detection of learning modality.

The automatic detection of learning modality is based on the student’s interaction with the educational system, namely the kind and media type of the learning units s/he reads, the comments s/he creates on them and his/her participation in discussion rooms. In particular, the input variables of the fuzzy controller are the following:

• Theory Preference (TP): The percentage of the learning units studied by the student whose “kind” metadata has the value “theory”.
• Exercise Preference (EP): The degree to which the student prefers to study learning units characterized as exercises, given by the percentage of exercise learning material among the units studied.
• Text Format Preference (TFP): The degree to which the student prefers learning units in text format, given by their percentage of the total learning units read.
• Video Format Preference (VFP): The degree to which the student prefers video learning units, given by the percentage of this media type among the learning units studied.
• Preference on Commenting (PC): The degree to which the student comments on learning units, given by the ratio of his/her comments to the average number of comments made by peers.
• Preference on Discussions (PD): The level of the student’s participation in discussion rooms, given by the ratio of his/her answers to the average number of answers given by peers in discussions.

The output variable of the fuzzy controller is the following:

• Learning Modality (LM): The student’s preferred way of learning according to the Honey and Mumford model.

The input values are crisp numbers that are mapped to fuzzy ones by triangular membership functions. The same type of membership function is used for the output set. For the PC and PD variables, the intervals are determined dynamically from the averages of these features at the time the model is applied. Table 1 illustrates the fuzzy input and output variables.
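As a rough illustration of how the six inputs defined above could be derived from interaction logs, the following Python sketch computes them for one student. The `LearningUnit` record, its `kind` and `media` values, the function name and the sample numbers are assumptions made for this example; the paper does not describe the system's actual data model.

```python
from dataclasses import dataclass


@dataclass
class LearningUnit:
    kind: str   # "theory" or "exercise" (assumed metadata values)
    media: str  # "text" or "video" (assumed metadata values)


def input_variables(units, comments, answers, avg_comments, avg_answers):
    """Return the six crisp controller inputs for one student.

    TP, EP, TFP and VFP are fractions in [0, 1] of the units studied.
    PC and PD are ratios to the peer averages, so 1.0 means "exactly average".
    """
    if not units:
        raise ValueError("the student has not studied any learning unit")
    if avg_comments <= 0 or avg_answers <= 0:
        raise ValueError("peer averages must be positive")

    def share(predicate):
        # Fraction of the studied units that satisfy the predicate.
        return sum(1 for u in units if predicate(u)) / len(units)

    return {
        "TP": share(lambda u: u.kind == "theory"),
        "EP": share(lambda u: u.kind == "exercise"),
        "TFP": share(lambda u: u.media == "text"),
        "VFP": share(lambda u: u.media == "video"),
        "PC": comments / avg_comments,
        "PD": answers / avg_answers,
    }


# Hypothetical student: one theory/text unit and three exercise/video units.
studied = [LearningUnit("theory", "text")] + [LearningUnit("exercise", "video")] * 3
values = input_variables(studied, comments=2, answers=6,
                         avg_comments=4.0, avg_answers=3.0)
print(values)
```

Expressing PC and PD as ratios to the peer averages is equivalent to keeping raw counts and scaling the membership intervals by those averages; which of the two the original system does is not stated.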


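Table 1. Fuzzy input and output sets for detecting learning modalities.

Set     Variable                        Linguistic term     Interval
Input   Theory Preference (TP)          Low (L_TP)          (0, 0.2, 0.4)
Input   Theory Preference (TP)          Medium (M_TP)       (0.3, 0.5, 0.7)
Input   Theory Preference (TP)          High (H_TP)         (0.6, 0.8, 1)
Input   Exercise Preference (EP)        Low (L_EP)          (0, 0.2, 0.4)
Input   Exercise Preference (EP)        Medium (M_EP)       (0.3, 0.5, 0.7)
Input   Exercise Preference (EP)        High (H_EP)         (0.6, 0.8, 1)
Input   Text Format Preference (TFP)    Low (L_TFP)         (0, 0.2, 0.4)
Input   Text Format Preference (TFP)    Medium (M_TFP)      (0.3, 0.5, 0.7)
Input   Text Format Preference (TFP)    High (H_TFP)        (0.6, 0.8, 1)
Input   Video Format Preference (VFP)   Low (L_VFP)         (0, 0.2, 0.4)
Input   Video Format Preference (VFP)   Medium (M_VFP)      (0.3, 0.5, 0.7)
Input   Video Format Preference (VFP)   High (H_VFP)        (0.6, 0.8, 1)
Input   Preference on Commenting (PC)   Low (L_PC)          (0, 0.5APC, APC)
Input   Preference on Commenting (PC)   Medium (M_PC)       (0.5APC, APC, 1.5APC)
Input   Preference on Commenting (PC)   High (H_PC)         (APC, 1.5APC, 2APC)
Input   Preference on Discussions (PD)  Low (L_PD)          (0, 0.5APD, APD)
Input   Preference on Discussions (PD)  Medium (M_PD)       (0.5APD, APD, 1.5APD)
Input   Preference on Discussions (PD)  High (H_PD)         (APD, 1.5APD, 2APD)
Output  Learning Modality (LM)          Activist (A_LM)     (0, 0.2, 0.35)
Output  Learning Modality (LM)          Reflector (R_LM)    (0.25, 0.4, 0.55)
Output  Learning Modality (LM)          Theorist (T_LM)     (0.45, 0.6, 0.75)
Output  Learning Modality (LM)          Pragmatist (P_LM)   (0.65, 0.85, 1)

Note: APC = average of peers’ comments on learning units; APD = average of peers’ answers in discussions.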
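The fuzzification step behind Table 1 can be sketched in Python as follows. This is a minimal sketch assuming the standard triangular membership function with feet a and c and peak b; the function names and the short term codes "L", "M" and "H" are chosen for this example and are not taken from the system.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in the triangle (a, b, c) with peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Intervals for TP, EP, TFP and VFP, as listed in Table 1.
PREFERENCE_TERMS = {
    "L": (0.0, 0.2, 0.4),
    "M": (0.3, 0.5, 0.7),
    "H": (0.6, 0.8, 1.0),
}


def scaled_terms(average):
    """Intervals for PC or PD, scaled by the peer average (APC or APD)."""
    return {
        "L": (0.0, 0.5 * average, average),
        "M": (0.5 * average, average, 1.5 * average),
        "H": (average, 1.5 * average, 2.0 * average),
    }


def fuzzify(x, terms):
    """Map a crisp value to its membership degree in every linguistic term."""
    return {name: triangular(x, *abc) for name, abc in terms.items()}


print(fuzzify(0.65, PREFERENCE_TERMS))  # overlaps Medium and High
print(fuzzify(2, scaled_terms(4.0)))    # 2 comments against a peer average of 4
```

If PC and PD are already expressed as ratios to the peer average, `scaled_terms(1.0)` gives the matching intervals. Under this strict reading, a value of exactly 0 or 1 has zero membership in every preference term; whether the original system treats the end terms as shoulders is not stated.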

The fuzzy rule base includes 216 IF-THEN rules that combine the input values so as to correspond to the learning modality output. The rules were designed according to the interactions that students are mainly expected to perform given the attributes and preferences of their learning modality. Examples of the fuzzy rules are the following:

1. IF TP = L_TP and EP = H_EP and TFP = L_TFP and VFP = H_VFP and PC = L_PC and PD = H_PD THEN LM = A_LM
2. IF TP = H_TP and EP = L_EP and TFP = L_TFP and VFP = H_VFP and PC = M_PC and PD = L_PD THEN LM = R_LM
3. IF TP = H_TP and EP = L_EP and TFP = H_TFP and VFP = L_VFP and PC = H_PC and PD = L_PD THEN LM = T_LM
4. IF TP = L_TP and EP = H_EP and TFP = H_TFP and VFP = L_VFP and PC = L_PC and PD = M_PD THEN LM = P_LM
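To make the rule format concrete, the four example rules can be encoded and evaluated as in the following Python sketch. It assumes that "and" is taken as the minimum of the antecedent degrees and that rules with the same consequent are combined with the maximum, which is a common choice but is not spelled out in the paper; the data layout and the short term codes are likewise assumptions for illustration.

```python
# Each rule: (antecedent mapping variable -> term, consequent modality).
RULES = [
    ({"TP": "L", "EP": "H", "TFP": "L", "VFP": "H", "PC": "L", "PD": "H"}, "activist"),
    ({"TP": "H", "EP": "L", "TFP": "L", "VFP": "H", "PC": "M", "PD": "L"}, "reflector"),
    ({"TP": "H", "EP": "L", "TFP": "H", "VFP": "L", "PC": "H", "PD": "L"}, "theorist"),
    ({"TP": "L", "EP": "H", "TFP": "H", "VFP": "L", "PC": "L", "PD": "M"}, "pragmatist"),
]


def firing_strengths(memberships, rules=RULES):
    """Return the firing strength of each output modality.

    memberships[variable][term] is a degree in [0, 1]. A rule fires with the
    minimum of its antecedent degrees; rules that share a consequent are
    aggregated with the maximum.
    """
    strengths = {}
    for antecedent, modality in rules:
        degree = min(memberships[var][term] for var, term in antecedent.items())
        strengths[modality] = max(strengths.get(modality, 0.0), degree)
    return strengths


# Hypothetical membership degrees for one student (all other terms are 0).
example = {v: {"L": 0.0, "M": 0.0, "H": 0.0}
           for v in ("TP", "EP", "TFP", "VFP", "PC", "PD")}
example["TP"]["L"] = 0.75
example["EP"]["H"] = 0.75
example["TFP"]["L"] = 0.5
example["VFP"]["H"] = 0.5
example["PC"]["L"] = 0.8
example["PD"]["H"] = 0.6
print(firing_strengths(example))
```

With these degrees only the first rule fires, and its strength is limited by the weakest antecedent.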

The decision-making unit of the fuzzy model uses the Mamdani method to combine the active rules and produce the fuzzy output. Afterwards, the defuzzifier applies the Center of Gravity (COG) technique to convert the fuzzy output into a crisp value.

To illustrate how this module works, two examples of operation are given. In the first case, analyzing Helen’s interactions at the time the fuzzy model was applied, 25% of the learning units she had studied were related to theory concepts (TP = L_TP), whereas the remaining ones were exercises (EP = H_EP). Moreover, 30% of all units were in text format (TFP = L_TFP), whereas 70% were in video format (VFP = H_VFP). Helen’s commenting on the learning units was very low in comparison with the average number of comments made by students (PC = L_PC), while her participation in discussion rooms was high compared with the average number of peers’ answers (PD = H_PD). Therefore, the model diagnosed that her learning modality best matched the activist.

The second example is Peter, who, at the time the model was applied, had studied 65% theory learning units (TP = H_TP) and 35% exercises (EP = L_EP). Of these, 40% were texts (TFP = L_TFP), while the rest were videos (VFP = H_VFP). Peter’s commenting ratio was about average (PC = M_PC), while his participation in discussions was below average (PD = L_PD). As a result, the fuzzy model predicted that his learning modality closely matched the reflector’s preferences.
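The defuzzification step can be sketched in Python as follows, using the output intervals of Table 1. This is a sketch under stated assumptions: each output set is clipped at the firing strength of its modality, the clipped sets are combined with the maximum, and the centre of gravity is approximated on a uniform grid. The function names, the grid size and the rule for mapping the crisp value back to a modality are illustrative choices, not the system's documented behavior.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in the triangle (a, b, c) with peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Output intervals of the Learning Modality variable, as listed in Table 1.
OUTPUT_TERMS = {
    "activist": (0.0, 0.2, 0.35),
    "reflector": (0.25, 0.4, 0.55),
    "theorist": (0.45, 0.6, 0.75),
    "pragmatist": (0.65, 0.85, 1.0),
}


def centre_of_gravity(strengths, steps=1000):
    """Approximate the COG of the aggregated, clipped output sets on [0, 1]."""
    numerator = denominator = 0.0
    for i in range(steps + 1):
        y = i / steps
        # Clip each output triangle at its firing strength, then take the max.
        mu = max(min(strengths.get(name, 0.0), triangular(y, *abc))
                 for name, abc in OUTPUT_TERMS.items())
        numerator += y * mu
        denominator += mu
    if denominator == 0.0:
        raise ValueError("no rule fired, so the output is undefined")
    return numerator / denominator


def dominant_modality(crisp):
    """Pick the modality whose output set has the highest membership at crisp."""
    return max(OUTPUT_TERMS, key=lambda name: triangular(crisp, *OUTPUT_TERMS[name]))


strengths = {"activist": 0.5, "reflector": 0.2}  # hypothetical firing strengths
crisp = centre_of_gravity(strengths)
print(round(crisp, 3), dominant_modality(crisp))
```

When only one modality fires, the crisp value falls at the centroid of that modality's clipped triangle, so a student matching a single rule, as in the two worked examples, is assigned that rule's modality.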

4 Evaluation Study and Discussion

Forty students participated in the evaluation study. All were first-year computer science students at a public university. The population’s average age was 18–19 years, with equal numbers of male and female students. The students used the system for the tutoring of the undergraduate Java course during one academic semester. Before using the system, the students were given directions on how to use it as well as information about its functionalities. The evaluation study was conducted using self-administered scale questionnaires with closed questions. For our research, we used the questionnaire presented in [15], consisting of 12 questions as shown in Table 2. It was observed that students became very familiar with the system’s use and functionalities. This can be explained by the fact that they are computer science students and are naturally inclined towards using software. During their interaction with the system, they were strongly engaged and spent a lot of time using it each day.


Table 2. Questionnaire of system evaluation. Categories: User Experience; Effectiveness of adaptivity mechanism; Impact on learning.

N    Question
1    Rate the user interface of the system (1–10)
2    Rate your learning experience (1–10)
3    Did you like the interaction with the system? (1–10)
4    Did the system detect appropriately your learning modality? (1–10)
5    Rate the way the learning content was presented (1–10)
6    Rate the learning content relevance to your personal profile (1–10)
7    Would you like to use this platform in other courses as well? (1–10)
8    Did you find the tutoring system simple in use? (1–10)
9    Rate the overall quality of the software (1–10)
10   Did you find the software helpful for your lesson? (1–10)
11   Would you suggest the software to your friends to use it? (1–10)
12   Rate the easiness in interacting with the software (1–10)

The evaluation results are illustrated in Fig. 3, where the answers to the questions are aggregated by category. In the category “User Experience”, 80% of students (32 students) gave high ratings, showing that they had a very good learning experience. In the category “Effectiveness of adaptivity mechanism”, 90% of students (36 students) declared that the adaptivity mechanism (mainly used for detecting learning modalities with fuzzy logic) was very effective in creating a student-centric environment. Finally, in the category “Impact on learning”, the results are very promising, with 87.5% (35 students) rating the pedagogical potential of our software highly. These results provide considerable evidence that our approach can enhance adaptivity in ITSs by providing an automatic tool for detecting students’ learning modalities, and thus it can create fertile ground for building more personalized tutoring systems.

Fig. 3. Evaluation results. (Bar chart titled “Evaluation results” showing the percentage of Low, Average and High ratings for the categories User Experience, Effectiveness of adaptivity mechanism and Impact on learning.)

The evaluation study also explored the accuracy of the detection of students’ learning modalities. To this end, the students who participated in the experiment were also asked to fill in a questionnaire that identifies their learning modalities manually. We then compared these results with the classifications made by the system’s fuzzy logic model. The system detected the learning modality accurately in 34 of the 40 cases, a success rate of 85%. This result supports the authors’ efforts to automate the process of learning modality detection more adequately and to move digital education towards the fast-growing field of adaptive tutoring systems.
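The reported success rate is the share of students for whom the system's label agrees with the questionnaire's label. A minimal Python sketch of that comparison is given below; the label lists are placeholders arranged to reproduce the reported counts (34 agreements out of 40) and are not the study's data.

```python
def detection_accuracy(system_labels, questionnaire_labels):
    """Fraction of students whose two learning modality labels agree."""
    if len(system_labels) != len(questionnaire_labels):
        raise ValueError("both label lists must cover the same students")
    if not system_labels:
        raise ValueError("at least one student is required")
    matches = sum(s == q for s, q in zip(system_labels, questionnaire_labels))
    return matches / len(system_labels)


# Placeholder labels for 40 students; the first six are made to disagree.
questionnaire = ["activist", "reflector", "theorist", "pragmatist"] * 10
system = list(questionnaire)
for i in range(6):
    system[i] = "activist" if questionnaire[i] == "pragmatist" else "pragmatist"

print(f"{detection_accuracy(system, questionnaire):.1%}")
```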

5 Conclusion and Future Work

This paper presents a novel approach to learning modality detection. The novelty of this approach lies in the learning modality model that is used and in the minimal number of student characteristics that the fuzzy logic system takes as input. More specifically, the fuzzy logic system receives data about learners, such as the kind of learning units visited, their media type, the comments learners make on learning units and their participation in discussions. For the detection of learning modality, we use the Honey and Mumford model. It should be noted that this process is dynamic, given that the output of the fuzzy logic model can change when the input changes. The detected learning modality can subsequently be used to provide individualized instruction to learners, which is, however, outside the scope of this article. The presented approach was incorporated in an intelligent tutoring system for learning the Java programming language. Our system was evaluated through questionnaires based on an established evaluation framework from the related literature. The results show that our presented approach for dynamic learning


modalities detection is accurate: when the system’s detection of students’ learning modality was compared with manual detection, the two agreed in 85% of cases. Our future steps include a more extensive evaluation of the proposed approach. Moreover, our future work will explore the incorporation of other learning modality models for dynamic detection, as well as the description of personalized tutoring strategies based on the detected learning modalities.

References

1. Troussas, C., Krouska, A., Sgouropoulou, C.: Collaboration and fuzzy-modeled personalization for mobile game-based learning in higher education. Comput. Educ. 144, 103698 (2020)
2. Phobun, P., Vicheanpanya, J.: Adaptive intelligent tutoring systems for e-learning systems. Proc.-Soc. Behav. Sci. 2(2), 4064–4069 (2010)
3. Ostrander, A., et al.: Evaluation of an intelligent team tutoring system for a collaborative two-person problem: surveillance. Comput. Hum. Behav. 104, 105873 (2019)
4. Moser, S., Zumbach, J.: Exploring the development and impact of learning styles: an empirical investigation based on explicit and implicit measures. Comput. Educ. 125, 146–157 (2018)
5. Azzi, I., Jeghal, A., Radouane, A., Yahyaouy, A., Tairi, H.: A robust classification to predict learning styles in adaptive e-learning systems. Educ. Inf. Technol. 25(1), 437–448 (2020)
6. Khamparia, A., Pandey, B.: Association of learning styles with different e-learning problems: a systematic review and classification. Educ. Inf. Technol. 25, 1303–1331 (2020)
7. Truong, H.M.: Integrating learning styles and adaptive e-learning system: current developments, problems and opportunities. Comput. Hum. Behav. 55, 1185–1193 (2016)
8. Felder, R.M., Silverman, L.K.: Learning and teaching styles in engineering education. Eng. Educ. 78(7), 674–681 (1988)
9. Sweta, S., Lal, K.: Personalized adaptive learner model in e-learning system using FCM and fuzzy inference system. Int. J. Fuzzy Syst. 19(4), 1249–1260 (2017)
10. Deborah, L.J., Sathiyaseelan, R., Audithan, S., Vijayakumar, P.: Fuzzy-logic based learning style prediction in e-learning using web interface information. Sadhana 40(2), 379–394 (2015)
11. Ngatirin, N.R., Zainol, Z., Rashid, N.A.A.: Fuzzy case-based approach for detection of learning styles: a proposed model. J. Eng. Appl. Sci. 13(2), 321–327 (2018)
12. Alian, M., Shaout, A.: Predicting learners styles based on fuzzy model. Educ. Inf. Technol. 22(5), 2217–2234 (2017)
13. Fleming, N.D.: I’m different; not dumb. Modes of presentation (VARK) in the tertiary classroom. In: Zelmer, A. (ed.) Research and Development in Higher Education, Proceedings of the 1995 Annual Conference of the Higher Education and Research Development Society of Australasia (HERDSA), vol. 18, pp. 308–313 (1995)
14. Honey, P., Mumford, A.: The Manual of Learning Style. Peter Honey, Maidenhead (1992)
15. Alepis, E., Troussas, C.: M-learning programming platform: evaluation in elementary schools. Informatica 41(4), 471–478 (2017)

Adaptive Music Therapy for Alzheimer’s Disease Using Virtual Reality

Alexie Byrns1, Hamdi Ben Abdessalem1(✉), Marc Cuesta2, Marie-Andrée Bruneau2, Sylvie Belleville2, and Claude Frasson1

1 Département d’Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Canada
{alexie.byrns,hamdi.ben.abdessalem}@umontreal.ca, [email protected]
2 Centre de Recherche de l’Institut de Gériatrie de Montréal, Montréal, Canada
[email protected], {marie.andree.bruneau,sylvie.belleville}@umontreal.ca

Abstract. With Alzheimer’s disease becoming more prevalent, finding effective treatment is imperative. While no pharmacological treatment has yet proven to be efficient, we explore how technology can be integrated into non-pharmacological intervention to enhance its benefits. We propose a new and unique version of Music Therapy, an existing therapy known to be beneficial. Music therapy has been shown to improve emotions and certain cognitive functions, which is the main focus of our study. To this aim, we designed a virtual reality environment consisting of a music theatre in which participants are immersed among the audience. A meticulously chosen selection of songs is presented on stage accompanied by visual effects. Results show that the environment decreases negative emotions and increases positive emotions, and that memory performance improved in most participants following the immersive experience. We speculate that by improving emotions through adaptive music therapy, our environment facilitates memory recall. With virtual reality now being easily accessible and inexpensive, we believe this novel approach could help patients through the disease.

Keywords: Virtual reality · Alzheimer’s disease · Music therapy · EEG · Intelligent health application · Emotions

1 Introduction

Alzheimer’s disease (AD), the most common form of dementia, is rapidly becoming more prevalent as the population ages. By bringing about progressive cognitive impairment and neuropsychiatric symptoms, AD causes discomfort and suffering for both patients and caregivers. As no pharmacological treatment has yet been discovered, research has started to shift towards non-pharmacological interventions. Recent technological advances have made it easier to design non-pharmacological approaches aimed at increasing patient well-being. We believe introducing Virtual Reality (VR) in this new field of study is a promising avenue. Indeed, VR has proven to be useful in a variety of therapeutic interventions, such as those for anxiety disorders and phobias [1]. Most VR environments are however unable to

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 214–219, 2020. https://doi.org/10.1007/978-3-030-49663-0_25


evolve according to the reactions of the patient. An important characteristic of AD is that it is accompanied by negative emotions, which may influence cognitive abilities and memory access [2]. Empirical and anecdotal reports have pointed towards an existing non-pharmacological intervention, Music Therapy (MT), as a promising avenue for helping AD patients on a psychological, behavioral and cognitive level [3, 4]. For this reason, we designed a virtual environment which combines the benefits of both VR and MT, potentially giving rise to an accessible and low-cost solution for personalized MT. To adapt the environment to the patient, our design measures the participant’s emotions and adjusts parameters of the environment accordingly. We focus our research on older adults with subjective cognitive decline (SCD), as these individuals progress to dementia at a higher rate than those with no subjective impression of decline and are sometimes in the early stages of the disease [5]. With these design objectives, our research questions are the following. Q1: Is it possible to reduce negative emotions through virtual music therapy? Q2: Is it possible to improve memory and cognitive functions through adaptive virtual music therapy?

The rest of this paper is organized as follows. In Sect. 2, we give an overview of the characteristics of AD. In Sect. 3, we examine how music can provide a therapy for Alzheimer’s disease and we present our adaptive virtual reality environment. In Sect. 4, we detail the experimental procedure undertaken to validate our hypotheses. Finally, in Sect. 5 we present and discuss the obtained results.

2 Characteristics of Alzheimer’s Disease

Alzheimer’s disease (AD) is a neurodegenerative disease whose most notable symptom is the deterioration of both short- and long-term memory. In addition to memory impairment, the disease affects behavior, non-memory cognitive abilities and physical abilities. Much research has revealed that neural damage in specific regions of the brain plays a significant role in the symptoms of AD [6]. As patients progress in the disease, cognitive and functional abilities become significantly impaired, resulting in difficulties in decision-making, daily tasks and communication. Individuals also experience a decrease in general interest and often become apathetic. During the final stages of the disease, patients become practically incapable of communicating, have difficulty eating and display extreme apathy [7].

3 A Music Therapy Virtual Environment with Adaptation for Alzheimer’s Disease

3.1 Music Therapy

Music displays great therapeutic potential for many neuropsychiatric conditions and AD is no exception. Indeed, empirical evidence suggests that music therapy (MT) can help improve cognitive, psychological and behavioral impairments induced by the disease [3, 8].


A. Byrns et al.

There are already many studies showing the benefits of MT for AD patients [3, 4, 8]. We propose to combine MT with a VR environment and an electroencephalography (EEG) device able to assess the emotions felt by the participants. By focusing on the underlying neurological mechanisms which give music its therapeutic capability, we designed a new version of MT. As AD patients struggle at an emotional, cognitive, psychological and behavioral level, we target these symptoms directly.

3.2 Adaptive Virtual Reality Music Environment

Our therapeutic environment consists of a music theatre created using Unity 3D software in which the participant is immersed, facing the stage up front. Red curtains open and close as different songs are presented on stage. For each song, an appropriate selection of instruments is presented on stage, each instrument slightly animated with the music. In addition, the stage presents firework-like light visual effects taken from the Unity 3D Asset Store. These are designed to fit each individual song (Fig. 1).

Fig. 1. The virtual MT environment for two different songs.

The choice of music was based on empirical studies and theories of music [9–11]. A series of eight 30 s song excerpts are sequentially presented, accompanied by visual scenes designed with specific color shades and lightings in order to achieve the emotional purpose of the song (relaxation, engagement, etc.) [12]. In order to optimize the emotional and cognitive impact of the virtual experience, the environment was adapted to provide the most beneficial therapeutic experience to each individual participant.

4 Experiments

In order to analyze the impact of the Music Therapy environment on memory and attention performance, we created 6 attention and memory exercises using Unity 3D software. Our approach was tested on 19 participants (13 females, 6 males) with subjective cognitive decline (SCD) and a mean age of 72.26 years (SD = 5.82). The participants took part in two sessions: the first one to ensure eligibility for the study and the second one


for the actual experiment. During the pre-experimental session, participants were invited to sign a consent form and perform clinical tests to confirm the diagnosis of SCD and characterize them. The second session was the experimental session. Participants were first invited to fill in questionnaires: the Positive and Negative Affect Schedule (PANAS) scale [13], a self-assessment of emotions, and a questionnaire on cyber-sickness [14]. Once these were completed, the participants were equipped with an EEG headset and asked to solve attention and memory exercises. Following the cognitive tests, a FOVE VR headset was installed and the VR MT began. This relaxing environment lasted for about 10 min. Following the MT, participants again completed different variants of the same attention and memory tests. Lastly, the participants were asked to fill in the PANAS scale and the cyber-sickness questionnaire once more, as well as the AttrakDiff 2 questionnaire [15].

5 Results and Discussion

The first objective of the research was to discover whether it is possible to reduce negative emotions through virtual music therapy. To this end, we started by analyzing the participants’ emotions before, during and after the virtual MT immersion. This was done using the frustration measure extracted from the Emotiv EEG. Results show that the mean frustration level before the music therapy was 0.69. The mean frustration level during the immersion was 0.45. After the MT, the mean frustration level was 0.51 (Fig. 2).

Fig. 2. Boxplot of general mean frustration
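To put the three reported means in proportion, the change relative to the pre-therapy baseline can be computed as in the short Python sketch below. The function name is chosen for this example; the three values are the mean frustration levels reported in the text, and no claim about statistical significance is implied.

```python
def relative_change(before, after):
    """Change from before to after, as a percentage of the before value."""
    if before == 0:
        raise ValueError("the baseline must be non-zero")
    return (after - before) / before * 100.0


BEFORE, DURING, AFTER = 0.69, 0.45, 0.51  # mean frustration levels from the text

print(f"during vs. before: {relative_change(BEFORE, DURING):.1f}%")
print(f"after vs. before:  {relative_change(BEFORE, AFTER):.1f}%")
```

Mean frustration was thus about 35% lower during the immersion and about 26% lower after it than before the therapy.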

Overall, frustration decreased while the participants were in the MT, and the positive effect on the frustration level was still observed after the MT. The effect obtained in our first analysis led to our second research question: is it possible to improve memory and cognitive functions through adaptive virtual music therapy? To this end, we analyzed performance improvements on the attention and memory exercises. Results showed small improvements on two of the three attention exercises. On exercise 1, the general mean improvement was 6.59%. On the second exercise, there was a mean improvement of 1.91%. The performance


improvement on the third exercise was 3.51%. For the fourth exercise (first memory exercise), a mean improvement of 6.14% was observed. For the fifth exercise, the mean improvement was 8.95%. The sixth exercise showed the highest improvement, reaching 36.84%. Finally, we compared the improvement on the attention exercises with that on the memory exercises. These results show a large increase in memory performance following the adaptive virtual music therapy and only a small improvement in attention abilities (Fig. 3).

Fig. 3. Histogram of performance improvement attention compared to memory exercises.
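The comparison between the two groups of exercises can be reproduced from the per-exercise figures quoted in the text, as in the Python sketch below. Averaging the three exercises of each type is an assumption about how the comparison was made; the paper does not state the aggregation used for the figure.

```python
from statistics import mean, median

# Mean improvement per exercise in percent, as reported in the text.
ATTENTION = [6.59, 1.91, 3.51]   # exercises 1 to 3
MEMORY = [6.14, 8.95, 36.84]     # exercises 4 to 6

for name, scores in (("attention", ATTENTION), ("memory", MEMORY)):
    print(f"{name}: mean {mean(scores):.2f}%, median {median(scores):.2f}%")
```

The memory average is driven largely by the sixth exercise, which is why the median is shown alongside the mean.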

Our first analysis confirmed that the virtual MT reduces negative emotions such as frustration. Our second and final analysis showed that reducing negative emotions through MT improved memory performances.

6 Conclusion

In this paper we presented a novel approach which could be used to improve the memory performance of patients with subjective cognitive decline using adaptive virtual music therapy. Experiments were conducted in which the participants were first asked to perform attention and memory exercises, then were immersed in the music therapy environment in order to reduce negative emotions, before completing a final set of exercises. The environment was built to react dynamically to the patient’s emotions and change accordingly. Results showed that the virtual music therapy environment helps reduce negative emotions, most notably frustration. In addition, results showed improved memory performance on selected exercises in most participants. We speculate that the reduction of negative emotions brought about by the adaptive music therapy environment helped improve short-term memory.

Acknowledgement. We acknowledge NSERC-CRD and Beam Me Up for funding this work.


References

1. Parsons, T.D., Rizzo, A.A.: Affective outcomes of virtual reality exposure therapy for anxiety and specific phobias: a meta-analysis. J. Behav. Ther. Exp. Psychiatry 39, 250–261 (2008)
2. Cavalera, C., Pepe, A., Zurloni, V., et al.: Negative social emotions and cognition: shame, guilt and working memory impairments. Acta Physiol. 188, 9–15 (2018)
3. Gallego, M.G., García, J.G.: Music therapy and Alzheimer’s disease: cognitive, psychological, and behavioural effects. Neurol. (Engl. Ed.) 32, 300–308 (2017)
4. Fang, R., Ye, S., Huangfu, J., Calimag, D.P.: Music therapy is a potential intervention for cognition of Alzheimer’s disease: a mini-review. Transl. Neurodegener. 6, 2 (2017). https://doi.org/10.1186/s40035-017-0073-9
5. Jessen, F., Amariglio, R.E., Van Boxtel, M., et al.: A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer’s disease. Alzheimer’s Dement. 10, 844–852 (2014)
6. Pini, L., Pievani, M., Bocchetta, M., et al.: Brain atrophy in Alzheimer’s disease and aging. Ageing Res. Rev. 30, 25–48 (2016)
7. Gottesman, R.T., Stern, Y.: Behavioral and psychiatric symptoms of dementia and rate of decline in Alzheimer’s Disease. Front. Pharmacol. 10 (2019)
8. Ray, K.D., Mittelman, M.S.: Music therapy: a nonpharmacological approach to the care of agitation and depressive symptoms for nursing home residents with dementia. Dementia 16, 689–710 (2017)
9. Krumhansl, C.L., Zupnick, J.A.: Cascading reminiscence bumps in popular music. Psychol. Sci. 24, 2057–2068 (2013)
10. Belfi, A.M., Karlan, B., Tranel, D.: Music evokes vivid autobiographical memories. Memory 24, 979–989 (2016)
11. de la Torre-Luque, A., Caparros-Gonzalez, R.A., Bastard, T., et al.: Acute stress recovery through listening to Melomics relaxing music: a randomized controlled trial. Nordic J. Music Ther. 26, 124–141 (2017)
12. Valdez, P., Mehrabian, A.: Effects of color on emotions. J. Exp. Psychol. Gen. 123, 394 (1994)
13. Watson, D., Clark, L.A., Tellegen, A.: Development and validation of brief measures of positive and negative affect: the PANAS scales. J. Pers. Soc. Psychol. 54, 1063 (1988)
14. Kennedy, R.S., Lane, N.E., Berbaum, K.S., Lilienthal, M.G.: Simulator sickness questionnaire: an enhanced method for quantifying simulator sickness. Int. J. Aviat. Psychol. 3, 203–220 (1993)
15. Lallemand, C., Koenig, V., Gronier, G., Martin, R.: Création et validation d’une version française du questionnaire AttrakDiff pour l’évaluation de l’expérience utilisateur des systèmes interactifs. Eur. Rev. Appl. Psychol. 65, 239–252 (2015)

Improving Cognitive and Emotional State Using 3D Virtual Reality Orientation Game

Manish Kumar Jha1, Marwa Boukadida1, Hamdi Ben Abdessalem1(✉), Alexie Byrns1, Marc Cuesta2, Marie-Andrée Bruneau2, Sylvie Belleville2, and Claude Frasson1

1 Département d’Informatique et de Recherche Opérationnelle, Université de Montréal, Montréal, Canada
{manish.jha,marwa.boukadida,hamdi.ben.abdessalem,alexie.byrns}@umontreal.ca, [email protected]
2 Centre de Recherche de l’Institut de Gériatrie de Montréal, Montréal, Canada
[email protected], {marie.andree.bruneau,sylvie.belleville}@umontreal.ca

Abstract. Patients suffering from Alzheimer’s Disease (AD) exhibit an impairment in performing tasks related to spatial navigation. Tasks that require navigational skills, in which the participant builds a cognitive map of the surroundings, have been found effective in cognitive training. In this paper, we investigate the effect of cognitive training using a fully immersive 3D VR orientation game. We implemented an intelligent guidance system that helps to reduce negative emotions when participants experience difficulty completing the quests of the game. We found that, after playing the orientation game, participants performed better in memory exercises and in certain attention exercises. We also studied the effect of the guidance system on reducing frustration during cognitive training in VR environments.

Keywords: Virtual Reality · Orientation Game · EEG · Immersive environment · adaptation · Assistance system

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 220–225, 2020. https://doi.org/10.1007/978-3-030-49663-0_26

1 Introduction

The ability to find one’s way using spatial reference frames and cues from one’s surroundings is a complex cognitive process. Studies show that navigational skills decline with ageing [1]. However, people with Alzheimer’s disease (AD) display a significantly steeper decline, which is also one of the early symptoms of the disease. Research shows that spatial navigation training programs for older persons lead to improvements in spatial performance [2]. Virtual Reality (VR) applications can address the challenges of cognitive training for dementia patients, because a virtual environment (VE) allows a high level of interaction without the risks posed by real-life surroundings.

In this paper, we present a fully immersive VE in which the participant must find items of interest in a public garden. However, navigating in an unfamiliar place can be challenging, leading to stronger negative emotions and a tendency to give up before completing the experiment. Research shows that it is more beneficial for patients with cognitive impairment to be helped through the completion of a challenge than to see the challenge failed [3]. It is important to present both audio and visual cues, to cater to patients suffering from either visual or auditory impairments [4]. Thus, real-time assistance with audio and visual feedback is a mandatory component of games for elders, as a mechanism that achieves high engagement by keeping the player filled with positive emotions. Hence, we implemented an intelligent guidance system, based on the participant’s behavior, that helps participants complete the tasks without their having to ask explicitly for help.

Our goal for this study is to investigate the effect of the orientation game and the guidance system on the cognitive functions of people suffering from subjective cognitive decline (SCD), a preclinical state of possible Alzheimer’s Disease (AD). We state our research objectives as follows:

Q1: Is it possible to stimulate the brain using a virtual maze game in order to enhance attention and memory?
Q2: Is it possible to help participants in order to reduce negative emotions?

The rest of this paper is organized as follows. In Sect. 2, we discuss the related works and the cognitive map theory. In Sect. 3, we describe the orientation game: the environment, the objectives and the guidance system. In Sect. 4, we detail the cognitive tests and the experimental procedure undertaken to validate our hypotheses. Finally, in Sect. 5 we present and discuss the obtained results.

2 Related Works

According to the cognitive map theory, the formation of representations of spatial information – in other words, the creation of a cognitive map – helps reduce cognitive load and increases recall and the encoding of novel information [5, 6]. Studies report that decreases in the volume of the hippocampus – a structure playing a key role in memory – correlate with a decline in cognitive function. Indeed, it is speculated that increasing grey matter in the hippocampus could entail better memory [7]. Interestingly, playing 3D video games such as Super Mario 64 over a period of time reportedly increases hippocampal volume [8] and improves performance in episodic and spatial memory quests. It is speculated that such 3D games lead players to create a cognitive map of the environment.

It has been observed that virtual environments (VEs) lead to the formation of cognitive maps similar to those formed in real environments [9]. This makes VR games ideal for simulating real-life scenarios for the cognitive training of the elderly. Certain VR applications have concentrated on games focused on performing activities of daily life, such as cooking, driving and shopping [10]. Another key advantage of VEs is that they offer a safe way to achieve a high level of interaction that is adaptable to the characteristics and needs of individual patients [10, 11]. A fully immersive VE offers a higher sense of ‘presence’ and interaction, which subsequently affects the behavioral responses of patients.


3 Orientation Game

3.1 Environment: Orientation Game

The fully immersive VR environment simulates a botanical garden in the form of a 5 × 5 maze. In this environment, trees form the walls of the maze and clearings through the trees are the pathways. The participant starts at one end of the garden and navigates using a joystick, by clicking in the direction in which he/she intends to move. Other elements in the game are: 1) a map of the garden with geographical directions, 2) the position and direction of the user, shown by a red arrow on the map, 3) a flashing blue circle representing the location of the items, 4) visual hints displayed when needed, and 5) verbal messages to the participant.

The game starts with a tutorial that lets the users familiarize themselves with the environment and the controls. It consists of four quests. For the first three quests, the participant is asked to collect an item located at a specific location of the 5 × 5 maze. We display the name of the item, and we show its location as a flashing blue circle on the map for 5 s. The user needs to reach the location and collect the requested item. When the item is collected, we remove it from the list and display the next item and its location.

3.2 Guidance System

We implemented a rule-based guidance system that provides navigational hints or audio and visual messages to the participants and helps them in completing the quests. It actively monitors the participant’s emotions, namely frustration, excitement, engagement, meditation, and valence, using Emotiv electroencephalograph (EEG) headset data in real time. On sensing a situation where the participant may need a hint, it sends a message to the VR system, which displays the hint either as a location on the map or, for text-based hints, as audio and visual messages. Figure 1 shows the different hints provided by the guidance system.

Fig. 1. Different levels of hints

The hints provided by the guidance system are activated in three different cases:

1. Emotions: Every second, the mean of the change and the rate of change of the emotion values over the past ten seconds are used to calculate a net score for each emotion. The emotion with the maximum score is compared with an empirically defined threshold to activate the emotion-based hints.
2. Away from target: If the participant takes three steps or more, all of which are four blocks or more from the target, the map displaying the target location is activated.
3. No movement: If the participant doesn’t move for more than a given amount of time, the map displaying the target location is activated.

The details change with the different levels of hints provided by the guidance system. Level 1 provides the least information and displays only the participant’s location and the object’s location on the map. Level 2 additionally displays a text message in a prompt in the VR environment, along with a verbal narration of the message. A level 3 hint highlights a path on the map, which the participant can follow to reach a location immediately next to the actual location of the object. This leaves some scope for exploration, as the participant needs to search for the object in all possible directions. A level 4 hint highlights the complete path leading to the object’s location on the map.

When a hint is activated by the participant’s emotions, the level of the hint increases if it is triggered by frustration, providing more details to find the object. On the other hand, if the hint is triggered by positive emotions such as excitement or engagement, the level of the hint decreases. When a hint is activated because the participant is not moving, the level increases every fifteen seconds until the participant moves. When the participant is away from the object, as determined by the guidance system, the hint provided is always level 2-1.
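The three activation cases and the level adjustments described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the exact net-score formula (here, mean change plus mean rate of change), the threshold value, the idle timeout, the use of Manhattan distance for "blocks", and all function names are assumptions made for the example.

```python
MIN_LEVEL, MAX_LEVEL = 1, 4      # hint levels described in the text
EMOTION_THRESHOLD = 0.05         # hypothetical; the paper only says "empirically defined"
IDLE_TIMEOUT_S = 10              # hypothetical "given amount of time"
POSITIVE_EMOTIONS = {"excitement", "engagement"}


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def net_score(window):
    """Assumed score: mean change plus mean rate of change over the window."""
    changes = [b - a for a, b in zip(window, window[1:])]
    rates = [b - a for a, b in zip(changes, changes[1:])]
    return _mean(changes) + _mean(rates)


def emotion_trigger(history):
    """history maps an emotion name to its samples from the past ten seconds."""
    scores = {name: net_score(values) for name, values in history.items()}
    top = max(scores, key=scores.get)
    return top if scores[top] > EMOTION_THRESHOLD else None


def away_from_target(steps, target, min_steps=3, min_blocks=4):
    """True if the last three steps were all four or more blocks from the target."""
    if len(steps) < min_steps:
        return False
    # Manhattan distance on the maze grid is an assumption.
    return all(abs(x - target[0]) + abs(y - target[1]) >= min_blocks
               for x, y in steps[-min_steps:])


def next_level(level, emotion):
    """Frustration raises the hint level; excitement or engagement lowers it."""
    if emotion == "frustration":
        level += 1
    elif emotion in POSITIVE_EMOTIONS:
        level -= 1
    return max(MIN_LEVEL, min(MAX_LEVEL, level))


def idle_level(idle_seconds):
    """Hint level for a motionless participant, or None before the timeout."""
    if idle_seconds <= IDLE_TIMEOUT_S:
        return None
    extra = (idle_seconds - IDLE_TIMEOUT_S) // 15   # one level per fifteen seconds
    return min(MAX_LEVEL, MIN_LEVEL + extra)
```

Keeping each trigger as a separate pure function makes the rules easy to test in isolation before wiring them to live EEG data and the VR system.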

4 Experiments

Since our main goal is to analyze the impact of orientation therapy and of building a cognitive map on attention and memory performance, we developed 3 attention exercises and 3 memory exercises to compare participants’ performance before and after playing the game. We tested our approach with 15 participants (11 females) with subjective cognitive decline (SCD) and a mean age of 73.4 (SD = 5.73). The participants took part in two sessions. In the first session, we performed some assessments to ensure that they were eligible to take part in the experiments. During the second (experimental) session, the participants were invited to fill in a pre-experiment form. Afterwards, we equipped them with an EEG headset, and they completed the attention and memory exercises. When they had completed the exercises, we equipped them with a Fove VR headset, and they started the Orientation Game. Following the game, we removed the Fove VR headset and asked the participants to complete the attention and memory exercises again, but with different examples. Finally, we removed the EEG headset and the participants filled in a post-experiment form.


5 Results and Discussion

To study the enhancement in attention and memory, we first analyzed the performance improvement, from before to after the orientation therapy, on the first three exercises (the attention exercises). On exercise 1, the mean improvement was 6.67%. On exercise 2, the mean improvement was 0.61%. On exercise 3, the performance improvement was 0%. We also analyzed the performance improvement, from before to after the orientation therapy, on the memory exercises (exercises 4, 5 and 6). For exercise 4, the mean improvement was 1.11%. For exercise 5, the mean improvement was 12%. Finally, the mean improvement was 26.67% for exercise 6, which is the highest percentage of improvement. These results show an increase in memory performance following the orientation therapy and an improvement in attention abilities in certain participants.

Next, we analyzed the frustration of the participants before and after the hints were provided. As shown in Fig. 2, the guidance system provided at least one hint to 9 of the 15 participants. The remaining participants did not have any difficulty completing the quests.

Fig. 2. Average frustration values of each participant, ten seconds before and after the hints

We notice that, for 6 of these 9 participants, frustration values decreased in the ten seconds after the hints were provided. Participants 8, 10, 11 and 13 were provided one, two, one, and one hint respectively, while the rest of the participants received five or more hints. Regarding the different types of hints, we observed that, except for hint level 2-2 and hint level 4, the average frustration values across all the participants were lower in the ten seconds after the hints were provided. Hint level 2-2 provides a warning message: ‘You’re too far. Try to take few steps back.’ This led the participants to believe that they might be doing something wrong, leading to higher frustration. Hint level 4 displays the complete path to the item’s location. However, since we configured the hint to appear for only four seconds, this was not enough time for the participants to memorize the complete path, which may have led to more frustration.
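The before/after comparison used in this analysis can be illustrated with a short sketch. It assumes one frustration sample per second and a hypothetical function name; the paper does not state the sampling rate or how the windows were extracted, so this is only one plausible reading.

```python
def frustration_around_hint(samples, hint_index, window=10):
    """Mean frustration in the `window` seconds before and after a hint.

    samples: one frustration value per second (assumed sampling rate).
    hint_index: index of the second at which the hint was shown.
    """
    before = samples[max(0, hint_index - window):hint_index]
    after = samples[hint_index:hint_index + window]
    if not before or not after:
        raise ValueError("not enough samples around the hint")
    return sum(before) / len(before), sum(after) / len(after)


# Toy signal: frustration drops once the hint appears at second 10.
signal = [0.75] * 10 + [0.25] * 10
mean_before, mean_after = frustration_around_hint(signal, hint_index=10)
hint_helped = mean_after < mean_before
```

A hint would count as helpful for a participant when the mean after the hint is lower than the mean before it.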


6 Conclusion

In this paper, we designed a 3D VR orientation game with a real-time guidance system, which can be used for cognitive training of patients suffering from pre-clinical states of Alzheimer’s disease. The results show an improvement in memory performance for most of the participants, and better attention abilities for some of the participants, after the therapy. The guidance system is effective in reducing frustration while solving the quests of the game. In some cases, the decrease is not significant, but the stabilization of frustration after the hint is provided, following a continuous increase, shows the usefulness of hints. Also, the increase in negative emotions after two of the hint types shows that hints need to be carefully designed to give positive messages, and that the time taken to understand the hints should be taken into consideration.

Acknowledgement. We acknowledge NSERC-CRD and Beam Me Up for funding this work.

References

1. Kirasic, K.C.: Spatial cognition and behavior in young and elderly adults: implications for learning new environments. Psychol. Aging 6, 10 (1991)
2. Lövdén, M., et al.: Spatial navigation training protects the hippocampus against age-related changes during early and late adulthood. Neurobiol. Aging 33, 620-e9 (2012)
3. Pigot, H., Mayers, A., Giroux, S.: The intelligent habitat and everyday life activity support. In: Proceedings of the 5th International Conference on Simulations in Biomedicine, April 2003
4. Marin, J.G., Navarro, K.F., Lawrence, E.: Serious games to improve the physical health of the elderly: a categorization scheme. In: International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services. Barcelona, Spain (2011)
5. Kitchin, R.M.: Cognitive maps: What are they and why study them? J. Environ. Psychol. 14, 1–19 (1994)
6. Tolman, E.C.: Cognitive maps in rats and men. Psychol. Rev. 55, 189 (1948)
7. Konishi, K., Bohbot, V.D.: Spatial navigational strategies correlate with gray matter in the hippocampus of healthy older adults tested in a virtual maze. Front. Aging Neurosci. 5, 1 (2013)
8. West, G.L., et al.: Playing Super Mario 64 increases hippocampal grey matter in older adults. PLoS ONE 12, e0187779 (2017)
9. Péruch, P., Gaunet, F.: Virtual environments as a promising tool for investigating human spatial cognition. Cahiers de Psychologie Cognitive/Curr. Psychol. Cogn. (1998)
10. Ball, K., et al.: Effects of cognitive training interventions with older adults: a randomized controlled trial. JAMA 288, 2271–2281 (2002)
11. Imbeault, F., Bouchard, B., Bouzouane, A.: Serious games in cognitive training for Alzheimer’s patients. In: 2011 IEEE 1st International Conference on Serious Games and Applications for Health (SeGAH) (2011)

A Multidimensional Deep Learner Model of Urgent Instructor Intervention Need in MOOC Forum Posts

Laila Alrajhi(✉), Khulood Alharbi, and Alexandra I. Cristea

Computer Science, Durham University, Durham, UK
{laila.m.alrajhi,khulood.o.alharbi,alexandra.i.cristea}@durham.ac.uk

Abstract. In recent years, massive open online courses (MOOCs) have become one of the most exciting innovations in e-learning environments. Thousands of learners around the world enroll on these online platforms to satisfy their learning needs (mostly) free of charge. However, despite the advantages MOOCs offer learners, dropout rates are high. Struggling learners often describe their feelings of confusion and need for help via forum posts. However, the often-huge numbers of posts on forums make it unlikely that instructors can respond to all learners, and many of these urgent posts are overlooked or discarded. To overcome this, mining the raw data of learners’ posts may provide a helpful way of classifying posts where learners require urgent intervention from instructors, to help learners and reduce the current high dropout rates. In this paper, we propose a method based on correlations of different dimensions of learners’ posts to determine the need for urgent intervention. Our initial statistical analysis found some interesting significant correlations between posts expressing sentiment, confusion, opinion, questions, and answers and the need for urgent intervention. Thus, we have developed a multidimensional deep learner model combining these features with natural language processing (NLP). To illustrate our method, we used a benchmark dataset of 29598 posts, from three different academic subject areas. The findings highlight that the combined, multi-dimensional features model is more effective than the text-only (NLP) analysis, showing that future models need to be optimised based on all these dimensions when classifying urgent posts.

Keywords: MOOCs · Intelligent tutoring system · Urgent intervention · Deep learning · Mixed data

© Springer Nature Switzerland AG 2020 V. Kumar and C. Troussas (Eds.): ITS 2020, LNCS 12149, pp. 226–236, 2020. https://doi.org/10.1007/978-3-030-49663-0_27

1 Introduction

MOOCs are open distance-learning environments with large-scale enrolment [1]. Since their emergence as a popular mode of learning in 2012 [2], they have been delivering learning opportunities to a wide range of learners, free or at low cost, across different domains around the world [3], attracting thousands of learners to take advantage of the offered opportunities [4]. Amongst these, MOOC online discussion forums offer opportunities for learners to ask questions and express their feelings about course content and their learning progress, via posts. These can connect learners to learners, or learners to instructors. Instructor intervention is sought after, and could make the difference between a learner completing the course or not. However, due to the large-scale participation in these platforms and extremely high ratios of learners to instructors, it is difficult for instructors to monitor all posts and determine when to intervene [5]. Therefore, researchers, MOOC designers, and universities have begun to pay more attention to instructors’ presence and their interventions in MOOC-based environments.

As a result, many recent studies have focussed on detecting struggling learners’ posts, to predict when they require intervention by instructors. Some of these approaches use features extracted from the properties of posts [6] and others are based on text-only features [7–9]. However, few studies have combined mixed data, such as text data with metadata [10, 11], and they are limited, as they are all based on shallow machine learning (ML) only. Recently, deep learning models have been used for text-classification tasks [12]. Thus, we formulated the following two research questions:

RQ1: Is there a relationship between the various dimensions of the learners’ posts and their need for urgent instructor intervention?
RQ2: Does using several dimensions as features, in addition to textual data, increase the model’s predictive power of the need for urgent instructor intervention, when using deep learning?

In this paper, we contribute by answering the above questions via building a new classifier for this area, based on a deep learning model that incorporates different dimensions of MOOC posts, i.e., numerical data in addition to textual data, to classify urgent posts.

2 Related Work

2.1 Analysis in MOOCs

Recently, in the MOOC context, there have been significant efforts to study, analyse and evaluate different aspects of learners including sentiment [13], confusion [14] or need of urgent intervention [8], to improve the educational quality of MOOC environments and improve MOOCs’ overall educational outcomes. In terms of sentiment analysis, researchers have employed sentiment analysis for different purposes; for instance, they used it to predict attrition [15], performance and learning outcome [16], emotions [17] and dropout [13] by using different machine learning approaches. These methods include statistical analysis, shallow machine learning and deep neural networks. A growing number of researchers have studied confusion; [18] explored click patterns to identify the impact of confusion on learner dropout; [14] attempted to assist confused learners, by developing a tool that recommends relevant video clips to learners who had submitted posts that indicated learner confusion.


However, while all of these studies focus mainly on employing learner sentiment and confusion to achieve different goals, they do not exploit sentiment and confusion indicators to predict urgent instructor intervention. Therefore, our research seeks to use these aspects as metadata to predict urgent posts.

2.2 Urgent Intervention in MOOCs

Detection of the need for urgent instructor intervention is arguably one of the most important issues in MOOC environments. The problem was first proposed and tackled in [6] as a binary prediction task based on instructors’ intervention histories, using traditional models (logistic regression [LR], the linear Markov chain model [LMCM], and the global chain model [GCM]). A follow-up study [10] proposed the use of L1-regularised logistic regression as a binary classifier, predicting whether or not learners required intervention by adding prior knowledge about the type of forum (thread) as a feature, in addition to linguistic features of posts. Another study [11] tried to build a generalised model, using different shallow ML models with linguistic features and metadata (‘Up_count’, ‘Reads’ and ‘Post_type’), some extracted using NLP tools.

In general, studies used as inputs for classification models either text-only data [5, 7–9, 19], or different post-specific features, such as linguistic features and other metadata [6], or a combination of textual data and post features [10, 11]. Moreover, they used either traditional machine learning classifiers [10, 11] or, more recently, transfer learning [5, 7] and deep learning [8, 9, 19], as explored next.

Transfer learning, in the form of cross-domain classification, was proposed in [7] by training different traditional classifiers (support vector machine [SVM] and logistic regression) on three different dimensions (confusion, urgency, and sentiment), before validating them across different domains. The study [7] found low cross-domain classification accuracy, but mentioned that transfer learning should be given more attention. Moreover, this model is based on text-only data. A follow-up study [5] proposed a transfer learning framework based on deep learning (Convolution-LSTM [long short-term memory]) to predict different dimensions (confusion, urgency, and sentiment) in posts, using textual data only. This study is the first to apply deep learning in filtering posts, to predict which learners require urgent intervention.

The following studies are all based on deep learning and used only textual data as an input to the model. The authors of [9] classified urgent posts with recurrent convolutional neural networks (RCNN), which use the embedded information of a current word to capture contextual information. The authors of [8] proposed a hybrid character-word neural network based on attention, to identify posts that require urgent instructor intervention, also adding course information associated with a given post for contextualisation. The authors of [19] produced EduBERT as a pretrained deep language model for learning analytics, trained on forum data from different online courses. They classified the urgency of instructor intervention as a text classification task, by fine-tuning EduBERT.

To the best of our knowledge, no studies have used deep learning as an urgency-classifier model with mixed-input data. In our study, we incorporated several different dimensions, combining numerical data with textual data.


3 Methodology

We aim here to analyse the effect of combining several different dimensions with textual data, to predict posts where learners require urgent intervention in a MOOC environment.

3.1 Dataset

In this study, we used the Stanford MOOC benchmark posts dataset [14], which is available to academic researchers by request. It covers three different domain areas (education, humanities/sciences, and medicine) and contains 29,604 anonymised posts from 11 courses. Each post was manually labelled by three independent human coders to create a gold-standard dataset. Each post was evaluated against six categories/dimensions (sentiment, confusion, urgency, opinion, question, and answer). Opinion, question and answer were assigned binary values, while sentiment, confusion and urgency were assigned values on a scale of 1–7. To explain: for sentiment, 1 = extremely negative and 7 = extremely positive; for confusion, 1 = extremely knowledgeable and 7 = extremely confused; for urgency, 1 = no reason to read the post and 7 = extremely urgent: instructor definitely needs to reply. The final gold-standard dataset contains a column for each dimension, based on computing scores between coders. For more information about the coding process and the creation of the gold-standard dataset, see their website [20].

Although the original dataset is multivalued, in order not to add additional complexity, we followed [8] and structured the problem of detecting urgent posts as a binary classification task by converting the (1–7) scale to binary values:

• Urgency value > 4 ⇒ Need for urgent intervention (1)
• Otherwise ⇒ No need for urgent intervention (0)

[…] 4 and negative otherwise.

3.3 Predictive Urgent Intervention Models

The first step towards answering the second research question was to develop a basic model based on text-only data and then incorporate other dimensions (sentiment scale, confusion scale, opinion value, question value and answer value) as numerical features. In general, we trained the text data (learners’ posts) with a convolutional neural network (CNN) model and the numerical data (multiple dimensions) with a multi-layer perceptron (MLP) model (Fig. 1). We selected CNN to classify text by following [8], as they reported that TextCNN outperforms LSTM. Note though that our goal was to show the power of the multidimensional approach and not optimise the individual parts of our classifier.

Fig. 1. Different types of data with different networks.

We divided the data into two distinct sets, one for training and the other for testing (80% and 20%, respectively), using stratified sampling to ensure that the training and testing sets have approximately the same distribution of the two classes (non-urgent and urgent), although the dataset has a large number of non-urgent posts.

Text Model. As shown in Fig. 2, in the text model, the first layer is the input layer, with a maximum length of 200, as we padded each post to a predetermined length (200 words), following the current state of the art [8], to control the length of the input sequence to the model. Then, the embedding layer reused pre-trained word embeddings (Word2vec GoogleNews-vectors-negative300), which were fine-tuned during training. We selected Word2vec as the pre-trained model, as [8] showed that it outperformed GloVe on the urgency classification task. Next, for the CNN layer, we applied 1D convolution (128 filters, kernel sizes of {3, 4, 5}, and the Rectified Linear Unit ‘ReLU’ as activation function), as in [8], to derive interesting features, followed by 1D global max pooling, to produce our features. Then, for the drop-out layer, we used a drop-out rate of 0.5, as in [8], to prevent overfitting. Finally, a fully connected layer with the sigmoid activation function was used to classify the output I as 1 (needs urgent intervention) or 0 (no intervention required):

\[ I = \begin{cases} 1, & \text{if sigmoid output} > 0.5 \\ 0, & \text{if sigmoid output} \le 0.5 \end{cases} \tag{1} \]

After constructing the model, we trained it using the Adam optimisation algorithm, as in [8]. We used binary cross-entropy as the loss function, because our problem involves a binary decision, and we used the popular metric of accuracy to measure performance. In addition, for a more comprehensive result and to deal with potential majority class bias, we calculated precision, recall and F1-score for each class.

Overall Model (Text Model + Other Dimensions Model). The overall model is a general model that uses mixed data to predict urgent posts. Here, we added numerical data as features in addition to text. In an initial study, we combined the text data with the metadata in one single model; however, that model’s performance was unsatisfactory. As our model combines multiple inputs and mixed data, we therefore constructed two different sub-models (Fig. 2), with the first sub-model being the text-only model. The second sub-model is a multi-layer perceptron (MLP) neural network with 5 inputs that represent the 5 dimensions (sentiment, confusion, opinion, question and answer). We also added these features one by one to the MLP model as single inputs (one dimension at a time), to check the individual effect of each particular dimension. The next layer is a hidden layer with 64 neurons. This is followed by a fully connected layer with the sigmoid activation function to classify the posts, as in the text model. The outputs from these two sub-models were combined via concatenation to construct the overall model. Finally, a fully connected layer with the sigmoid activation function was used at the end of the network to classify the output, as in the sub-models.
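To see why per-class precision, recall and F1 are reported alongside accuracy, consider the following minimal sketch. The function names and the toy labels are assumptions made for illustration; they are not taken from the paper or from any particular library.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels=(0, 1)):
    """Precision, recall and F1 for each class (0 = non-urgent, 1 = urgent)."""
    results = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Toy imbalanced data: a classifier that always predicts "non-urgent".
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
acc = accuracy(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred)
```

On this toy data the accuracy looks high although no urgent post is ever detected, which is the majority class bias that the per-class metrics expose.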

Fig. 2. Overall model.
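The data flow of the overall model in Fig. 2 can be sketched as a forward pass in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: weights are random rather than trained, biases and dropout are omitted, the embedding size and filter count are reduced, the ReLU in the MLP hidden layer is assumed (the paper does not name its activation), and all identifiers are hypothetical.

```python
import math
import random

rng = random.Random(0)           # fixed seed: weights are random, not trained
EMB_DIM, N_FILTERS = 8, 4        # toy sizes; the paper uses 300-d vectors, 128 filters
KERNEL_SIZES = (3, 4, 5)         # kernel sizes given in the text
HIDDEN = 64                      # hidden neurons of the MLP, as in the text


def rand_vec(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv_max(tokens, kernel):
    """Apply one 1D filter with ReLU at every position, then global max pooling."""
    k = len(kernel)
    acts = [max(0.0, sum(dot(t, w) for t, w in zip(tokens[i:i + k], kernel)))
            for i in range(len(tokens) - k + 1)]
    return max(acts)


PARAMS = {
    "filters": [[rand_vec(EMB_DIM) for _ in range(k)]
                for k in KERNEL_SIZES for _ in range(N_FILTERS)],
    "text_out": rand_vec(len(KERNEL_SIZES) * N_FILTERS),
    "hidden": [rand_vec(5) for _ in range(HIDDEN)],
    "meta_out": rand_vec(HIDDEN),
    "fusion": rand_vec(2),
}


def predict(tokens, dims, p=PARAMS):
    """tokens: embedded post (list of EMB_DIM vectors); dims: the 5 numerical dimensions."""
    text_feats = [conv_max(tokens, f) for f in p["filters"]]
    text_prob = sigmoid(dot(text_feats, p["text_out"]))        # text sub-model output
    hidden = [max(0.0, dot(dims, w)) for w in p["hidden"]]     # ReLU is an assumption
    meta_prob = sigmoid(dot(hidden, p["meta_out"]))            # MLP sub-model output
    prob = sigmoid(dot([text_prob, meta_prob], p["fusion"]))   # concatenate, then classify
    return prob, int(prob > 0.5)


post = [rand_vec(EMB_DIM) for _ in range(20)]    # stand-in for an embedded, padded post
# Order of dims: sentiment, confusion, opinion, question, answer.
prob, label = predict(post, [3.0, 5.5, 0, 1, 0])
```

In practice such a two-branch network would be built and trained with a deep learning framework; the sketch only shows how the two sub-model outputs are concatenated before the final sigmoid layer and thresholded at 0.5.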


After training, we applied McNemar’s statistical hypothesis test to check if the observed differences between any two classifiers were statistically significant. We also applied the Bonferroni correction, to compensate for multiple comparisons.
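As an illustration of this comparison step, the sketch below computes McNemar's test from the paired predictions of two classifiers and a Bonferroni-adjusted significance level. The paper does not say which variant of the test was used; the continuity-corrected chi-square form shown here, the function names, and the toy data are assumptions.

```python
import math


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's chi-square test (with continuity correction) on paired predictions.

    Returns (statistic, p_value). Only posts on which the two classifiers
    disagree in correctness contribute to the statistic.
    """
    b = sum(a == t and o != t for t, a, o in zip(y_true, pred_a, pred_b))  # A right, B wrong
    c = sum(a != t and o == t for t, a, o in zip(y_true, pred_a, pred_b))  # A wrong, B right
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


# Toy example: classifier A is always right, classifier B errs on 10 of 20 posts.
y_true = [1] * 20
pred_a = [1] * 20
pred_b = [0] * 10 + [1] * 10
stat, p_value = mcnemar(y_true, pred_a, pred_b)
significant = p_value < bonferroni_alpha(0.05, n_comparisons=5)
```

With several pairwise comparisons, each p-value is judged against the reduced level rather than against 0.05 directly.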

4 Evaluation and Discussion

In this section, we present the charts and the results of the analysis of the relations between non-urgent and urgent posts with different dimensions, to address RQ1. Then, we review the results obtained after training each model to address RQ2.

4.1 Analysis

We analysed the relationship between the rates of non-urgent/urgent posts across the 5 different dimensions. As shown in Fig. 3 (left: Sentiment (1–7)), the number of urgent posts exceeds the number of non-urgent posts on the negative side of the sentiment scale (1–3) and vice versa: the number of urgent posts is lower than that of non-urgent posts on the positive side of the sentiment scale (5–7). We interpreted a sentiment of (4) as neutral. To reach this conclusion, we compared the values (4) and (4.5) on the sentiment scale and found a higher proportion of non-urgent learners with a sentiment of (4.5). The figure also shows (right: Confusion (1–7)) that the ratio of non-urgent posts is higher than that of urgent posts for non-confused posts, i.e. those with a confusion value between (1–3), in contrast to confused posts (5–7). We compared the values (4) and (4.5) for confusion as well and here, unlike for sentiment, the results show a higher number of learners requiring urgent attention for the (4.5) value.

Fig. 3. The relationship between the ratio of the number of (non-urgent & urgent) posts and sentiment scale (1–7) (left), confusion scale (1–7) (right).

We performed a similar analysis for the remaining dimensions (opinion, question and answer), which are binary (Fig. 4). For opinion, most of the posts are non-urgent. For question, there are more urgent posts; this highlights that questions often represent posts where learners require urgent intervention. For answer, we found that, in general, most posts are not answered, indicating that most learners do not like to answer their peers’ questions; this highlights the importance of instructor intervention. Answer posts, as expected, normally represent non-urgent posts.

Fig. 4. The relationship between the ratio of the number of (non-urgent & urgent) posts and opinion (1/0) (left), question (1/0) (middle) and answer (1/0) (right).

Next, we computed the averages on the sentiment dimension: the mean of the urgency sentiment was 3.83 and the mean of non-urgency sentiment was 4.25 (see Table 1). Importantly, this difference is statistically significant (Mann-Whitney U test: p < 0.05). Then, we repeated the same steps for all dimensions, as shown in Table 1. We then applied a Bonferroni correction and found that p < 0.01, indicating that the set of all comparisons is significant.

Table 1. Average different dimensions with (non-urgent/urgent).

Dimension    Mean (non-urgent)    Mean (urgent)
Sentiment    4.25                 3.83
Confusion    3.75                 4.59
Opinion      0.61                 0.29
Question     0.06                 0.77
Answer       0.23                 0.05

p< p< p< p< p