
Computer-Aided Data Analysis in Chemical Education Research (CADACER): Advances and Avenues

ACS SYMPOSIUM SERIES 1260

Computer-Aided Data Analysis in Chemical Education Research (CADACER): Advances and Avenues

Tanya Gupta, Editor
Department of Chemistry and Biochemistry
South Dakota State University
Brookings, South Dakota

Sponsored by the ACS Division of Chemical Education

American Chemical Society, Washington, DC Distributed in print by Oxford University Press

Library of Congress Cataloging-in-Publication Data Names: Gupta, Tanya (Chemistry professor), editor. | American Chemical Society. Division of Chemical Education. Title: Computer-aided data analysis in chemical education research (CADACER) : advances and avenues / Tanya Gupta, editor, Department of Chemistry and Biochemistry, South Dakota State University, Brookings, South Dakota ; sponsored by the ACS Division of Chemical Education. Other titles: CADACER Description: Washington, DC : American Chemical Society, [2017] | Series: ACS symposium series ; 1260 | Includes bibliographical references and index. Identifiers: LCCN 2017052434 (print) | LCCN 2017055742 (ebook) | ISBN 9780841232433 (ebook) | ISBN 9780841232440 (alk. paper) Subjects: LCSH: Chemistry--Computer-assisted instruction. Classification: LCC QD49.6.C6 (ebook) | LCC QD49.6.C6 C66 2017 (print) | DDC 540.78/5--dc23 LC record available at https://lccn.loc.gov/2017052434

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Copyright © 2017 American Chemical Society
Distributed in print by Oxford University Press
All Rights Reserved. Reprographic copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Act is allowed for internal use only, provided that a per-chapter fee of $40.25 plus $0.75 per page is paid to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. Republication or reproduction for sale of pages in this book is permitted only under license from ACS. Direct these and other permission requests to ACS Copyright Office, Publications Division, 1155 16th Street, N.W., Washington, DC 20036.

The citation of trade names and/or names of manufacturers in this publication is not to be construed as an endorsement or as approval by ACS of the commercial products or services referenced herein; nor should the mere reference herein to any drawing, specification, chemical process, or other data be regarded as a license or as a conveyance of any right or permission to the holder, reader, or any other person or corporation, to manufacture, reproduce, use, or sell any patented invention or copyrighted work that may in any way be related thereto. Registered names, trademarks, etc., used in this publication, even without specific indication thereof, are not to be considered unprotected by law.

PRINTED IN THE UNITED STATES OF AMERICA

Foreword

The ACS Symposium Series was first published in 1974 to provide a mechanism for publishing symposia quickly in book form. The purpose of the series is to publish timely, comprehensive books developed from ACS-sponsored symposia based on current scientific research. Occasionally, books are developed from symposia sponsored by other organizations when the topic is of keen interest to the chemistry audience. Before agreeing to publish a book, the proposed table of contents is reviewed for appropriate and comprehensive coverage and for interest to the audience. Some papers may be excluded to better focus the book; others may be added to provide comprehensiveness. When appropriate, overview or introductory chapters are added. Drafts of chapters are peer-reviewed prior to final acceptance or rejection, and manuscripts are prepared in camera-ready format. As a rule, only original research papers and original review papers are included in the volumes. Verbatim reproductions of previously published papers are not accepted.

ACS Books Department

Contents

Preface .............................................................................................................................. ix

1. Introduction to Computer-Aided Data Analysis in Chemical Education Research (CADACER) ........................ 1
   Tanya Gupta

2. Learning Management System: Education Research in the Era of Technology ............................................. 9
   Akash Mehta and Maria Kalyvaki

3. Crossing Boundaries in Electronic Learning: Combining Fragmented Test Data for a New Perspective on Students’ Learning ..... 21
   Sebastian Hedtrich and Nicole Graulich

4. Leveraging Open Source Tools for Analytics in Education Research ............................................................ 39
   Sindhura Elluri

5. Making the Most of Your Assessment: Analysis of Test Data in jMetrik ....................................................... 49
   Alexey Leontyev, Steven Pulos, and Richard Hyslop

6. Putting the R in CER ..................................................................................................................................... 65
   Jordan Harshman, Ellen Yezierski, and Sara Nielsen

7. Likert-Type Survey Data Analysis with R and RStudio ............................................................................... 91
   Regis Komperda

8. Assessment of the Effectiveness of Instructional Interventions Using a Comprehensive Meta-Analysis Package ..... 117
   Alexey Leontyev, Anthony Chase, Steven Pulos, and Pratibha Varma-Nelson

9. A Study of Problem Solving Strategies Using ATLAS.ti ............................................................................ 133
   Tanya Gupta

Editor’s Biography ....................................................................................................... 157

Indexes
Author Index ................................................................................................................ 161
Subject Index ................................................................................................................ 163


Preface

The motivation for this book comes from several discussions I had with research students and colleagues in the field of chemical/science education. The discussions started with research methods; however, they soon boiled down to the computer-aided tools that were available. That further led to thinking about whether these tools were accessible, easy to use, and/or specific to particular research methods. I hope that this book on Computer-Aided Data Analysis in Chemical Education Research (CADACER): Advances and Avenues will contribute to advancing these discussions and provide readers with some new ideas on the use of data analysis tools. A single book cannot answer all questions on this topic. Computer-aided data analysis is a vast area, and there are several resources and avenues available. This book can definitely influence the way we think about our research, particularly our approach and our choice of tools for data organization, analysis, and interpretation. I have not been reserved in seeking contributors for this book; doing so would have constrained the very belief I stand up for, which can be best summed up in these words of Dean Kamen (an American engineer, inventor, and businessman): “Every once in a while, a new technology, an old problem, and a big idea turn into an innovation.” Every contributor has presented interesting ways to look at research problems in education research by using computer-aided tools, and this is the focus of the book – to solve problems, to answer research questions in innovative ways using modern technology, and to look at and make sense of data by using tools that add both order and efficiency.

Dr. Tanya Gupta
Assistant Professor
Chemistry & Biochemistry
South Dakota State University
Box #2202, 361 SAV
Brookings, South Dakota 57007


Chapter 1

Introduction to Computer-Aided Data Analysis in Chemical Education Research (CADACER)

Tanya Gupta*

Department of Chemistry and Biochemistry, South Dakota State University, Brookings, South Dakota 57007, United States
*E-mail: [email protected].

This chapter is an introduction to computer-aided data analysis. It also provides an overview of the computer-based data analysis tools (R, jMetrik, Apache Drill, ATLAS.ti, D2L) used in the educational research projects that form the chapters of this book. The intent is to provide information about the availability and application of various computer-based tools for qualitative and quantitative research to diverse researchers, novices and experts alike. This book is valuable for chemical education and other science education researchers. The chapters can be read in any order, as the reader may see a connection between a given tool portrayed in a chapter and their own research project. That is where the journey of seeking information or exploring these tools for a specific chemical education or other discipline-based education research project may begin for our reader, and hopefully culminate in applying one or more tools for digging into the data to make sense of numbers and words.

Introduction

Educators began using computers for developing and disseminating curriculum in the 1960s, starting with a large-scale collaborative effort between the University of Illinois and Stanford University for elementary-level education in California and Mississippi schools. The next five decades (1960-2010) revolutionized teaching and learning with the advancement of information and communications technology (ICT) and the development of several new and unique applications, including courseware (course software); reference works such as encyclopedias and dictionaries on compact discs and online; classroom aids such as the interactive whiteboard; and alternative methods for assessment and testing, for example Moodle, QuestionMark, and Assessment Master. Educational research also underwent a transformation from being manual to computer based. The advancements in research-focused hardware and software can be attributed to advances in computer hardware during the 1990s. The graphic capacity of computers became better, the cost of manufacturing computers became cheaper, and the Internet became popular, spurring the need to quickly harness the vast amount of data generated during this period for various research purposes (1–3).

Prior to these technological advancements, conducting research in any discipline was a tedious process. Every step of research, from planning to data collection and analysis, had to be conducted manually. Tools such as notebooks or journals were used to plan a systematic study, and the data-gathering tools for collecting audio or audio-visual data involved bulky equipment (video cameras and cassette recorders). Post-collection, the data were cleaned and sorted in physical bins for analytic purposes. Data analysis involved cutting and pasting segments of data, sorting these segments into bins, and re-organizing them (coding qualitative data) until meaningful categories were generated. For quantitative research, the data had to be organized manually, and calculators were used to conduct statistical calculations. Irrespective of the research methods being qualitative, quantitative, or mixed methods, the entire process was complicated, tedious, and time consuming.

Recent changes in technology in several forms have made it easier for researchers to gather and organize data in ways that allow systematic analysis and encourage researchers to collaborate across geographical locations for collective sense making. For people conducting quantitative (statistical) research, software programs such as SPSS, SAS, R, and Stata are extremely useful. SPSS, SAS, and Stata are commercially available programs that involve a licensing fee, whereas R is open source and freely available. Commercial packages in particular include pre-programmed commands that facilitate very basic to advanced forms of statistical analysis. On the other hand, free software such as R provides researchers the flexibility to develop and use their own code to conduct data analysis that is specific to the data and the project. These programs can also generate visualizations that are useful for interpreting data quickly and make the presentation of research data and findings attractive for the audience. Researchers now have advanced tools at their disposal to engage in the research process. This book presents tools available to chemical education researchers for engaging in qualitative, quantitative, or mixed methods research. It is not an exhaustive treatment of the various software packages that are available to facilitate research using any of the above-mentioned methods. However, it is anticipated that this book will spur ideas and provide some examples to the audience for using these tools as per their research needs.
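
As a flavor of the scripted analysis and visualization just described, the short R sketch below summarizes a small, simulated gradebook and draws a quick comparison plot. It is illustrative only and is not drawn from any chapter of this book; the data and variable names are invented.

    # Simulated gradebook: two course sections, 25 exam scores each
    set.seed(1)
    scores <- data.frame(
      section = rep(c("A", "B"), each = 25),
      exam    = c(rnorm(25, mean = 74, sd = 9), rnorm(25, mean = 78, sd = 8))
    )

    summary(scores$exam)                       # descriptive statistics
    tapply(scores$exam, scores$section, mean)  # mean score by section
    boxplot(exam ~ section, data = scores,
            main = "Exam scores by section", ylab = "Score")

Because the analysis is a script rather than a sequence of menu clicks, it can be rerun unchanged on the next semester's data, which is part of the reproducibility advantage discussed later in Chapter 6.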

Focus of This Book

Technology, in the form of desktop software and hardware, has become a part of research analysis, as have the underlying concepts and techniques that have been with us for years. Although analysis is sometimes complex, the challenge is not so much in understanding concepts and techniques as it is in applying them. The fundamental question of putting various data analysis resources to use dominates the minds of many experienced and novice researchers alike. For many people, delving into the various aspects of designing and conducting research and gradually entering the realm of literature review, research design, and data analysis leads one to wonder whether software is available that could model some of these research problems. Is such software user-friendly and flexible? Perhaps the software has some potential advantages when focusing on specific aspects of research; however, this can only be determined by hearing the experiences of people who have embarked on various research projects by putting the available software to use. The availability of software has opened doors for advanced research that was practically impossible with limited resources and the manual nature of research. The emphasis of this book is on how one might use various computer software or programs in different ways to support the management of data and the analysis process.

Research software is used to support the research endeavor. The researcher needs to consider the type of data in order to decide whether qualitative or quantitative software will be helpful. The chapters in this book do not focus on data-gathering techniques; references to such techniques in the chapters are minimal. Prior ACS series volumes have provided more information in this area (4, 5). It is possible to collect qualitative data such as interviews using a computer-aided support system, and to conduct surveys using web-based tools such as QuestionPro or a learning management system such as D2L for pre-post data. Our emphasis is mainly on the type of tools one can use once the data is collected. Different types of software are available for such analyses. Several quantitative packages are available for data analysis and management, and they offer similar capabilities (SPSS, SYSTAT, R, SAS, Stata, and MS-Excel with add-ins for advanced analysis). For example, SPSS (the Statistical Package for the Social Sciences) is very popular (6–8). It is important for researchers to learn about the features and capabilities of such software packages. Commercially available packages may come in different versions and have different capabilities. Researchers need to be aware of differences in data handling and analysis procedures among software packages. Sometimes software requires front-end work by the researcher for data organization and data labeling before conducting the analysis. A data set could have missing values or might need labels. For example, SPSS needs users to provide specific labels for gender, institution, and other demographic data before doing the analysis. The default outputs of various software packages depend on the sequence of actions selected by the user in the software interface. These outputs may not include the complete analysis sought by the user. Such default outputs, which can be generated by clicking a few buttons in the software interface, often lack additional tests that are useful. An example is the calculation of effect size, which is important in addition to significance tests and p-values for statistical analysis. Awareness of such important aspects of any program is essential before adding it to one's research toolkit.
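
To make the effect-size point concrete, the sketch below (simulated groups, not data from any study in this book) runs a default two-sample t-test in R, which reports t and p but no effect size, and then computes Cohen's d explicitly.

    set.seed(2)
    control   <- rnorm(30, mean = 70, sd = 10)
    treatment <- rnorm(30, mean = 76, sd = 10)

    t.test(treatment, control)   # default output: t statistic and p-value only

    # Effect size (Cohen's d with a pooled standard deviation), added by hand
    pooled_sd <- sqrt(((length(control) - 1) * sd(control)^2 +
                       (length(treatment) - 1) * sd(treatment)^2) /
                      (length(control) + length(treatment) - 2))
    (mean(treatment) - mean(control)) / pooled_sd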

Researchers engaged in qualitative research have to deal with a range of issues when it comes to choosing a qualitative research software package or tool. Qualitative data analysis software includes features that support the process of qualitative research: data transcription and transcription analysis, coding and text interpretation, recursive abstraction, content analysis, and discourse analysis. The use of software saves time, increases flexibility, and improves the validity of qualitative research. There are several freeware and commercially available programs for qualitative research (ATLAS.ti, NVivo, QDA-Miner, MAXQDA, HyperResearch, XSight, Focuss On, Dedoose, etc.). Researchers have their own preferences among such packages depending on their accessibility and the features one seeks (9, 10). Qualitative software provides proximity to the data, helps with triangulation, and keeps the research aligned with the context of the study. When using qualitative software tools, the data need to be converted into a file type that can be uploaded or imported into the software. Some software may allow collecting interview data (audio files) and offer the possibility of transcribing within the software. However, the analysis or coding process requires transcribed data or a file that is ready to be coded. Very few packages make it easy to use raw qualitative data. These raw data could be images, scripts, or other artifacts, yet they need to be in a format that is recognized by the software. Unlike quantitative software, where one can press buttons to get default outputs for statistical tests, qualitative data require the researcher to think about the data by planning and defining research parameters and codes based on the research methods, either before or during the data analysis process (grounded theory) (11–13). The next section provides an overview of the organization of this book and a glimpse into what each chapter entails.

Organization of Book

The book has nine chapters. These chapters represent three different categories:

•	Using Learning Management Systems (LMS) as analysis tools (14, 15)
•	Open source tools for data organization and analysis (R and jMetrik, the Comprehensive Meta-Analysis package - CMA, and Apache Drill) (16–21)
•	Commercially available software package for qualitative data analysis (ATLAS.ti) (22, 23)

Learning Management Systems as Analysis Tools

The chapter titled Learning Management System: Education Research in the Era of Technology by Mehta and Kalyvaki (Chapter 2) provides an introduction to the use of an LMS for data organization and analysis, especially for novice researchers struggling with data collection, storage, and making sense of preliminary data. Mehta and Kalyvaki present a researcher-oriented perspective on the value of LMSs and invite people with access to an LMS to explore it beyond the regular surface applications of course management, content delivery, and grading. For example, the discussion boards in an LMS such as D2L (Desire2Learn) can be used to gain insight into student thinking and help researchers generate pre- and post-assessment data for specific research questions. Likewise, the survey tool in the LMS can be used for both Likert-scale and open-ended surveys.

The chapter by Hedtrich and Graulich extends the information presented by Mehta and Kalyvaki and focuses on extending the capabilities of a Learning Management System through a software application, the LMSA-Kit. The LMSA-Kit developed by Hedtrich and Graulich is a software solution that captures data on electronic learning from data logs in the LMS. Through their chapter on Crossing Boundaries in Electronic Learning (Chapter 3), Hedtrich and Graulich nicely address a gap between the data generated in blended learning and in face-to-face instruction. Much of the potential data that provide insight into student cognition and course capabilities is lost due to the limitations of the LMS. The LMSA-Kit addresses this gap by establishing a stronger connection between instruction components offered through the LMS and face-to-face instruction. The LMSA-Kit extends the abilities of a traditional LMS by incorporating timely feedback for students and, for instructors, by identifying at-risk students early on. This chapter presents the benefits and limitations of using the LMSA-Kit and its future potential as the authors work on improving its current capabilities.

Open Source Tools for Data Organization and Analysis (R and jMetrik, Comprehensive Meta-Analysis Package - CMA, and Apache Drill)

The chapters in this category provide examples and applications of open-source programs and packages that are great tools for data management and analysis. As defined by Elluri in her chapter on Leveraging Open Source Tools for Analytics in Education Research (Chapter 4), open source refers to a program or software in which the source code is available to the general public for use or modification from its original design and is free of charge. Thus, open-source software can be modified from its original design to incorporate additional functionality based on the needs of the users. It is an open platform for users to share and contribute ideas for the development of reusable software packages that can be harnessed or improved upon by others. Elluri discusses the importance of data analytics in the domain of educational research for gaining insight into the various factors that influence students' academic performance and conceptual understanding. The chapter begins with an introduction to the data types that intrigue education researchers and the procedures usually followed for sorting and analyzing such data through the various open-source tools that are available for qualitative and quantitative research. Elluri provides an overview of the analytic processes for such data and of choosing the right software for analysis, and introduces various open-source tools available for handling research data (R, Python, Wrangler, Apache Drill, Weka, AQUAD, and Data Applied). The chapter includes a detailed introduction to using Apache Drill, an open-source software package that supports data-intensive distributed applications for interactive analysis of large-scale data sets. In addition, the chapter provides examples of the application of Apache Drill for exploration, data cleaning, query visualization, and data transformation.

The next chapter, by A. Leontyev, S. Pulos, and R. Hyslop, titled Making the Most of Your Assessment: Analysis of Test Data in jMetrik (Chapter 5), is on the use of jMetrik, an open-source computer program for psychometric analysis. Leontyev and co-authors provide an overview of the jMetrik program for the analysis of test data. The program has a user-friendly interface, an integrated database, and a variety of statistical procedures and charts. The chapter highlights applications of jMetrik data analysis using a data set of students' responses to the Stereochemistry Concept Inventory. Various aspects of using the jMetrik program, such as uploading data, scoring the test, and examining the test at the scale, item, and distractor levels, along with the main steps of data analysis, are discussed together with several examples of data interpretation.

The chapter by Harshman, Yezierski, and Nielsen titled Putting the R in CER: How the Statistical Program R Transforms Research Capabilities (Chapter 6) provides information on another open-source software package, R. R allows users to build, change, or adapt features and functions on their local copies. The chapter describes the capabilities of R to transform data analysis in education research. The authors discuss several advantages (and some limitations) of using R in education research to effectively analyze and visualize data by defining custom functions and writing programmatic loops, and also for enhancing reproducibility and effective documentation of the research process via interactive notebooks. According to Harshman et al., R has several advantages over other existing alternatives such as SPSS and Excel. The chapter includes several examples of the applications of R, from data organization to data analysis.

The chapter titled Likert-Type Survey Data Analysis with R and RStudio (Chapter 7) by R. Komperda extends the application and benefits of using R for quantitative and survey data. Komperda provides an in-depth description of the visualization and analysis of survey data using R and RStudio. R is a programming language, and RStudio is an Integrated Development Environment (IDE) that provides comprehensive facilities to programmers for software development and modification. Komperda presents R as a viable alternative to traditional software such as Excel, SPSS, Stata, Mplus, and LISREL for conducting survey data analysis, including gathering psychometric evidence of instrument quality. The chapter demonstrates several uses of R and RStudio using a sample data set available within R. Examples of applications such as visualizations of response distributions, descriptive statistics, principal components analysis and exploratory factor analysis using the psych package, and confirmatory factor analysis using the lavaan package are included.
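
To give a flavor of the R-based workflow described in Chapters 6 and 7, the sketch below plots a Likert response distribution, checks internal consistency with the psych package, and fits a one-factor confirmatory model with the lavaan package. It is a minimal sketch that assumes those add-on packages are installed; the five simulated items are not the sample data set used in Chapter 7.

    library(psych)    # assumed installed
    library(lavaan)   # assumed installed

    # Simulate five correlated 1-5 Likert items for 200 respondents
    set.seed(42)
    latent <- rnorm(200)
    likert <- as.data.frame(sapply(1:5, function(i)
      pmin(pmax(round(3 + latent + rnorm(200, sd = 0.8)), 1), 5)))
    names(likert) <- paste0("item", 1:5)

    barplot(table(likert$item1),
            main = "Response distribution, item 1",
            xlab = "Likert response", ylab = "Count")   # visualization
    describe(likert)                                    # descriptive statistics (psych)
    alpha(likert)                                       # internal consistency (psych)

    model <- "attitude =~ item1 + item2 + item3 + item4 + item5"
    fit <- cfa(model, data = likert)                    # confirmatory factor analysis (lavaan)
    summary(fit, fit.measures = TRUE)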

The focus of the next chapter in this category is on the assessment of instructional interventions using the Comprehensive Meta-Analysis (CMA) package (Chapter 8). In this chapter, the authors Leontyev, Chase, Pulos, and Varma-Nelson introduce the CMA software solution to perform a meta-analysis on a set of previously published papers on the effectiveness of the Peer-Led Team Learning (PLTL) approach. Meta-analysis is a statistical procedure that combines data from multiple studies. It is used on several research studies of the same topic to examine how consistent the treatment effect (or effect size) is from one study to the next. The chapter includes examples of calculating effect sizes, forest plots, and a variety of other analyses to provide an overview of organizing the data and performing the meta-analysis procedures in the CMA software.
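
CMA itself is a commercial, menu-driven program, so no CMA code is shown here. As a rough open-source analogue (an assumption of this sketch, not the chapter's own procedure), the R package metafor computes the same kind of standardized mean differences and forest plot from summary statistics; the three studies below are invented.

    library(metafor)   # assumed installed

    # Invented summary statistics: treatment vs. comparison group per study
    studies <- data.frame(
      study = c("Study 1", "Study 2", "Study 3"),
      m1 = c(78, 81, 74), sd1 = c(10, 9, 12), n1 = c(40, 55, 32),
      m2 = c(72, 79, 70), sd2 = c(11, 10, 13), n2 = c(42, 50, 35))

    es  <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
                  m2i = m2, sd2i = sd2, n2i = n2, data = studies)  # effect size per study
    res <- rma(yi, vi, data = es)                                  # random-effects model
    summary(res)
    forest(res, slab = studies$study)                              # forest plot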

Commercially Available Software Package for Qualitative Data Analysis (ATLAS.ti)

The computer-aided data analysis tools presented in the prior chapters include commercially licensed Learning Management System applications and extension kits as well as open-source software. These programs can be utilized for both qualitative and quantitative research. The last chapter in this book (Chapter 9) is focused on a commercially licensed qualitative research software package, ATLAS.ti. Through her study on student problem-solving behavior, Gupta demonstrates the process of data organization and qualitative data analysis using ATLAS.ti. The chapter covers transcription, data coding, memos, and visual displays of the relationships between various concepts unraveled during the analytic process in the problem-solving study.

Conclusion

This book is for researchers grappling with the challenge of deciding which specific tools they should use for data organization and analysis. With several open-source and subscription-based computer software packages and programs available, such a decision is often difficult. This book is by no means a presentation of the applications of every single computer software package or tool that is available. It provides the perspectives of researchers engaged in various educational research projects who have first-hand experience using the specific tools presented in the chapters of this book, along with the pros and cons of using these tools for research. Researchers new to education research will perhaps gather some ideas from here to conduct their own analysis specific to their project goals. Experienced researchers may find that this book presents more insights or a different perspective on using some of the analysis tools and approaches mentioned in its chapters. It is hoped that this book will foster ideas (big and small) and discussions among educational researchers on the tools that can be used most effectively to answer specific research questions using qualitative, quantitative, or mixed methods research approaches.

References

1. Seels, B. Educ. Technol. 1989, 11–15.
2. Niemiec, R. P.; Walberg, H. T. J. Res. Comput. Educ. 1989, 21, 263–276.
3. Bainbridge, W. Science 2007, 317 (27), 471–476.
4. Nuts and Bolts of Chemical Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 976; ACS Publications: Washington, DC, 2008.
5. Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 1166; ACS Publications: Washington, DC, 2014.
6. Dayal, V. An Introduction to R for Quantitative Economics; SpringerBriefs in Economics; Springer: New Delhi, India, 2015.
7. Gandrud, C. Reproducible Research with R and RStudio; The R Series; CRC Press, Taylor & Francis Group: New York, 2014.
8. Prvan, T.; Reid, A.; Petocz, P. Teach. Stat. 2002, 24, 68–75.
9. Creswell, J. W.; Plano Clark, V. L. Designing and Conducting Mixed Methods Research; Sage Publications: Thousand Oaks, CA, 2011.
10. Liamputtong, P. Qualitative Research Methods, 4th ed.; Oxford University Press: New York, 2013.
11. Fielding, N. G.; Lee, R. M. Computer Analysis and Qualitative Research; Sage Publications: London, U.K., 1998.
12. Patton, M. Q. Qualitative Research and Evaluation Methods; Sage Publications: Thousand Oaks, CA, 2002.
13. Kelle, U.; Prien, G.; Bird, K. Computer-Aided Qualitative Data Analysis: Theory, Methods and Practice; Sage Publications: Thousand Oaks, CA, 1995.
14. Watson, W. R. TechTrends 2007, 51, 28–34.
15. Ellis, R. A.; Calvo, R. A. Educ. Tech. Soc. 2007, 10, 60–70.
16. jMetrik home page. http://www.itemanalysis.com/ (accessed March 16, 2017).
17. Meyer, J. P. Applied Measurement with jMetrik; Routledge: New York, 2014.
18. Meyer, J. P.; Hailey, E. J. Appl. Meas. 2012, 13, 248–258.
19. Horton, N. J.; Kleinman, K. Statistical Analysis and Graphics, 2nd ed.; CRC Press, Taylor & Francis Group: New York, 2015.
20. Melnik, S.; Gubarev, A.; Long, J. J.; Romer, R.; Sivakumar, S.; Tolton, M.; Vassilakis, T. Proc. 36th Int’l Conf. on Very Large Data Bases 2010, 330–339.
21. Borenstein, M.; Hedges, L. V.; Higgins, J. P. T.; Rothstein, H. R. Introduction to Meta-Analysis; John Wiley & Sons, Inc.: New York, 2009.
22. Konopásek, Z. Hist. Soc. Res. Suppl. 2007, 19, 276–298.
23. Friese, S. Qualitative Data Analysis with ATLAS.ti; Sage Publications: Thousand Oaks, CA, 2011.


Chapter 2

Learning Management System: Education Research in the Era of Technology Akash Mehta*,2 and Maria Kalyvaki1 1High

Performance Computing Domain Specialist, South Dakota State University, Brookings, South Dakota 57007, United States 2Independent Researcher and Consultant, Brookings, South Dakota 57007, United States *E-mail: [email protected].

Reports show that 99% of colleges and universities use Learning Management Systems (LMS). In this new online teaching environment, instructors have the ability to track their students' interactions with each other and with the educational resources. Progress is recorded, and instructors can access those measured variables and later extract data to evaluate the effectiveness of their students' learning experience. Performance analytics can lead to a deeper understanding of the learning process and suggest improvements. This chapter aims to guide students and young researchers through a variety of instruments that can be used to assess data correlated with their students' academic achievements.

Introduction

It was in the 1990s that higher education started to adopt Learning Management Systems (LMS) on campuses, and now almost 99% of colleges and universities are using an LMS. The use of the LMS has become part of the students' learning experience whether they select a face-to-face, online, or hybrid course (1). Modern learning management systems like Blackboard, Brightspace by D2L, Moodle, Canvas, etc. offer instructors the opportunity to create rich digital courses. The use of animations, videos, and other multimedia provides an engaging environment for the students. Nowadays, in contrast to traditional learning environments, students have access to asynchronous and synchronous interaction and communication within a virtual environment (2, 3). Web-based courses provide flexibility and accessibility for students who are located off campus and for those whose schedules or physical conditions do not allow them to participate in a traditional class (3). LMSs are multidisciplinary by nature. Many different disciplines are involved in the success of a digital course. Instructors who plan to teach by using an LMS are expected to have some experience with computer science, information systems, psychology, education, and instructional technology (4). The LMS acts as a bridge between the instructors and learners (5). Apart from being a learning portal, the LMS serves as a platform for tracking the students' progress and success (5). Part of e-learning course planning is the selection of appropriate techniques and tools (6). Studies have shown that learning outcomes have been the same as in traditional courses, and students with prior experience using computers were more satisfied with online learning environments (7). LMSs provide a broad range of tools to enhance the student learning experience, but relatively few instructors use those systems to their full capacity (1). As technology progresses, every interaction and every resource accessed can be captured and stored inside the LMS (8).

The Struggles of a Beginner Researcher

For a beginner researcher (9–11) in the area of chemical education, the production of research may often seem overwhelming. Nearly every researcher has to create a research plan and go through a particular set of steps, such as obtaining institutional approvals and developing a well-thought-out strategy to collect data. The thought of data collection also brings the questions of what data to gather and what to disregard, how to collect it, how to store it securely, and where to store it. Almost every educational research project requires institutional review board (IRB) (12) approval, and the approval process requires a detailed description of these and related questions about data privacy, confidentiality, access, and safety, because the data often contain information about the research subjects (students and/or instructors). While going through this process, and while actually collecting data, a beginner researcher is always looking for ways to store all the papers generated through the collection of originals or copies of students' work, whether homework, quizzes, lab reports, exams, etc. A one-semester research investigation involving such data collection creates a huge stack of papers that requires proper cataloging and enough physical space to securely store the data, which will frequently be accessed by the researcher to further code, organize, and analyze. The collected data help the researcher gain insight into the proposed research questions. Such data will contribute to answering the research questions and concluding the investigation with valid evidence and reliability. It soon becomes overwhelming for researchers when they have to collect data over several semesters. This is often the case when a research investigation is being carried out to establish a case for or against a teaching intervention, new curriculum material, a course restructure, etc. This often leads to an office full of documents. Also, under such a situation, it may become difficult for researchers with limited physical space to meet the data security requirements of the IRB protocol.

Almost every educational institution, be it a school district or a college/university, uses a Learning Management System (LMS) for conducting the variety of courses offered to students in any grade or major. Those LMSs capture helpful information that has the potential to help students, teachers, and institutions make better choices that will lead to improved learning outcomes (13). While using the LMS for course delivery, researchers often neglect the analytics tools that could help them develop their research. The underutilization of the LMS, despite its robust infrastructure, could be attributed to poorly structured training on its use, which is often focused mainly on how instructors can deliver course content in a face-to-face or hybrid/online structure and create and maintain the grade books.

Why Use Learning Management Systems in Chemical Education Research?

Learning management systems are very well integrated into current academics. Some of the concerns that a researcher and an IRB may have are data authenticity, confidentiality, and privacy. Inside the LMS those concerns are adequately addressed with the highest priority, as access to information is strictly regulated within an academic institution. For example, course access for an instructor or a teaching assistant (TA) is granted only upon approval through the appropriate departmental procedure and can be well regulated with respect to the duration (typically one semester) and the type of information (student name and campus identification only) needed to conduct the course successfully. Such access may be authorized for LMS data analysis. Data authenticity is very well maintained, as all the information collected remains on a secure server that is accessible only to authorized individuals (instructor, TA) through their own individual credentials. Moreover, as stated above, access to data may be limited by adding only the institution's personnel in different roles that have pre-determined access allowances/restrictions based on the role, e.g., course administrator, course instructor, course teaching assistant, etc. Most academic boards (boards of regents, school district boards) require that academic data, such as those generated through the LMS, be securely stored for an extended period of time (7-10 years); after that, a determination may be made as to whether further storage is required. This provides real assurance for any researcher who wishes to look at data from the past, either for comparison purposes or for an ongoing sequential research investigation. In all these cases, the data are securely stored with access available through proper documentation. Also, this ensures that data are very well organized as each academic semester progresses.

Exploring Tools within the Learning Management System

Most LMSs offer an opportunity to create digitally rich courses that incorporate interactive simulations, videos, and multimedia, providing an engaging learning environment (3) for the students. Instructors have the ability to track their students' interactions with each other and with the educational resources. Progress is recorded, and instructors can access those measured variables and later extract data to evaluate the effectiveness of their students' learning experience (2). Performance analytics can lead to a deeper understanding of the learning process and suggest improvements. The LMS serves as a tracking platform for the students' progress and success (5). LMSs provide a broad range of tools to enhance the student learning experience, but relatively few instructors use those systems to their full capacity (1). Various tools/functions are available in most LMSs, yet their utilization is mostly restricted to content delivery/collection (assignment submission) while conducting a course (face-to-face or hybrid/online). Despite the merits of these tools, their use for data collection in research investigations is seldom heard of. A few of these tools are described below, with a focus on collecting factual data for a research study while remaining part of a course and without distorting the course structure:

Discussion Board

The discussion board (DB) is a tool in the LMS that enables students to share their thoughts, opinions, and understanding of the subject matter. A discussion board includes discussion forums and threads (Figure 2). When utilized by the instructor, it can promote topic-based discussion between students, between student and instructor, or in any other possible combination to promote critical thinking and social interaction. Often this tool is used as an aid for asking questions when an instructor is not immediately available, and it allows anyone to respond. A discussion board is a fine example of adopting constructivist learning theory (14) (Figure 1). Discussion board analytics could provide important insight because they confirm what has been learned in the literature and provide interval-level data (14).
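
As a hypothetical illustration of that interval-level data, most LMSs can export discussion-board activity to a spreadsheet; the file name and column names below (student, word_count) are assumptions for this sketch, not an actual D2L report format.

    # Read a hypothetical export of discussion-board posts
    posts <- read.csv("discussion_export.csv", stringsAsFactors = FALSE)

    posts_per_student <- as.data.frame(table(posts$student))   # participation counts
    names(posts_per_student) <- c("student", "n_posts")

    words_per_student <- aggregate(word_count ~ student, data = posts, FUN = sum)

    summary(posts_per_student$n_posts)
    hist(posts_per_student$n_posts,
         main = "Posts per student", xlab = "Number of posts")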

Cloud Storage

Cloud storage (15, 16) systems such as Dropbox, Box, Microsoft OneDrive, Google Drive, etc. are often provided within the LMS to allow for submission of assignments in a secure manner and without having to access them through a separate login. This tool is often used for submitting assignments that require students' reports or opinions on content-based topics. Based on the requirements of the course, a cloud submission may be an individual or a group submission. Assignments submitted through the cloud can be graded online without being printed. Figure 3 depicts a typical course with multiple assignments that may be activated at an appropriate time in the semester for students to submit their work.

Figure 1. Knowledge creation in discussion board.

Figure 2. D2L Discussion Board.

Figure 3. D2L Dropbox Folder System.

Quiz

The quiz is a familiar and valuable tool in the LMS for conducting either a face-to-face or a hybrid/online course. A few of its features allow conducting quizzes in various modes, such as timed, conditional, randomized, sectional/individual release, etc. These modes are rarely used, the most common being randomized and timed. However, when employed in all such modes, the quiz has the potential to generate the most authentic research data without compromising the course structure.

Survey

The survey tool (Figure 4) is almost never utilized in conducting a chemistry course; thus, assuming that an instructor/researcher could use it for research might seem a long shot. Yet this very useful tool is crucial to most education research, including chemical education research.

Discussion

The world we live in today is highly data intensive (17). Almost everything that we do today generates a variety of data in digital format, thus leaving a trail of predictable behaviors about anyone in any setting. This insight into users' patterns of living is of immense benefit to private companies, which use such data for their product design, development, and marketing to generate increased revenue. This also means that the job market will always be looking for candidates who are adept at such big data generation, management, and analysis using the variety of computational tools available today, and who can grow further with them. All this is excellent news for a student education researcher undertaking such data-intensive studies during their academic career; the caveat, however, is that they must use current-day methods of data collection, management, and analysis. As mentioned earlier, most research in educational settings is a mix of current-day computer-based methods and not-so-current manual data collection methods. The computer-based part is mostly used for data arrangement, coding, and analysis, while the manual part is where data are stored in hard-copy format, which provides no additional benefit, requires physical space to store, and does not always guarantee the security needed by the regulatory boards.

Figure 4. D2L Survey Tool.

A well-planned research study has a well-planned data collection strategy that can be efficiently executed using the LMS. The time requirements for such data collection during and before/after classroom meetings, e.g., lecture, lab, recitation, etc., would be almost the same and may be less in some instances based on the nature of the research study. A discussion board (DB) may be effectively used to forge in-class discussion among students; such a platform can serve the purpose of in-class observations. However, it has the potential to provide more in-depth data on students' thinking processes, in their own writing, in a classroom setting. During such discussions, an instructor may also interact, posing probing questions that may prompt the same or other students to join in the discussion. Such a DB also offers an excellent platform to students who would be too shy to speak out but may find it easy to communicate through posting in the DB. The merits of the DB are often realized by creating boards such as "ask the professor" or "student discussion board" that allow only out-of-class communication on non-urgent matters. The authors (17, 18) have used DBs for in-class discussions and have observed first-hand intense student-student and student-instructor discussion on the topic at hand, with a participation rate higher than that found during in-class oral discussions.

A similar observation was made when the authors created Dropbox folders for the submission of lab reports in a multi-section lab course. The very first appreciation that students showed was about how much paper it saved for a one-semester course that requires 11 lab reports per student with at least 4-5 pages per report. A simple calculation for a course enrolling about 300 students puts this number at around 13,200–16,500 sheets of paper (300 students × 11 reports × 4-5 pages) in original student reports, and when collecting data for a research investigation, almost all of these reports would have to be copied again, generating an equal number of pages. Collecting lab reports on Dropbox reduced this to zero pages used for lab reports and provided very organized data for each student; moreover, cloud-based submission allowed students to resubmit their reports, which allowed researchers to directly compare students' evolving thinking about the same concept over the period of the semester. The data collected from cloud storage can be submitted in any file format required by the course instructor, e.g., MS-Word, MS-Excel, PDF, etc., and most of these files are digitized, thus allowing easy data integration into quantitative/qualitative analysis software packages, viz., ATLAS.ti, IBM SPSS, NVivo, etc. On the contrary, a scanned document often requires the tedious task of manually entering each piece of information into such analytical software, which takes additional time while leaving room for errors. Cloud-based data collection is very efficient and less error prone, which allows for faster data analysis compared to manual collection.

The quiz function in the LMS is a widely used feature, mostly associated with "end of chapter/lecture" quizzes or sometimes with homework assignments wherever the course does not have a publisher-provided homework system such as OWLv2 (19), Mastering Chemistry (20), etc. Most quizzes are offered as timed activities and sometimes with randomization enabled. However, in this format the quiz is an underutilized function that does not serve the course instructor well and serves not at all the researcher trying to derive more information from such virtual tests. Most LMSs offer quiz functions with a variety of possibilities for conducting them as well as for generating data that may be useful to the course instructor and most definitely to a researcher. Such possibilities are often not known to either of them. In addition to the quiz being timed, conditional, randomized, or released sectionally/individually, a quiz can be set to generate reports on individual students' time logs, responses, and group/class logs. These are readily available to any instructor/researcher and provide detailed information about how students are approaching each question. This is certainly helpful to instructors who wish to make changes in their teaching strategy based on this information; moreover, it is very useful to a researcher (21) who wishes to find a correlation between the intervention method and the learning strategy adopted by students, or any such variable that may be crucial to an investigator's research. During their teaching, the authors utilized the quiz in this variety of functionality and generated data that helped them in two ways: 1) improving their teaching strategy in an ongoing course while addressing the pressing concerns of the majority of students, and 2) gaining better insight into the intervention strategy adopted (3). These tools are very efficient compared to manual data collection methods, where any such insight comes only when the data is organized, which in such cases is almost always at the end of the semester.
Thus, the insight gained from manual data collection cannot lead to addressing the learning concerns of students in the current semester; it can only be incorporated in a subsequent semester. This does not help the students who first faced the problem in the semester for which such data were collected, nor does it give the instructor/researcher any information about the suitability of the improved intervention, as it is tried for the first time in the next semester and not with the students who actually faced the challenge.

Surveys are widely used in education research; a variety of surveys based on standardized and widely accepted instruments are regularly administered. The LMS allows such surveys to be developed in almost the same amount of time it takes to create them the first time; after that, they only need to be copied into each course where the surveys are to be conducted. This is immensely time-saving, and, learning from the example of cloud-based submission of lab reports, LMS-based surveys also use no paper. Like LMS-based quizzes, surveys can be conducted in a variety of question-answer formats, viz., multiple choice, multiple select, Likert, short/long answer, fill in the blank, etc. Reports generated through such surveys can be readily imported into analytical software.

Conditional release is a tool that is rarely used in the LMS. However, the authors have utilized this function in the following order: first, a student participates in "during and after" class discussions, which lets them submit the assignments to cloud storage. Upon submission, the quiz associated with the assignment opens for them to take. Once they complete the quiz, they are able to take the survey. Thus, it is not possible for a student to take any component without following the systematic order. At every step, the instructor/researcher is getting more insightful data on the individual student; in many instances, such information is very helpful in providing instant help to students who are struggling with the subject matter and may be considering dropping the course.
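
As a small illustration of working with the exported reports described above, the sketch below reads a quiz attempt log in R and asks whether time spent relates to score. The file name and column names (student, seconds_spent, correct) are invented for this example and will differ from any real LMS export.

    attempts <- read.csv("quiz_attempt_log.csv", stringsAsFactors = FALSE)

    # Total time and total score per student
    time_per_student  <- aggregate(seconds_spent ~ student, data = attempts, FUN = sum)
    score_per_student <- aggregate(correct ~ student, data = attempts, FUN = sum)
    merged <- merge(time_per_student, score_per_student, by = "student")

    # Is time spent on the quiz related to the score earned?
    cor.test(merged$seconds_spent, merged$correct)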

Conclusion

In this chapter, our effort has been to highlight the utility of the Learning Management System in conducting efficient, authentic, and secure research in chemical education. When fully utilized, the LMS has the potential to provide substantial insight into a course on an ongoing basis, which may help the instructor make in-course interventions to keep students interested, engaged, and enrolled in the course, and to serve as a reliable tool for collecting data for a research investigation. The authors will publish their own research studies conducted using the LMS for all of their data collection in a separate manuscript. Academic institutions should incorporate into their LMS training how the various features may be utilized to collect data and conduct education research. Data-intensive research is already proving its potential in all walks of our lives, and it is important that such tools be fully utilized in conducting chemical education research. LMS-based research meets the requirements of data authenticity, security, and efficiency. A well-planned research study using the LMS will certainly help collect better quality data that will offer better insight into current teaching practices and new potentials in research-based teaching interventions. The increased integration of the LMS in conducting chemical education research will happen as more tools and functionalities are added and as researchers evolve in their use of the LMS for conducting their research studies.

Acknowledgments

The authors wish to thank Dr. Tanya Gupta for the constant support, encouragement, and critical feedback, which were instrumental in writing this manuscript.

References

1. Dahlstrom, E.; Brooks, D. C.; Bichsel, J. Educ. Res. Rep. 2014.
2. Ozkan, S.; Koseler, R. Comp. Educ. 2009, 1285–1296. DOI: 10.1016/j.compedu.2009.06.011.
3. Broadbent, J.; Poon, W. L. Internet High. Educ. 2015, 1–13. DOI: 10.1016/j.iheduc.2015.04.007.
4. Ozkan, S.; Koseler, R. Comp. Educ. 2009, 1285–1296. DOI: 10.1016/j.compedu.2009.06.011.
5. Cavus, N.; Uzunboylu, H.; Ibrahim, D. J. Educ. Comp. Res. 2007, 301–321.
6. Ćukušić, M.; Garača, Ž.; Jadrić, M. Comp. Educ. 2014, 100–109. DOI: 10.1016/j.compedu.2013.10.018.
7. Gikandi, J. W. W.; Morrow, D.; Davis, N. E. E. Comput. Educ. 2011, 2333–2351. DOI: 10.1016/j.compedu.2011.06.004.
8. Pardo, A.; Siemens, G. Brit. J. Educ. Technol. 2014, 438–450. DOI: 10.1111/bjet.12152.
9. Kluge, M. A. J. Aging Phys. Act. 2001, 329–335. DOI: 10.1123/japa.9.3.329.
10. Schwarz Mccotter, S. Qual. Rep. 2001, 1–24.
11. Horvat, E. M.; Heron, M. L. The Beginner’s Guide to Doing Qualitative Research: How to Get into the Field, Collect Data, and Write up Your Project; Teachers College Press, 2013.
12. Office of the Commissioner. Institutional Review Boards Frequently Asked Questions - Information Sheet; U.S. Food and Drug Administration, 2016.
13. Analytics: Getting Helpful Information out of an LMS; Salvetore, 2012; https://salvetore.wordpress.com/2012/07/26/analytics-getting-helpfulinformation-out-of-an-lms/ (accessed May 17, 2017).
14. Harman, K.; Koohang, A. Interdiscip. J. Knowl. Learn. Objects 2005, 1, 67–77.
15. Karabayeva, K. Z. Creation of a Unified Educational Space Within a SLA University Classroom Using Cloud Storage and On-line Applications. Int. J. Environ. Sci. Educ. 2016, 11, 8669–8678.
16. Karamete, A. Computer Education and Instructional Technology Teacher Trainees’ Opinions about Cloud Computing Technology; 2015; pp 2043–2050. DOI: 10.5897/ERR2015.2297.
17. Mehta, A. Use of LMS in Conducting Prerequisite Chemistry Course and Education Research. Manuscript in preparation.
18. Mehta, A.; Tummala, H. Using LMS to Assess Students’ Understanding of Concepts, Critical Thinking, and Problem-Solving in the Formulation and Compounding in Pharmacy Course. Manuscript in preparation.
19. OWLv2 - Cengage Learning; http://www.cengage.com/owlv2/ (accessed May 17, 2017).
20. Mastering Chemistry; Pearson, 2017; http://www.pearsonmylabandmastering.com/northamerica/masteringchemistry/ (accessed May 17, 2017).
21. Gámiz Sánchez, V.; Montes Soldado, R.; Pérez López, M. C. Self-Assessment via a Blended-Learning Strategy to Improve Performance in an Accounting Subject. RUSC. Univ. Knowl. Soc. J. 2014. DOI: 10.7238/rusc.v11i2.2055.

19

Chapter 3

Crossing Boundaries in Electronic Learning: Combining Fragmented Test Data for a New Perspective on Students’ Learning

Sebastian Hedtrich and Nicole Graulich*

Institute of Chemistry Education, Justus-Liebig-University Gießen, Heinrich-Buff-Ring 17, 35392 Gießen, Germany
*E-mail: [email protected].

Blended-learning has become a well-known and widely used scaffolding approach to support students’ learning processes. It offers the opportunity to profit from best practices in both face-to-face instruction and computer-based learning. Nevertheless, teachers mostly use the computer-based part as a black box: “test passed” or “test not (yet) passed” is generally the only information they receive. Hence, the influence of the computer-based part on the face-to-face lessons is often reduced to simple access control or a digitally assisted homework check. Our idea for extending the benefit of both learning worlds is the development of a software solution that offers more detailed insight into students’ learning during the computer-based phase. With it, teachers can provide face-to-face lessons that better meet the needs of individual students, and new types of automatically delivered feedback can be realized.

Blended-learning has become a widespread technique to support and supplement traditional face-to-face instruction in higher education during the last few years. It is also gaining influence in the secondary sector. Students’ preparation for laboratory classes is improved by offering electronic supporting material, especially video demonstrations, via a Learning Management System (LMS) (1).


Flipped classroom approaches, as one special form of blended-learning, have demonstrated how to improve students’ learning in chemistry lectures (2), as well as their preparation for laboratory courses (3). In a traditional blended-learning scenario, students work actively with the electronic learning material between the face-to-face lessons (Fig. 1). The computer-based part is used to support the students in preparing for the upcoming instruction or in revising the topics of the previous one. Beyond that, there is usually no strong connection between the two parts. As a result, the function of the computer-based part is often reduced to a tool that helps students prepare and that lets teachers check whether the homework has been done correctly.

Figure 1. Blended-learning occurs in two more or less distinct worlds.

Students spend a large amount of time using the LMS; face-to-face lessons and digital learning at home are sometimes equally time-consuming. During their time in the LMS, students leave extensive data trails. Considering this large data pool, the information delivered back to students and teachers by the LMS is astonishingly small. Students only receive help at the level of single, isolated tasks, and teachers can supervise the homework (Fig. 2). These possibilities for supervision are limited to the percentage of points each student earned on the whole assignment. Commonly used LMSs, such as ILIAS, Moodle, or OpenOLAT, offer no options, for instance, to monitor how students’ strengths and weaknesses develop.

Figure 2. Different views of assessments in blended-learning.

Learning in blended-learning scenarios can therefore best be described as learning in two separate worlds. Every approach to maximizing the benefit of blended-learning should aim at making connections between both worlds, as this offers the most potential for improving learning. Although students can roughly estimate their current performance in an assessment, they often fail to realize how much remains to be done to master the learning objective or to acquire a competency in a satisfactory manner (4). This shortcoming can be addressed by bringing the learning world of the LMS closer to the face-to-face lessons. Our idea was to create a software solution that establishes stronger connections between both worlds: software that offers educators deeper insight into their students’ learning processes within the LMS. In this way, face-to-face lessons can be designed around individual students’ needs. Additionally, new automatic information systems become possible, such as a feedback system that informs students directly about their strengths and weaknesses.

Try To Make Connections

Big Data in Education – Educational Data Mining – Learning Analytics

Students generate a huge data pool while working within a blended-learning scenario. One idea for handling this vast amount of data is to look at other disciplines in the IT sector that deal with similar amounts of data. Every person surfing the World Wide Web leaves large trails of digital data. Handling such masses of data has become a big business, and companies make money by working with this big data. The term “Educational Data Mining” (EDM) describes the application of big data mining methods in educational contexts (5). Traditional data mining techniques consist of methods and algorithms from statistics, data modeling, and visualization, as well as machine learning. In EDM, those techniques must be revised and supplemented for educational purposes (6).

Techniques from psychometrics, for instance, also play a role in EDM. The main contribution of data mining to EDM lies in clustering and classifying students into different groups (7). This offers the possibility of detecting critical behavior in the LMS and informing the educator about students who are at risk of failing (8). Within the last few years, EDM has focused strongly on log data from learning management systems (5). Morris et al. derived eight variables from the access logs of LMSs (9); these variables can explain 31% of the variability in achievement. Given that most educators are not familiar with data mining tools, visualizing the results is an important task of EDM. Good visualizations can help educators understand the learning process itself and can even help people who are not concerned with clustering, statistics, and other data mining techniques to obtain information about students’ learning (10). Another discipline that deals with logged educational data is Learning Analytics (LA). LA tries to find information about learners and the context in which their learning occurs; this information provides an understanding of learning and opportunities for optimization (10). It is not possible to separate the two disciplines completely and to give distinct definitions. Both LA and EDM make extensive use of data mining techniques, and LA additionally uses social network and discourse analysis, among others, to inspect the context of learning at a course or department level (11). Thus, LA takes a wider view of the data material than EDM. Learning Analytics offers diagnostic tools for educators. These tools allow educators to improve their teaching or their teaching material. LA, for instance, offers help in test construction: abductive machine learning decreases the number of tasks in electronic testing while the accuracy of the exam remains nearly constant (12). Consequently, educators have less work in creating new test items and the students’ workload during testing is reduced, but test accuracy does not suffer. By contrast, the predominant aim of EDM is to provide ready-to-use solutions. Both disciplines depend on data access, especially LMS data. This dependence on data access is often a constraint for the implementation of any new educational diagnostic tool. Educators at universities are usually not involved in the LMS’s administration and, thus, cannot implement new diagnostic software within the LMS. Software that runs on personal computers instead of within an LMS is strongly dependent on the data material available. Therefore, educational diagnostic software must take this limited availability into account.
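As a rough illustration of the clustering approach just described, the following sketch (not taken from any of the cited studies) groups students by activity features derived from LMS access logs; the feature set, the values, and the choice of k-means are hypothetical examples only.

```python
# A minimal sketch of clustering students by LMS activity features.
# Feature names and values are hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Rows: students; columns: logins per week, minutes online, tasks attempted, forum posts
features = np.array([
    [12, 340, 25, 4],
    [ 2,  45,  3, 0],
    [ 8, 210, 18, 1],
    [ 1,  30,  2, 0],
    [10, 280, 22, 3],
])

# Standardize so that no single feature dominates the distance metric
scaled = StandardScaler().fit_transform(features)

# Partition the students into two activity profiles
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# Flag the cluster with the lowest mean (standardized) activity as potentially at risk
cluster_means = {k: scaled[labels == k].mean() for k in range(2)}
at_risk_cluster = min(cluster_means, key=cluster_means.get)
print([i for i, lab in enumerate(labels) if lab == at_risk_cluster])
```

In a real EDM study the features, the number of clusters, and the algorithm would be chosen and validated far more carefully; the sketch only shows the general mechanism.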

Ways of “Data Mining” Educators Can Do

Most universities currently use an LMS to offer their students electronic learning material. Educators need access to the data stored in the LMS to use their own diagnostic software. The type of LMS has a strong impact on the availability of additional data access. Two types of LMS can be found: locally hosted and remotely hosted.

“Locally hosted” means that the LMS is hosted on servers under university control. The university can decide which software is installed and has complete access to all data. “Remotely hosted” means that the university rents the LMS as a web service from a specialized commercial provider. The LMS is ready to use, and maintenance is covered; conversely, access to stored data is limited to products that are available for sale. In the remotely hosted case, the question of obtaining additional logged data to run a diagnostic instrument is easily answered: if the LMS’s provider offers special diagnostic instruments or further access to more stored data, it is possible to buy them. If not, there is no way to make changes to the LMS or to gather more data than are already collected. In the other case, when the LMS is hosted locally, the university owns the LMS’s data completely, so reuse of the stored data is easily possible. Furthermore, most common LMSs are open source software and can, thus, be modified easily. Hence, in this case, there are options to implement new diagnostic tools directly within the LMS. However, the LMS’s data is stored primarily for operating purposes. Information about performance in different assessments is spread over the whole database and organized mainly to guarantee a fast and stably running LMS. Consequently, it is necessary to reorganize the stored data before any diagnostic reuse (13). This requires a great deal of laborious work with the database before any extraction of new information can start, and educators can hardly manage this work alone. Additionally, access to the raw database is restricted for ethical and privacy reasons. To sum up, there is no realistic chance for educators to receive additional data access in either remotely or locally hosted systems. Consequently, software tools intended to help educators benefit from students’ LMS data in their lessons should rely on data that is already accessible to them. In their position as educators, they must be able to supervise their students’ learning. For this reason, almost every LMS provides control opportunities for teachers; for example, the LMS presents pages informing educators about students’ learning progression. Software can capture the information on those pages in two ways. The data can be collected directly by reading the content of the specific pages; in this form of data collection, all the information the educator can see is transferred to the software. Unfortunately, this form of data generation is strongly dependent on the LMS used and requires changes with almost every new LMS version. In contrast to this direct way of data mining, there are indirect ways of data collection within the LMS. Almost every LMS offers export functionalities. The export features are usually intended to provide backups for the teacher’s records, but the export files can also be used for data mining purposes. They are an enormously useful, well-structured, and ready-to-use data basis. In contrast to the educator’s information pages within the LMS, the export file’s structure does not usually change at all, even when there are major changes in the LMS. That is why export files are reliable data sources, although they are less detailed than the educator’s direct view into the LMS. For example, the answers in assessments are reduced in export files to single scores; consequently, the information about which answer was given is lost.

For this reason, software that intends to offer diagnostic support for teachers should be able to use different ways of importing data and different types of data material. The LMS’s export files constitute the data basis for the basic functionality because they can be utilized easily and there is a high chance that they can still be processed after version changes of the LMS. In addition, direct data import should be implemented to gather more detailed data. Direct import can generate a broader data basis, while the basic import of export files guarantees that the software remains operational even when the LMS installation has changed. These two methods of data mining are the kind that educators themselves can perform.
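The indirect route via export files can be illustrated with a small sketch. The column layout assumed here (one row per student, one score column per task) is hypothetical; real export formats differ between LMS products and versions, which is exactly why a plug-in layer is needed in practice.

```python
# A minimal sketch of reading gradebook-style export files instead of the LMS database.
import csv
from collections import defaultdict

def read_export(path):
    """Return {student_id: {task_name: score}} from a CSV export file."""
    scores = defaultdict(dict)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            student = row["student_id"]
            for column, value in row.items():
                # Skip the identifier column and empty cells; scores are assumed numeric
                if column != "student_id" and value not in ("", None):
                    scores[student][column] = float(value)
    return scores

# Usage: merge several weekly test exports into one data pool (file names are hypothetical)
pool = {}
for export_file in ["week01.csv", "week02.csv"]:
    for student, tasks in read_export(export_file).items():
        pool.setdefault(student, {}).update(tasks)
```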

The Learning Management System Analysis Kit

“Personal Data Mining” – Data Generation within the LMS

Our idea was to offer educators possibilities to gain deeper insight into their students’ learning progress and to bring them closer to their students’ learning. The only way for us to realize this idea was to develop our own software tool that requires only the preceding ways of data mining that educators can perform. Consequently, we started to work on the LMS Analysis Kit (LMSA Kit). This software should help close the gap between the two learning worlds. It should offer information about learning progress like other diagnostic instruments from LA or EDM, but without the pitfalls generated by the need for non-accessible data. The LMSA Kit uses all possibilities for data import from the LMS and provides features to analyze the data material for educational purposes. A basic connection to the LMS’s data is provided by import functionalities for the different export file formats offered by LMSs. Hence, basic data material, such as scores earned in assessments or single tasks, can be transferred easily to the LMSA Kit. Additionally, an advanced connection to the LMS’s data will be provided in future versions by reading the content of the LMS’s pages directly. This also allows the import of single given answers, for instance, to recognize the presence of misconceptions or recurring mistakes. The data connections are established via a plug-in system, which makes the software easily scalable and adaptable. A newly published LMS can be included in the LMSA Kit through a new plug-in. Major changes in a supported LMS do not affect the software as a whole; they only affect the specific plug-in. Thus, it is easy to keep the software up to date. Educators are usually interested in the competencies a student has acquired and the learning objectives that have been mastered. However, the diagnostic tools within the LMS do not offer this type of information easily. In most cases, it is almost impossible to retrieve this information with the built-in educational diagnostic tools. They are generally intended to offer a view of complete assessments rather than customized combinations of solved tasks (Fig. 2); apart from that, competency cannot be measured directly. The acquisition of a competency is measured indirectly by solving different tasks – aiming at this specific competency – across different tests (14).

The LMSA Kit can close this gap. Because it imports all the LMS’s assessment data, it can inspect competency acquisition across different tasks and different tests. In the same way, it is also possible to gain insight into the mastery of learning objectives. All tasks solved by students are stored in the LMSA Kit. Consequently, the tasks can be viewed independently of the context of a single test and can, therefore, be rearranged into other combinations. New collections of tasks are composed that describe a competency or a learning objective. It is also possible to combine all tasks of a single topic, giving an overview of the learning progress on that topic. Educators are responsible for the theoretical construct behind the collection of tasks and, accordingly, they also define which criterion is measured (Fig. 3). The LMSA Kit does not depend on any specific learning theory, nor does it prefer specific types of criteria.
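A minimal sketch of this recombination idea, with hypothetical task and student identifiers, might look as follows; the LMSA Kit itself may compute criterion scores differently.

```python
# A minimal sketch: pooled task scores from several tests are regrouped into an
# educator-defined compilation that describes one criterion.
# Pooled scores from several exported tests (hypothetical data, scaled 0..1)
pool = {
    "s01": {"week01_q3": 1.0, "week02_q1": 0.5, "week04_q5": 1.0},
    "s02": {"week01_q3": 0.0, "week02_q1": 0.5},
}

def competency_scores(pool, task_names):
    """Mean fraction of points earned over the selected tasks, per student."""
    result = {}
    for student, tasks in pool.items():
        selected = [tasks[t] for t in task_names if t in tasks]
        if selected:
            result[student] = sum(selected) / len(selected)
    return result

# Educator-defined compilation spanning several weekly tests
safety_tasks = ["week01_q3", "week02_q1", "week04_q5"]
print(competency_scores(pool, safety_tasks))
```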

Figure 3. Combination of tasks describing one competency.

In contrast to the scores the students have earned, the answers themselves are usually not exported directly by the LMS’s export functions. In further stages of development, the LMSA Kit should also be able to deal with the given answers, instead of only the earned scores, using direct import techniques. There is large potential in the given answers, especially the wrong ones, which hitherto cannot be used: they allow the elucidation of common misconceptions and learning difficulties. One of our main developmental aims is to realize this soon, but the difficulty of collecting these data excludes this feature at this early stage of development.

Recombination of Test Data for Educational Diagnostics

The LMSA Kit offers an opportunity to measure the learning progress in a criterion. All the tasks the students have solved during classes are stored in one single database. The strong connection to the test in which a task was solved is, thus, broken up. Educators can select the tasks that define a competency or a learning objective (Fig. 4). Subsequently, the student’s performance in the collection of tasks can be seen.

Figure 4. Usage of the LMSA Kit.

Of course, not every combination of tasks is suitable for measuring a criterion. Sometimes there is an unexpected and, therefore, hidden level structure within a competency, or a learning objective can be divided into different sub-objectives. To avoid this set of problems, EDM and LA make massive use of machine learning, employing classification algorithms to classify identical answer patterns within the whole data material (15). The students are, thus, divided into classes of comparable performance in the criterion. However, these algorithms must learn from existing data in which the level of competency acquired, or the progress in learning, is already known. This data material is necessary because it allows the algorithms to find the right classifications. A support vector machine or an artificial neural network, for example, depends on known data material for learning purposes (16); the more data they assess, the more accurate their future classifications become. Even if these techniques work well in EDM or LA, they are not suitable for use at home as educational diagnostic software for educators. Teachers generally do not have access to a large amount of old data material to train machine learning algorithms. Additionally, the configuration of these algorithms is not easily carried out by educators who are laypeople in the use of machine learning. Another problem is that students do not complete every test in an LMS, and incomplete data is more or less useless for machine learning (16). The algorithms used within software for teachers must cope with this lack of training material. Consequently, machine learning and other classification algorithms cannot be used in the LMSA Kit. One way to solve this is to use educators’ professional expertise with the right technical support.

On the one hand, educators know which criterion they want their students to reach. On the other hand, educators know which criterion a task belongs to. Hence, educators can define collections of tasks that describe a criterion instead of relying on a machine-learning training process. This contrasts with LA, which sees the pedagogical work as lying outside its domain: the pedagogical work is “coded” into the data material, and educators do pedagogical work while they use the information LA offers, but there are no pedagogical decisions within the models of LA at all (17). However, educators need additional support to combine a set of tasks meaningfully. If the subparts of the learning objective or the competency are too divergent, decreased accuracy and measurement errors, such as false positives, will result. Whether the structure of a competency or a learning objective needs to be divided into smaller pieces cannot be seen easily by just comparing the specific properties of the tasks. The LMSA Kit tries to help educators identify such problematic task compilations. Due to the lack of training material, the LMSA Kit cannot mark task compilations for a competency or a learning objective as definitely wrong or unacceptable with regard to accuracy. Nevertheless, it can offer support in identifying problematic task compilations through missing unidimensionality. Cronbach’s alpha is a necessary, but insufficient, condition for unidimensionality: high values of Cronbach’s alpha point toward more acceptable task compilations, whereas low values point toward problematic ones. The threshold value is much discussed in the literature, especially in pedagogical contexts with higher learning objectives and more complex competencies; low values of Cronbach’s alpha can be accepted if the compilation of tasks has been carried out conscientiously (18). The sufficient condition for a unidimensional task compilation is the unanimous loading on one factor in a factor analysis. Additionally, if the factor analysis identifies more than one factor, this is a useful hint as to which groups the tasks should be divided into. The LMSA Kit offers this analysis as a core feature to identify problematic task compilations. Furthermore, cross-tables show the inter-item correlations between all tasks of one compilation, so misleading tasks within the compilation can be identified by selecting items with a low overall correlation. Educators thus receive as much support as possible while they are defining task collections. Nevertheless, we hope to improve this process in the future by using old data material to train algorithms. As one result, it could become possible to offer at least reliable intervals of Cronbach’s alpha that identify critical or non-critical task compilations. Another improvement could be to automate the factor analysis, which could then directly propose alternative task compilations; teachers would then only need to verify whether the new compilations are, in fact, different sub-criteria. In other words, we use our existing data material to train algorithms and verify their accuracy in order to derive general rules and new algorithms that support educators, as LA does. Thus, educators do not depend on existing old data material. In contrast to LA, we deliberately use pedagogical decisions to allow the transfer of our general calculation models for application at home.
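As a rough illustration of these unidimensionality checks, the following sketch computes Cronbach’s alpha and the inter-item correlation matrix for one hypothetical task compilation; the factor analysis offered by the LMSA Kit is omitted here.

```python
# A minimal sketch of the compilation checks: Cronbach's alpha and inter-item correlations.
# The score matrix is hypothetical (rows = students, columns = tasks of one compilation).
import numpy as np

def cronbach_alpha(scores):
    """Classical Cronbach's alpha for a students-by-tasks score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

scores = np.array([
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
])

print(round(cronbach_alpha(scores), 2))

# Inter-item correlations: tasks with low correlations to the rest of the
# compilation are candidates for removal or for a separate sub-criterion.
print(np.round(np.corrcoef(scores, rowvar=False), 2))
```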

Online Chess and Students’ Abilities

After the tasks are collected into different compilations, the scores a student earned in all tasks of a compilation are transformed into a single number. This number must represent the progress in the defined criterion. A fundamental assumption of Classical Test Theory (CTT) is that every measure contains a measurement error, but that this error has an expectation value of zero. Thus, the more often a property is measured, the closer the mean value tends toward the true value. If all tasks in a compilation measure performance in the same criterion, there are enough repeated measurements to get a close approximation of the true value. Unfortunately, this concept has some weak points. It must be clear that all tasks used to measure the competency acquisition or the mastery of a learning objective really measure the same property. Differences between the tasks can lead to wrong measurements, and such a difference can already result from the varying difficulties of the tasks. For this reason, the compilation of tasks would have to be subdivided into smaller groups of tasks with the same degree of difficulty. However, the resulting subgroups normally consist of only a few tasks, in most cases only a single task. Consequently, the measurement error cannot be reduced. Item Response Theory (IRT) can be used to overcome this problem in educational testing. In IRT, the probability that a learner gives the right answer is related not only to his or her ability level but also to the difficulty of the specific task. In this framework, the difficulties of the tasks and the abilities of the learners are estimated in one single step. Taking this into account, there is no need to subdivide task compilations into groups of tasks with the same difficulty. Conversely, IRT requires complex estimation processes, mostly based on maximum likelihood estimation, which depend strongly on complete and well-structured data material; in other words, all students must have taken every assessment and solved every task. This requirement is hardly ever met in the context of blended-learning. Thus, calculation models based on CTT or IRT cannot be used to estimate students’ performance in a criterion during blended-learning. The LMSA Kit solves these problems by using matchmaking algorithms to estimate students’ abilities and tasks’ difficulties. Matchmaking is the process of selecting players for a new match in online games. Matchmaking requires players’ abilities to be estimated validly to avoid boring player constellations in matches (19). The idea is that matches between players of equal ability are well balanced and, thereby, motivating. Internet companies that offer web-based online games are especially interested in well-balanced matches that fascinate players and keep them playing and paying. That is why they are interested in improving this estimation process and in keeping their efforts secret. One exception is Microsoft’s “TrueSkill” rating system (20), which is used in multiplayer games on the Xbox Live platform. Unfortunately, it is well secured with international patents and not free of charge.

One of the first attempts at matchmaking was made to arrange well-balanced chess matches. The ELO rating algorithm was the first algorithm to estimate the ability of a chess player and is still in use today. The ability of a chess player is expressed by the ELO number; for instance, a chess player must have an ELO number higher than 2500 at least once in his or her career to become a chess grandmaster (21). The ELO rating algorithm is free to use, and its usage for pedagogical diagnostics is well documented (22). To do so, the situation of a student solving a task is regarded as a single match between the student and the task. After enough “matches” have been played, the ability of the student and the “abilities of the opponent tasks,” i.e., the tasks’ difficulties, can be estimated. We used different matchmaking algorithms while developing the LMSA Kit: the ELO rating algorithm and Glicko V1 and V2; the last two were designed by Mark Glickman to improve the ELO rating algorithm (23). All algorithms were tested with old data material, i.e., scores from final exams. Thus, the matchmaking algorithms could be trained in a way similar to the classification algorithms in EDM or LA, because all matchmaking algorithms need some parameters to be set in order to work correctly. The values of these parameters depend on the purpose of the application and the data material used (22). There are few or no values for these parameters in the literature; hence, most parameters in the configuration must be determined by testing. Consequently, we performed intensive testing and training of parameter combinations of the algorithms used during the development of the LMSA Kit. We used the Pearson product-moment correlation coefficient (PPMCC) to evaluate the accuracy of the estimation process. The PPMCC is used to inspect linear correlations between the two sides of a paired data set; the estimated ability and the ability shown for the same criterion in an examination are such a paired data set. The PPMCC is often used to verify the accuracy of pedagogical classifications (24). The values of the PPMCC vary between -1 and 1, where values near -1 and 1 indicate the existence of a linear correlation and values close to 0 negate it. A modified version of the ELO rating algorithm is currently leading the race, but we are still looking for further improvements. The rating algorithms were tested on two old data sets of undergraduate students taking chemistry as a minor. These students must pass a laboratory class with a final exam. During their laboratory class, they are supported by blended-learning and must solve weekly tests aimed at the learning objectives and competencies that are tested later in the final exam. The larger data set consists of data from 300 students, the smaller one of 120 students. The rating algorithm used within the LMSA Kit shows an average correlation to the exam data of about r = 0.37 in the larger data set and r = 0.27 in the smaller one. If two different raters rate the same short-answer question, the correlation between the raters is r = 0.59 (24). The best correlation between an automatic scoring system and a human rater is quite close to this value, but, with r = 0.52, a little lower (24). The difference between the two groups shows the “learning effect” of the ELO algorithm: the training is more effective with more people.
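The basic ELO update on which these rating algorithms build can be sketched as follows. Every attempt at a task is treated as a match between the student and the task; the K-factor and the attempt data below are hypothetical, and the LMSA Kit uses tuned and modified variants (including Glicko), not this exact code.

```python
# A minimal sketch of ELO-style ability/difficulty estimation from student-task "matches".
K = 32  # update step size; in practice this parameter has to be tuned

def expected(rating_a, rating_b):
    """Expected score of A against B under the logistic ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(student, task, outcome, ratings):
    """outcome: 1.0 = task solved, 0.0 = not solved (partial credit is also possible)."""
    exp = expected(ratings[student], ratings[task])
    ratings[student] += K * (outcome - exp)
    ratings[task]    += K * (exp - outcome)   # the task "wins" if the student fails

# Hypothetical attempt log: (student, task, solved?)
attempts = [("s01", "t1", 1), ("s02", "t1", 0), ("s01", "t2", 0),
            ("s02", "t2", 0), ("s01", "t1", 1), ("s02", "t1", 1)]

ratings = {name: 1000.0 for name in {x for a in attempts for x in a[:2]}}
for student, task, outcome in attempts:
    elo_update(student, task, float(outcome), ratings)

print({k: round(v) for k, v in ratings.items()})
```

Abilities estimated this way can then be correlated (for example with the PPMCC) against later exam scores to judge the quality of the estimation, as described above.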

The LMSA Kit does not use matchmaking algorithms as a scoring system for the same tasks that a human rater scores. It estimates the ability in a criterion by using test data from the electronic assessments completed during blended-learning. The educator, as the human rater, tries to estimate the ability in the same criterion, but uses other tasks from the final exam that usually differ from the digital ones. For this reason, the smaller correlation between the prediction and the final score is hardly surprising. Additionally, the students prepare themselves for the final exam between the blended-learning assessments and the exam itself. Nevertheless, the correlation between prediction and final exam lies in a range that is relevant for educational purposes. The LMSA Kit offers different matchmaking algorithms to estimate the ability in a criterion. Educators can select and configure different algorithms; it is possible to run different algorithms or the same algorithm with different configurations. Thus, the varying results are directly comparable, which helps to improve the specific estimation process and to figure out the right configuration. There are additional features especially for small data sets, such as a “training option,” whereby the data set is reused several times to improve the accuracy of the estimation. After the calculation process, the results are presented as a table (Fig. 5). The columns contain the values of the different rating algorithms; the first rows contain the difficulties of the tasks within the task compilation, and the second section of rows contains the students’ abilities. We are currently working on improving the visualization; for example, information displayed as diagrams might be more beneficial. Automatically generated reports can give notice of critical developments in more than one criterion. Educators can identify students at risk at an early stage so that they can take action to support these students in a specific manner. In the same way, weaknesses of the whole course can be revealed and addressed.

Figure 5. Results of an estimation process.

The LMSA Kit allows the construction of a bridge between both learning worlds. Educators can see more about their students’ learning when they look at them through the eyes of the LMSA Kit. The students’ work within the LMS is more than just electronically supervised homework: educators can reveal learning difficulties and weakly performing students. Face-to-face lessons can be designed closer to the students’ needs because educators can now identify those needs.

Automatically Generated Criteria-Based Feedback

Educators can see and supervise their students’ learning progress with the LMSA Kit. However, educators cannot support each student individually all the time; this is nearly impossible, especially in large classes. For this reason, the information generated by the LMSA Kit should also be available to the students. Thus, the students can at least see their learning progress and may come to a realistic insight regarding what must be done for the final exam. This reveals students’ miscalibrations and helps them find a calibration that allows them to estimate the remaining work more realistically (4). Consequently, a more self-regulated learning process becomes feasible (25). On the other hand, this information should contain more than just the scores earned within a compilation of tasks, because feedback with a grading character has been shown to hinder learning (26). Consequently, the feedback must be more formative. The students need information about how close they are to the learning objective in order to estimate the effort required to gain mastery or acquisition (4). Further information about the next steps to master the learning objective or acquire the competency is also necessary. This can be realized by an automatic feedback system. The LMSA Kit offers the possibility to generate this type of feedback by calculating the students’ abilities in each task compilation and providing text templates for the feedback messages. An editor tool within the LMSA Kit allows custom text templates to be created for feedback generation (Fig. 6). Moreover, a coding scheme and a graphical programming interface are implemented in the templates. This allows educators to define their own text templates without any knowledge of programming languages; knowing how to create presentation slides is enough. The educator arranges the text templates and the logical parts of the template on the work area, and the parts that belong together are connected by “digital” wires. Hence, every person who can operate an office application can manage the generation of feedback templates. Less automated feedback systems have already shown that students perceive such template-based feedback as valid feedback (27). Even though the templates do not allow completely individual feedback to be written for each student, the students do not evaluate this feedback as too impersonal. They even feel that this feedback is more objective than grades or other formative resources (28).

Figure 6. Text template editing tool.
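What template-based feedback generation amounts to can be shown with a minimal sketch; the thresholds, criterion names, and wording below are hypothetical, and the LMSA Kit provides a graphical editor for this purpose rather than code.

```python
# A minimal sketch of filling a feedback text template from an estimated criterion score.
FEEDBACK_TEMPLATE = (
    "Dear {name},\n"
    "Topic: {criterion}. {description}\n"
    "Your current progress in this topic is {level}. {advice}\n"
)

def level_and_advice(score):
    # Hypothetical thresholds on a 0..1 criterion score
    if score >= 0.8:
        return "very good", "Keep practicing with the advanced exercises."
    if score >= 0.5:
        return "on track", "Revisit the weekly tests on this topic before the exam."
    return "still below the target", "Please work through the linked tutorial and retry the practice test."

def build_feedback(name, criterion, description, score):
    level, advice = level_and_advice(score)
    return FEEDBACK_TEMPLATE.format(
        name=name, criterion=criterion, description=description,
        level=level, advice=advice,
    )

print(build_feedback("Alex", "Working safely in the laboratory",
                     "This topic covers the safety rules applied during the experiments.",
                     0.62))
```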

We tested the LMSA Kit in combination with the automatic feedback system with students attending laboratory classes at our institute. The class is supported by blended-learning techniques; therefore, students had to pass electronic tests, which formed the data basis for the LMSA Kit. During the summer semester of 2015, 39 students received a feedback email, and a year later, 41 students received feedback via email. From these cohorts, 19 and 22 students, respectively, participated in the survey (Tables 1 and 2). The feedback dealt with the different topics of the laboratory. Instead of providing feedback based on single laboratory days and the students’ performance on individual electronic tests, it addressed learning objectives and competencies that were interspersed throughout the whole class. Such topics were, for example, correct assembly of the experiments, safety rules and working safely, and the chemical background of the experiments. The feedback text presented all topics in the same structure (Fig. 7). First, it described the topic or the problem and gave supporting examples or exemplary tasks to make the criterion clear to our students. In the second part of such a feedback paragraph, the actual performance and the learning progression in the criterion were presented. Then, the targeted progression was shown to allow students to evaluate their learning progress and to plan further steps. In the last part of each section, additional learning opportunities were presented.

Table 1. Results of Feedback Evaluation

                                   2015 (n = 19)    2016 (n = 22)
Accuracy                                2.9              2.2
Transparency                            2.4              1.7
Comprehensibility                       1.9              1.8
Benefit                                 1.9              1.8
Benefit in test preparation             3.0              2.7
Request for further feedback            2.0              1.9

Students’ ratings regarding different aspects of the feedback received. 1: highest accordance, 6: highest discordance.

Figure 7. Default feedback template.

The feedback messages were presented through the university’s questionnaire survey system. In this way, the feedback could be presented step by step, and every step addressed a different feedback topic. After each part of the feedback, the students were asked to rate their satisfaction with the given information. Additionally, they were asked to rate its benefit or describe their wishes regarding future feedback. The evaluation of the first year uncovered several weaknesses of our feedback system. One of the main problems was the low rated satisfaction with the perceived estimation accuracy. We ran an interview study during the winter semester of 2016 to get more detailed insight into the problems and into how the feedback was perceived by our students.

Table 2. Overall Accordance with the Feedback

                                          2015                2016                t-test
In total                            -0.5 (p = .008)*     0.3 (p = .030)*     +0.75 (p = .000)
Description of performance progress  0.4 (p = .049)*     0.25 (p = .056)*    +0.64 (p = .006)

Accordance with the feedback in total. The feedback was too bad: -2, the feedback was too good: 2. 0 indicates a proper estimation. * indicates that a t-test for the mean being 0 is significant.

Consequently, the feedback is now presented a few days earlier, and it offers more hints for further learning activities. The interview study revealed that the additional learning material did not completely satisfy the students’ needs for further learning opportunities. The biggest change to improve the credibility of the feedback was the implementation of a final preparation test. This special test has the character of a trial exam and, in contrast to the other electronic tests in the blended-learning material, can only be taken once. Students were allowed to learn with the tests during the course, but they did not appreciate that the feedback messages were based on these tests, which they perceived as “often not seriously done.” Moreover, in addition to learning electronically at home, students also improved their competencies during lab classes; it is, therefore, not surprising that they did not perceive feedback based on electronic data alone as trustworthy enough to judge their learning progress. The additional trial exam seems to change this feeling in the students’ minds: the trial exam is “seriously done,” and they can show what progress they have made. Thus, only one year later, we could offer improved and more helpful feedback to our students. The results of the evaluation showed that the improved feedback was better accepted by the students and that the perceived accuracy and transparency increased. Consequently, the benefit to our students also increased. Additionally, the attitude towards the delivered feedback changed: the slightly negative attitude was replaced by a moderately positive one. This moderately positive attitude helps students accept the feedback given, and this acceptance is necessary if feedback is intended to change the students’ self-calibration (29).

Conclusion and Outlook

The LMSA Kit is a software solution that allows us to build a bridge between the two learning worlds. It offers educators insight into their students’ learning within the LMS. The acquisition of a competency or the mastery of a learning objective becomes transparent to the educators, and, consequently, the face-to-face lessons can be brought closer to the specific students’ needs. The students’ benefit from working in the LMS increases because teachers receive more information from their students’ digital homework. The built-in diagnostic support features allow educators, who are usually laypeople in data mining and clustering techniques, to generate applicable educational diagnostic data. They can identify unsuitable combinations of tasks, so that errors in measurement are kept as small as possible. The benefit of EDM and LA thus becomes partially available to educators, and continuing research and development of our software will help us improve this process further. Another way to support students in their learning process at home is automatically generated and delivered feedback. The LMSA Kit allows this new type of feedback to be sent to the students. The calculated scores within an acquired competency or a learning objective are used not only to inform educators about their students but also to inform the students themselves immediately. In combination with a feedback delivery system, students can profit directly from the work done within the LMS. The additional work for educators is minimal and does not increase with the number of participants; on the other hand, students rate such feedback positively and benefit from it. We are further optimizing the quality of the feedback in our research efforts. With the LMSA Kit, we have started to create a software tool that allows educators and students to be more aware of the learning progress during blended-learning. Much work must still be done, but the first results make us confident that we will be able to support educators and students even more in the future.

References

1. Chittleborough, G. D.; Mocerino, M.; Treagust, D. F. J. Chem. Educ. 2007, 84, 884–888.
2. Seery, M. K. Chem. Educ. Res. Pract. 2015, 16, 758–768.
3. Teo, T. W.; Tan, K. Ch. D.; Yan, Y. K.; Teo, Y. Ch.; Yeo, L. W. Chem. Educ. Res. Pract. 2014, 15, 550–567.
4. Hattie, J. J. Learn. Instr. 2013, 24, 62–66.
5. ALMazroui, Y. A. Int. J. Inf. Tech. Comput. Sci. 2013, 7, 8–18.
6. Baker, R. S. J. D.; Kalina, Y. J. Educ. Data Min. 2009, 1, 3–16.
7. Romero, C.; Ventura, S.; Espejo, P. G.; Hervás, C. Mining Algorithms to Classify Students. In Educational Data Mining 2008, International Conference on Educational Data Mining, Montréal, Québec, Canada, June 20−21, 2008; de Baker, R. S. J., Barnes, T., Beck, J. E., Eds.; 2008; pp 8−17.
8. Macfadyen, L. P.; Dawson, S. Comput. Educ. 2010, 54, 588–599.
9. Morris, L. V.; Finnegan, C.; Wu, S. Internet Higher Educ. 2005, 8, 221–231.
10. Johnson, L.; Levine, A.; Smith, R.; Stone, S. The 2010 Horizon Report; The New Media Consortium: Austin, TX, 2010; pp 29−32.
11. Long, Ph.; Siemens, G. Educause Rev. 2011, 46, 31–40.
12. El-Alfy, E.; Abdel-Aal, R. E. Comput. Educ. 2008, 51, 1–16.
13. Krüger, A.; Merceron, A.; Wolf, B. A Data Model to Ease Analysis and Mining of Educational Data. In Educational Data Mining 2010, 3rd International Conference on Educational Data Mining, Pittsburgh, PA, June 11−13, 2010; de Baker, R. S. J., Merceron, A., Pavlik, P. I., Jr., Eds.; 2010; pp 131−139.
14. Ghanbari, S. A. Competency-Based Learning. In Encyclopedia of the Sciences of Learning; Seel, N. M., Ed.; Springer: New York, 2012; pp 668−671.
15. Nisbet, R.; Elder, J.; Miner, G. Handbook of Statistical Analysis and Data Mining Applications; Academic Press/Elsevier: San Diego, CA, 2009; pp 121−172.
16. Nisbet, R.; Elder, J.; Miner, G. Handbook of Statistical Analysis and Data Mining Applications; Academic Press/Elsevier: San Diego, CA, 2009; pp 235−258.
17. Greller, W.; Drachsler, H. J. Educ. Tech. Soc. 2012, 15, 42–57.
18. Schmitt, N. Psychol. Assess. 1996, 8, 350–353.
19. Coulom, R. Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength. In Computers and Games – 6th International Conference, CG 2008, Beijing, China, September 29−October 1, 2008, Proceedings; Herik, H. J., Xu, X., Ma, Z., Winands, M. H. M., Eds.; Springer: Berlin Heidelberg, 2008; pp 113−124.
20. Herbrich, R.; Minka, T.; Graepel, T. TrueSkill: A Bayesian Skill Rating System. In Advances in Neural Information Processing Systems 19, Proceedings of the 2006 conference, Vancouver, British Columbia, Canada, December 4−7, 2006; Schölkopf, B., Platt, J. C., Hofmann, T., Eds.; 2007; pp 569−576.
21. World Chess Federation. Handbook, FIDE Title Regulations, Article 1.53. https://www.fide.com/fide/handbook.html?id=174&view=article (accessed Nov. 4, 2016).
22. Pelánek, R. Comput. Educ. 2016, 98, 169–179.
23. Glickman, M. E. J. R. Stat. Soc. C 1999, 48, 377–394.
24. Liu, O. L.; Rios, J. A.; Heilman, M.; Gerard, L.; Linn, M. C. J. Res. Sci. Teach. 2016, 53, 215–233.
25. Boekaerts, M. Learn. Instr. 1997, 7, 161–186.
26. Shute, V. J. Rev. Educ. Res. 2008, 78, 153–189.
27. Debuse, J. C. W.; Lawley, M. Br. J. Educ. Technol. 2016, 47, 294–301.
28. Denton, Ph.; Madden, J.; Roberts, M.; Rowe, Ph. Br. J. Educ. Technol. 2008, 39, 486–500.
29. Lundgren, D. C.; Sampson, E. B.; Cahoon, M. B. Psychol. Rep. 1998, 82, 87–93.

Chapter 4

Leveraging Open Source Tools for Analytics in Education Research

Sindhura Elluri*

Department of Computer and Information Technology, Knoy 372, Purdue University, West Lafayette, Indiana 47906, United States
*E-mail: [email protected].

Computer-based assessment has gained considerable importance in recent years, and the need to better identify the gaps and factors affecting educational processes and settings has become vital. This chapter discusses the growing importance of data analytics in the domain of education research for identifying the parameters affecting students’ academic performance and conceptual understanding. It also gives an overview of the general procedure followed in conducting data analysis and of different open source tools available for both quantitative and qualitative research.

Introduction

Educational research is a cyclical process of steps that typically begins with identifying a research problem or issue of study. It then involves reviewing the literature, specifying a purpose for the study, collecting and analyzing data, and forming an interpretation of the information. This process culminates in a report, disseminated to audiences, that is evaluated and used in the educational community (1). The basic educational research process involves identifying the problem to be addressed, identifying the data collection methods needed to gather the data required for analysis, analyzing the data, making inferences from the data to identify the theory supporting the problem being addressed, and suggesting measures to address the gap. Data handling is a vital part of educational research. This includes inspecting, cleansing, transforming, and modeling data so that it can then be properly analyzed with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Then, traditional educational data analysis techniques can be applied, such as qualitative, quantitative, and statistical analyses, or more recent ones at the intersection of machine learning and education, such as learning analytics and educational data mining.

Quantitative/statistical analysis refers to the numerical representation and manipulation of observations for the purpose of describing and explaining the phenomena that those observations reflect. It involves the techniques by which researchers convert data to numerical format and subject them to statistical analysis to test the research hypothesis. Qualitative analysis refers to the development of concepts that help us understand social phenomena in natural settings, giving emphasis to the meanings, experiences, and views of the participants. With the increase in size and complexity of the data to be analyzed, researchers are increasingly interested in automated methods for discovering patterns in the data. A confluence of advances in the computer and mathematical sciences has unleashed an unprecedented capability for conducting data-intensive research. Data mining as well as machine learning algorithms and natural language processing techniques are slowly being adopted in education research. Modern machine learning and natural language processing techniques can analyze data and identify how students learn, what students know, and, furthermore, what they do not know.

Types of Data

Data collected for analysis in educational research comes in multiple formats based on how it is collected and stored. Data can be broadly classified into structured, semi-structured, and unstructured data. Structured data refers to information with a high degree of organization, such that inclusion in a relational database is seamless and readily searchable by simple, straightforward search engine algorithms or other search operations; it is generally stored in relational databases. Semi-structured data refers to a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. XML and JSON are forms of semi-structured data. One example from education research would be recording students’ actions in a JSON log as they interact with a learning tool. Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy but may contain data such as dates, numbers, and facts as well. Text files and spreadsheets are forms of unstructured data; an example from education research is think-aloud or interview data. Most of the data collected in educational research falls under the category of semi-structured or unstructured data. The tools and methods used for analysis vary depending on the format of the data, the size of the data, how the data is stored, and what type of analysis is being used. Some of the open source data analysis tools, like R and Python, support any type of data.

Data can also be analyzed manually first, by assigning scores or values to convert it into numerical form, and then subjected to statistical analysis to test the hypothesis.
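What such semi-structured data can look like, and how it can be read for analysis, is sketched below; the event structure and field names are hypothetical.

```python
# A minimal sketch of a JSON log of student actions in a learning tool.
import json

log_lines = [
    '{"student": "s01", "timestamp": "2017-02-01T10:15:02", "action": "open_simulation", "item": "titration"}',
    '{"student": "s01", "timestamp": "2017-02-01T10:17:45", "action": "submit_answer", "item": "q1", "correct": true}',
    '{"student": "s02", "timestamp": "2017-02-01T10:16:11", "action": "submit_answer", "item": "q1", "correct": false}',
]

events = [json.loads(line) for line in log_lines]

# Count correct submissions per student
correct = {}
for event in events:
    if event["action"] == "submit_answer":
        correct[event["student"]] = correct.get(event["student"], 0) + int(event.get("correct", False))
print(correct)
```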

Overview of Data Analysis Processes

Data analysis is a sequential process; Figure 1 depicts the general process. The most essential part of the process starts with identifying what questions need to be answered from the data and collecting the relevant data. The methods of data collection in education research are diverse: different researchers employ different techniques based on the issue they are trying to address, and some explicitly design their own instruments based on the hypothesis being tested. Some of the most common forms of data collection include questionnaires, interviews, observations, records, tests, and surveys, among others. The next step is to organize the collected data for analysis. This includes data cleaning and data integration. If data is collected from multiple sources and each source has a different format, the data from these sources must be integrated to form a single data set with a uniform format to make analysis easier. Data is not always perfect and can present multiple problems that need to be identified and corrected in order to ensure accurate results. The decision about which parts of the data to retain and which to eliminate depends on the research context. In statistical analysis, the data needs to be checked for duplicates, outliers, or missing values. In qualitative analysis that uses text from interviews, the text needs to be checked for mistyped words, and content irrelevant to the research needs to be identified and removed. In a research context that uses text analytics, single words irrelevant to the research context or the analysis are marked as stop words and removed from the data. An exploratory data analysis can provide additional insights, enabling researchers to make decisions regarding any further data cleaning, preprocessing, and data modeling. The next step is to identify an approach for data analysis depending on the type of research and to identify the right software required for the analysis. Quantitative data analysis includes classifying the features and constructing statistical models to explain what is observed. Statistical methods are often used for quantitative data analysis, with univariate, bivariate, or multivariate analyses depending on the number of variables needed to answer the research question. Identifying the right statistical test is crucial in statistical data analysis. Some of the statistical methods used, depending on the research context, include but are not limited to checking the distribution of the data, factor analysis, hypothesis testing, regression, t-tests, ANOVA, correlation, cluster analysis, and so on. Qualitative data analysis has two different approaches: (a) the deductive approach, which uses research questions to group the data and then looks for similarities and differences in the data, and (b) the inductive approach, which uses an emergent framework to group the data and then looks for relationships.

Figure 1. Summary of the general process of data analysis.
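A minimal sketch of the cleaning and testing steps described above, using pandas and SciPy, is shown below; the data set (scores for two hypothetical instructional groups) is invented, and the t-test stands in for whichever statistical test fits the research question.

```python
# A minimal sketch: remove duplicates/missing values, screen outliers, run a t-test.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "student": ["s01", "s02", "s02", "s03", "s04", "s05", "s06"],
    "group":   ["A",   "A",   "A",   "A",   "B",   "B",   "B"],
    "score":   [78.0,  64.0,  64.0,  None,  71.0,  90.0,  55.0],
})

# Remove exact duplicates and rows with missing scores
clean = df.drop_duplicates().dropna(subset=["score"])

# Simple outlier screen: keep scores within 3 standard deviations of the mean
z = (clean["score"] - clean["score"].mean()) / clean["score"].std(ddof=0)
clean = clean[z.abs() <= 3]

# Independent-samples (Welch) t-test comparing the two groups
group_a = clean.loc[clean["group"] == "A", "score"]
group_b = clean.loc[clean["group"] == "B", "score"]
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(round(t_stat, 2), round(p_value, 3))
```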


Qualitative analysis involves identifying recurrent themes and patterns in the data, clustering the data to identify related themes, and developing a hypothesis to test against the data. Traditional qualitative analysis involves the researcher doing the analysis manually, which is labor-intensive. The use of software tools gives researchers the flexibility to use any format of data, such as text, pictures, audio, and video. There are various machine learning, data mining, and natural language processing techniques that can help researchers identify themes and patterns in data and build predictive models. The software tools often have built-in libraries that can be leveraged to perform analysis using these algorithms. The results of an algorithm should be carefully investigated in order to answer the research questions. A summary of the data is often presented in the form of visualizations, which enable decision makers to grasp difficult concepts or identify new patterns in the data. The analysis cycle is iterative: based on the results, it can be repeated with a different data set or a different data model to identify which gives better results, in order to justify the hypothesis or to answer the research question in context.
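A minimal sketch of computer-assisted theme identification for short qualitative responses is shown below, using stop-word removal, TF-IDF weighting, and k-means clustering from scikit-learn; the responses are hypothetical, and the resulting clusters still have to be interpreted by the researcher, as emphasized above.

```python
# A minimal sketch of grouping short open-ended responses into candidate themes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "The lab videos helped me prepare for the experiment",
    "I watched the videos before the lab and felt prepared",
    "The online quiz questions were confusing and too hard",
    "Quiz feedback was confusing, I did not understand the questions",
]

# Remove English stop words and weight the remaining terms by TF-IDF
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(responses)

# Cluster the responses; each cluster is a candidate theme for manual review
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(matrix)
for label, text in zip(labels, responses):
    print(label, text)
```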

Choosing the Right Software for Analysis

The selection of the software tool used for analysis depends on the researcher’s personal inclination. There are multiple open source tools that may be more efficient for the analysis the researcher intends to do. The parameters that govern the decision of which software to choose are different for quantitative and qualitative data analysis.

Qualitative Analysis

Software tools are used in qualitative data analysis to transcribe the data, to code/tag the data, to search and retrieve the data, and to identify patterns or relations in the data. As described in the book “Computer Programs for Qualitative Data Analysis” (2), below are the key questions that need to be assessed before choosing a particular software for data analysis:

1) Type of the data and size of the data
2) Theoretical approach to analysis
3) Time required to learn the software and time required for analysis
4) The depth of analysis required: simple or detailed
5) The desired quantification of the results
6) Software support available in case of any issues

Quantitative Analysis

Software tools are used in quantitative analysis for statistical modeling and data wrangling. Most of the parameters are similar to those described for qualitative analysis above. Below are some of the key factors that need to be assessed before choosing particular software for analysis:

1) Whether it allows the user to perform the intended statistical test
2) Size of the data
3) Time required to learn the software and time required for analysis
4) Programming capability of the researcher and the amount of programming required by the software to perform the analysis
5) Software support available in case of any issues

Open Source Tools

Open source refers to a program or software in which the source code is available to the general public, free of charge, for use and/or modification from its original design. Open source software can be modified to incorporate any additional functionality required by its users. It provides an open platform for researchers to contribute reusable packages that are useful for their own research and can also be used by other researchers who require that type of analysis. Many researchers in the domain of education have started using such software tools for their data analysis. Table 1 lists some of the open source software tools that can be used for quantitative/statistical research, qualitative research, and data visualization in education research.

Sample Case Study Using Apache Tools

With the increase in the size of data to be analyzed, big data has become a ubiquitous term in learning analytics as well. Apache tools are popular for big data analysis and are relatively easy to learn and use. Apache Drill is one of the many Apache tools; it offers flexible data transformation, querying, and visualization. It allows users to analyze data without having to define complex schemas or rebuild their entire data infrastructure. Drill is a boon for anyone who relies on SQL to make meaningful inferences from data sets. Another advantage of Drill is that it does not require schema or type specification to start query execution: Drill processes data in record batches and discovers the schema automatically during processing. Self-describing data formats such as Parquet, JSON, AVRO, and NoSQL databases have the schema specified as part of the data itself, which Drill leverages dynamically at query time. Another notable feature of Apache Drill is Drill Explorer, a user interface for browsing Drill data sources, previewing the results of a SQL query, and creating a view and querying the view as if it were a table. Drill Explorer helps you examine and understand the metadata in any format before querying or designing views, which are then used to visualize data in BI/analytics tools such as Tableau. It allows the user to explore the structure, size, and content of data in any format.

Table 1. Open Source Software Tools

1. R: An active open source project that has numerous packages available to perform any type of statistical modeling. Functionality supported: exploration, visualization, analysis (qualitative and quantitative), data wrangling.

2. Python: A widely used high-level, general-purpose, interpreted, dynamic programming language. It has libraries such as Pandas, NumPy, Statsmodels, scikit-learn, NLTK, and matplotlib to support data analysis and visualization. Functionality supported: exploration, visualization, analysis (qualitative and quantitative), data wrangling.

3. Wrangler: An interactive tool for data cleaning and transformation into data tables that can be exported into Excel and other tools for analysis. Functionality supported: data wrangling.

4. Apache Drill: An open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Functionality supported: exploration, data cleaning and transformation, querying, visualization.

5. Weka: A collection of visualization tools and algorithms for data analysis and predictive modeling with a GUI for easy access. Functionality supported: data preprocessing, visualization, clustering, classification, regression.

6. AQUAD: A content analysis tool that supports search and retrieval. Functionality supported: text analytics, coding.

8. Data Applied: An online tool for data mining and visualization supporting multiple analytical tasks, including time series forecasting, correlation analysis, decision trees, and clustering. Functionality supported: analysis, visualization.

The Apache Drill Explorer window has two tabs: the Browse tab and the SQL tab. The Browse tab lets the user view any existing metadata for a schema accessed with Drill. The SQL tab allows the user to preview the results of custom queries and save the results as a view. Drill is extensible and can combine data from multiple data sources on the fly in a single query, with no centralized metadata definitions, thus mitigating the ETL (extraction-transformation-loading) process normally needed to combine data from multiple sources for analysis. The required storage plugins must be added for the available data sets in order to explore these disparate datasets.

Below is a short tutorial on how to experiment with the Yelp dataset using Apache Drill. The dataset can be found at Yelp (https://www.yelp.com/dataset_challenge) and is in JSON format. The term dfs. used in the queries below refers to the path where the dataset is saved on a machine; it needs to be modified to point to the Yelp dataset on your machine when trying this example. The first step is to explore the data present in the JSON files using Drill. We can use a SQL SELECT statement to view the contents of a JSON file and restrict the number of rows returned with LIMIT. The location of the JSON file is provided in the SELECT statement in place of the table name used in a regular SQL query; Drill can directly query self-describing files such as JSON, Parquet, and text files. We can explore the review dataset further by examining specific columns of the JSON file, and aggregation functions such as SUM can be used in the SQL statements. We can view the attributes in the Yelp business dataset by turning on text mode in Drill, and we need to turn text mode off when performing arithmetic operations on the dataset. Text mode is set with an alter statement:

alter system set `store.json.all_text_mode` = true;
alter system set `store.json.all_text_mode` = false;

Business users, analysts, and data scientists use standard BI/analytics tools such as Tableau, QlikView, MicroStrategy, SAS, and Excel to interact with non-relational data stores by leveraging Drill's JDBC and ODBC drivers. Drill's symmetrical architecture and simple installation make it easy to deploy and operate very large clusters, and Drill is promoted as the first distributed SQL engine that does not require schemas. These features make Apache Drill a highly desirable tool for data analysis.
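The chapter connects Drill to BI tools through its ODBC driver; the same driver can also be reached from R. The sketch below is an illustration under assumptions, not code from the chapter: "Drill" is a hypothetical DSN name, and the dfs path is a placeholder for wherever the Yelp review.json file is stored on your machine.

# A minimal sketch: running one of the queries described above from R through
# Drill's ODBC driver, using the DBI and odbc packages.
library(DBI)

con <- dbConnect(odbc::odbc(), dsn = "Drill")   # hypothetical DSN configured for Drill

reviews <- dbGetQuery(con, "
  SELECT stars, text
  FROM dfs.`/path/to/yelp/review.json`
  LIMIT 10")

head(reviews)
dbDisconnect(con)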

Conclusion

There are several open source data analysis software tools that can be leveraged directly to perform data analysis. With the increasing number of open source tools available, researchers should start using them to improve the efficiency of their research and to automate several components of data analysis.

Acknowledgments

I dedicate this chapter to my family & Emroz for their unwavering support and for believing in me. I am grateful to Dr. Alejandra Magana for all the motivation and support.

References

1. Creswell, J. W. Educational research: planning, conducting, and evaluating quantitative; Prentice Hall: Upper Saddle River, NJ, 2002; Vol. 2, pp 24−25.
2. Apache Drill - Schema-free SQL for Hadoop, NoSQL and Cloud Storage; https://drill.apache.org/ [accessed Sep 8, 2017].


Chapter 5

Making the Most of Your Assessment: Analysis of Test Data in jMetrik

Alexey Leontyev,*,1 Steven Pulos,2 and Richard Hyslop3

1Department of Chemistry, Computer Science, and Mathematics, Adams State University, Alamosa, Colorado 81101, United States
2School of Psychological Sciences, University of Northern Colorado, Greeley, Colorado 80639, United States
3Department of Chemistry and Biochemistry, University of Northern Colorado, Greeley, Colorado 80639, United States
*E-mail: [email protected].

The chapter provides an overview of the jMetrik program designed to analyze test data. We used a dataset of students’ responses to the Stereochemistry Concept Inventory to illustrate the functionality of jMetrik. Steps of data analysis in jMetrik include uploading data, scoring the test, scaling the test, and analysis of the test at the scale, item, and distractor level. The following chapter provides step-by-step guidance for the use of jMetrik software.

Introduction

Just as chemists need high-quality measurement tools, chemistry education researchers need instruments that produce reliable data and support valid inferences. However, while measurements in chemistry are objective and can be observed directly (e.g., melting point), in educational research the variables are latent and cannot be measured by direct observation. Quite often a measurement tool is crucial, for example, in quasi-experimental and experimental studies in which the performances of two or more groups are compared to determine the effect of a certain pedagogical intervention. The importance of high-quality assessment has been emphasized in the chemical education research literature in recent years (1, 2). Several reviews (2, 3) addressed best practices for writing multiple-choice items that

may be beneficial to both researchers and practitioners in the field of chemistry education. Similar to chemistry, which is an experimental science, in chemistry education the evidence that supports the quality of multiple-choice items is obtained experimentally, through multiple administrations of tests and analysis of the data sets produced by students. Towns (2) suggested using item analysis to improve item writing. This manuscript covers uses of jMetrik, a test analysis program that can be used to conduct item analysis and examine multiple estimators of the quality of multiple-choice items. We used student responses to the Stereochemistry Concept Inventory to show how jMetrik works and the types of analyses that can be performed. The Stereochemistry Concept Inventory was developed to assess organic chemistry students' knowledge of stereochemistry. It consists of 20 trial-tested multiple-choice questions that assess the most important aspects of stereochemistry, with distractors based on students' misconceptions previously identified in a qualitative study. An example of an item from the Stereochemistry Concept Inventory is presented in Figure 1. We collected data from 439 students from various institutions across the United States. In this manuscript, we show how various stages of the data analysis of the Stereochemistry Concept Inventory can be performed in jMetrik and what inferences can be drawn from the results.

Figure 1. Item 18 from the Stereochemistry Concept Inventory. Response options A, B, and C are distractors that represent misconceptions, while response option D is the correct answer.

jMetrik Overview

jMetrik is free software that was developed and is routinely updated by J. Patrick Meyer from the Curry School of Education at the University of Virginia. This software is intended for everyone who is interested in the analysis of test data. The types of analysis that can be done in jMetrik range from basic (e.g., calculation of the sum score) to advanced (e.g., test equating), which makes jMetrik a suitable tool for a broad audience. The most recent version of jMetrik (4.0.3) can be downloaded from www.ItemAnalysis.com. jMetrik can be used on Mac, PC,

or Linux, which is one of many advantages of this software compared to other software packages for test analysis. Analysis of data in jMetrik includes sequential steps that can be summarized in the scheme presented in Figure 2.

Figure 2. Data analysis steps in jMetrik include importing data, item scoring, test scaling, and test analysis at the scale, item, and distractor level.

While this review is not meant to be comprehensive, it covers the mechanics of the main steps of data analysis in jMetrik and provides several examples of data interpretation. For more detailed information, we refer our readers to the book (4), which provides comprehensive coverage of jMetrik functionality from its developer.

Analysis of Data in jMetrik

Preparing Data for the Analysis in jMetrik

The dataset should be organized to have participants (cases) in rows and variables (items) in columns. Data in this format can be obtained from reading Scantron sheets. In cases when a test is administered without Scantron sheets, data in this format have to be produced manually. As tedious as this task can be, it can produce rewarding insights into your data. Responses to web-based surveys administered through Qualtrics or SurveyMonkey can be downloaded in the desired format that allows analysis in jMetrik. Data may also be imported from any spreadsheet or statistical software package by simply saving the file in a .csv file format. A dataset for analysis can contain any number of cases and up to 1024 variables. It is unlikely that a single test would contain this number of questions; however, combining responses on several tests may yield datasets with a large number of variables. You are quite likely to add additional variables to your dataset over the course of the analysis, so it is advisable to use fewer than 1024 variables in the initial dataset uploaded to jMetrik. The dataset may contain student identification information in a separate column if you plan to obtain and report individual scores or percentiles; usually, the first column is used for this purpose. The first row may contain variable names, which are case insensitive. The dataset may contain letters or numbers for response options. Scantron software produces data in the letter format (responses are coded as A, B, C, D, and E), while web-based surveys produce data in the number format (responses are coded as 1, 2, 3, etc.). Since both formats can be handled, recoding is rarely necessary. However, it is important to use lowercase or uppercase format consistently.
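If the responses already live in another program, exporting them in this layout is usually a one-line step; a minimal sketch in R is shown below (the data frame responses is a hypothetical name).

# A minimal sketch: writing a hypothetical response data frame (cases in rows,
# items in columns, variable names in the first row) to a .csv file for jMetrik.
write.csv(responses, "sci_responses.csv", row.names = FALSE)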

Importing Data into jMetrik

Since jMetrik utilizes a database format, you first need to create a database to store your data. When you first start jMetrik, you will see a screen similar to the one shown in Figure 3.

Figure 3. Starting window of jMetrik.

Click Manage → New Database to create a database. You will see a popup window (Figure 4).

Figure 4. Create New Database popup window.

Type in the name for your database and press the Create button. It is advisable to keep datasets from the same test in one database to allow easy navigation between subsets of data. Click Manage → Open Database and select the database that you just created. You are now in the newly created database that is empty at this point. To import data, click on Manage → Import Data. You will see a dialog box illustrated in Figure 5.

Figure 5. Import Data dialog box.

To find your file, click on the Browse button and you will see your file directory (Figure 6), where you need to find the .csv or .txt file you want to upload, as well as specify the type of delimiter you have in your data (tab, comma, semicolon, or colon). You should also specify if the first row in your dataset includes variable names.

Figure 6. Import Data file directory dialog box.

Once you find and specify the file you want to import, click on it and its name will appear in the File Name window; then hit return and you will be returned to the previous window of the Import Data dialog box. Be sure to specify a name for the file in the Table Name window (Figure 5). Once you have done this, simply press the Import button.

Scoring Data in jMetrik

At this point the responses are just variables, and jMetrik does not recognize which response is correct and which is not. You need to score the items. There are two types of scoring possible in jMetrik. Basic scoring is used for binary and polytomous items. Advanced scoring is used when several correct answers are possible or you wish to award partial credit for certain responses. Another advantage of advanced item scoring is that you can produce syntax with the answer key and reuse it if you have multiple databases or tables from the same test. This is an especially useful feature when data are collected at multiple sites and you plan to analyze the data and produce reports separately for each site. To do basic scoring, click Transform → Basic Item Scoring. Enter the correct answers in the first row and the total number of response options in the second row. Use Tab or Enter to switch between cells. Figure 7 represents the dialog box for basic item scoring. For example, items q1, q4, q7, and q8 have four response options and C is the correct answer, while item q3 has only three response options and B is the correct option.

Figure 7. Basic item scoring of responses of the Stereochemistry Concept Inventory. The top line contains correct responses. The bottom line contains the number of possible responses.

At the bottom of the dialog box, there are options where you need to indicate how you want to proceed with omitted and not reached responses. Omitted responses are usually scored as zero. However, you may score them separately by assigning them into a special category. Not reached responses occur if participants stop answering questions, which may occur if you use timed tests. jMetrik allows you to handle missing, omitted, and not reached responses distinctly. There is no universal solution for these categories; you should base your decision on the nature of your dataset, conditions under which a test was administered, and type of analysis that you wish to perform. A fairly comprehensive description for handling missing data in education research is available (5).

Scoring the polytomous items (e.g., responses to Likert-type scales) is done differently. In this case, in the first row put "+" for items that are scored in ascending order or "–" for items that are scored in reverse order. The second row should still include the number of response options. In cases when you want to award partial credit or your multiple-choice test has several correct responses, basic scoring is not sufficient for your analysis. To do advanced scoring, click Transform → Advanced Item Scoring. In the dialog box that appears, provide scores for all options. Then select the items that match the key that you entered and click Submit. The key will appear in the Scoring Syntax box. Score all items and notice that they become bold. Figure 8 represents an example of advanced item scoring for items q1, q4, q7, q8, q9, and q14 of the Stereochemistry Concept Inventory. These items have four response options (A, B, C, and D) where response "C" is the correct answer and is awarded 1 point. Responses A, B, and D are distractors and thus are awarded 0 points.

Figure 8. Advanced item scoring dialog box. The correct answer (response option C) is awarded 1 point, and distractors (A, B, and D) are awarded 0 points. Items with the same key are selected.

Note that the bottom of the Advanced Item Scoring dialog box contains the Scoring Syntax window. All commands will be entered in this box. If you plan to reuse the key for another dataset, you can save the syntax to a separate file and reuse it. This approach is particularly useful when data on performance on the same test are collected from multiple sites.

To view which items are scored and the key, go to the Variables tab. Occasionally, this requires hitting the "refresh data view" button in the top panel. In the Variables tab you can see which variables the program recognizes as items (binary or polytomous) and the scoring key for each item (Figure 9).

Figure 9. Variables tab contains information on which variables are treated as items and the scoring keys for all items.

Scaling of Test

Before running any analysis, the test should be scaled. In other words, responses on individual items need to be converted into meaningful aggregate numbers; combining individual responses into some aggregate score is called scaling. Traditionally, in the educational setting, cognitive tests are scaled by calculating the sum score. To do this, click on Transform → Test Scaling and then select the items that you want to include in the total score and the type of total score. You also need to name the new variable that you are adding to your database as a result of scaling. Figure 10 represents the dialog box for Test Scaling.

The sum score is the most often used composite score, but not the only one. The current version (4.0.3) of jMetrik also allows computing average scores, Kelley scores, percentile ranks, and normalized scores, and a test can be scaled in multiple ways. If you go to the Data tab, you will see that the composite scores are added at the very end of the table.

Figure 10. Test Scaling dialog box. Items that are used to produce the composite scores should be moved into the empty box. The new variable must be named, and the type of composite score should be selected.

Figure 11. Item Analysis dialog box. Items for analysis should be dragged into the empty box.

Item Analysis in jMetrik

jMetrik provides insights into properties of the scale, properties of individual items, and properties of distractors. Most of that information can be obtained by item analysis. To run item analysis, go to Analyze → Item analysis. This will open a dialog box, presented in Figure 11. In this dialog box, you need to select items that you want to analyze. In the Options tab, you are presented with

multiple options for the analysis. The most important here are Compute item statistics, which yields difficulty and discrimination indices, and All response options, which gives these parameters for all response options, including distractors. By default, these options are selected. If you want to run item analysis only for correct options, you need to uncheck the All response options checkbox. By default, jMetrik analyzes data adjusting for spuriousness; this correction removes variance from the total score that is due to the item when computing item-total correlations. We endorse using this correction, especially for tests with a small number of items. After you run this analysis, you will see the output (Figure 12). In this output, you will see test-level statistics, such as the total number of items, the number of examinees, minimum and maximum scores, mean, median, standard deviation, interquartile range, skewness, kurtosis, and Kuder-Richardson coefficient 21. These statistics can be instrumental in examining the shape of the distribution and its normality.

Figure 12. Output for the scale level analysis and reliability estimates.

jMetrik provides multiple estimates of reliability (Guttman, Cronbach, Feldt-Gilmer, Feldt-Brennan, and Raju, as can be seen in Figure 12), which may be used to estimate the amount of measurement error in an instrument. Reliability coefficients estimate to what degree test scores are consistent with another hypothetical set of scores obtained in a similar process. Different estimates have different underlying assumptions and ease of calculation. Coefficient alpha is appropriate when responses at least satisfy the assumptions of the essentially tau-equivalent model, while the Feldt-Gilmer and Feldt-Brennan estimates are suitable for congeneric models (4, 6).

While coefficient alpha is the most commonly used estimator, it is unlikely that the underlying assumptions of the essentially tau-equivalent model are met. A more appropriate method would probably be Guttman's L2, which estimates a lower bound to the reliability of a test assuming the scores are congeneric (4, 6). Multiple estimates of reliability provide better insight into the quality of the scale than any single one. One of the unique features of jMetrik is that confidence intervals for all reliability coefficients are also computed, which provides insight into how much trust can be put into a given reliability estimate. If you select Compute item statistics, the output will also contain item analysis for the selected items. The item analysis output is organized as a table (Figure 13) containing item numbers, all response options and their scores, difficulty, standard deviation, and discrimination index. For example, item q1 from the Stereochemistry Concept Inventory had a difficulty value of 0.6424, meaning that 64% of students chose the correct answer. Difficulty values for distractors simply indicate the fraction of students who selected them. The discrimination index for item q1 was 0.1529, indicating that high-scoring students tend to select the correct option, while low-scoring students are more likely not to select it. The discrimination indices for the distractors are all negative, suggesting that low-performing students are more likely than high-performing students to select these distractors. Generally, the higher the discrimination index, the better the item differentiates between low- and high-performing students. Popham (7) suggested some cut-off values for revising items based on the discrimination index. jMetrik also allows item analysis at the distractor level. While examining the percentages of students who endorsed distractors may be important for estimating the fractions of the population that hold the misconceptions addressed by the distractors, it is most important to examine the distractor-total correlations. These coefficients may provide insight into the relation between a particular misconception and the level of ability as measured by the test. Negative correlations suggest that a misconception is prevalent at lower ability levels; zero correlations suggest that a misconception is independent of ability level; and positive correlations may suggest that the distractor is misleading to higher ability students. Quite often, positive correlations for distractors simply indicate a coding error or a miskeyed item.

Figure 13. Item analysis of items q1 and q2 of the Stereochemistry Concept Inventory.

If you wish to produce only difficulty and discrimination indices for the correct option, you should uncheck the All response options checkbox in the item analysis dialog box. Low discrimination indices may suggest ambiguously worded stems. Negative discrimination indices indicate that the most knowledgeable students answer the item incorrectly and the least knowledgeable students answer the item correctly. Often, an item with a negative discrimination index is a miskeyed item.
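The same classical indices can also be reproduced outside of jMetrik; the following is a minimal sketch in R (not part of jMetrik), assuming a hypothetical data frame scored that holds the 0/1-scored items with one row per examinee.

# Difficulty = proportion answering correctly; discrimination = corrected item-total
# correlation (the item is removed from the total, adjusting for spuriousness).
difficulty <- colMeans(scored, na.rm = TRUE)

discrimination <- sapply(seq_along(scored), function(i) {
  rest <- rowSums(scored[, -i], na.rm = TRUE)   # total score without item i
  cor(scored[[i]], rest, use = "complete.obs")
})

round(data.frame(difficulty, discrimination), 3)
# The psych package (used in a later chapter) reports the same quantities, along
# with coefficient alpha, via psych::alpha(scored).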

Figure 14. Nonparametric Characteristic Curves dialog box. You need to select items for which you wish to perform item analysis and select All options in the right bottom box.

Nonparametric Response Curves

While in the previous section we addressed the importance of evaluating the quality of distractors by examining their correlation coefficients, such conclusions should be drawn carefully because correlation coefficients assume that the relationship between variables is linear. As appealing as this model is, it does not provide an accurate picture if the relationship is not linear. Recently, item response curves (8, 9) have been employed to examine the quality of concept inventory items. Item response curves (IRCs) can be generated in jMetrik by plotting the probability of students selecting each response option versus the total score for that group of students. Item response curves are useful tools for examining the relationship between the probability of selecting the correct response or a distractor and the person's ability as determined by the sum score. IRCs allow evaluation of the overall quality of an item and the performance of each distractor. This analysis is especially useful in the earlier stages of test development because it helps spot poorly functioning distractors. To generate item response curves, click Graph → Nonparametric Curves. This opens the Nonparametric Characteristic Curves dialog box (Figure 14). In this box, you should select the items for which you want to produce curves. For the Independent Variable box, you should select an estimator of the ability level, for example, the sum score. To get a complete picture, select All options.

Figure 15. Nonparametric IRC for item q11 from the Stereochemistry Concept Inventory.

Figure 15 shows the IRC for item q11 from the Stereochemistry Concept Inventory. Response options B and C are distractors. Option B is the most likely response for examinees with total scores below 3, and option C is the most likely response for total scores between 3 and 13. At higher ability levels (total scores above 13), the correct answer (option A) is the most probable response. These results show that options B and C are not only plausible distractors but that there is an order to their degree of plausibility, which is indicative of the underlying cognitive model for this question.
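An empirical version of such a curve can also be produced outside of jMetrik; the sketch below (not part of jMetrik) assumes hypothetical objects resp, a data frame of raw A-D responses, and scored, the corresponding 0/1-scored items.

# A minimal sketch of an empirical item response curve in R: for one item, plot the
# proportion of examinees choosing each option at each value of the total score.
total <- rowSums(scored)
props <- prop.table(table(total, resp$q11), margin = 1)   # rows = total-score values

matplot(as.numeric(rownames(props)), props, type = "b", pch = colnames(props),
        xlab = "Total score", ylab = "Proportion choosing option",
        main = "Empirical IRC for item q11")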

Conclusions

jMetrik is a powerful instrument designed specifically to handle test data. In jMetrik, you can analyze your data using both classical test theory and modern psychometric methods. The graphical interface of jMetrik makes it intuitive to use even for practitioners with no prior experience with psychometric software. Data from Scantron or web-based surveys can be uploaded directly into jMetrik and require very little preparation. If you decide to rerun an analysis with recoded responses or omitted items, it can be easily done with point-and-click menus and the drag-and-drop interface. The computational power of jMetrik allows for many useful analyses that can provide insights into the quality of tests, individual items, and their distractors. In the past, various hand calculations were employed to compare responses on individual items with total scores using high- and low-scoring groups of students; one of the most commonly used approaches is to compare responses between the upper and lower 27% of examinees. While Kelley (10) suggested that approach in 1939, it was based on convenience and ease of calculation. Using jMetrik allows for a computational analysis that provides a more accurate assessment of the discrimination power of items because all student responses are taken into account. As we mentioned previously, we did not intend to provide comprehensive guidance for all types of analyses that can be done in jMetrik. In addition to the types of analyses described in this chapter, you can perform Rasch analysis, which allows measuring items and examinees on the same scale; differential item functioning analysis, to compare the performance of two groups of examinees on the same items; or score equating, for participants who were administered several different forms of the same test. Moreover, jMetrik is constantly updated, and we expect to see additional functionality in the future.

References

1. Holme, T.; Bretz, S. L.; Cooper, M.; Lewis, J.; Paek, P.; Pienta, N.; Stacy, A.; Stevens, R.; Towns, M. Chem. Educ. Res. Pract. 2010, 11, 92–97.
2. Towns, M. H. J. Chem. Educ. 2014, 91, 1426–1431.
3. Haladyna, T. M.; Downing, S. M.; Rodriguez, M. C. Appl. Meas. Educ. 2002, 15, 309–333.
4. Meyer, J. P. Applied measurement with jMetrik; Routledge: London, 2014.
5. Cheema, J. R. Rev. Educ. Res. 2014, 84, 487–508.
6. Meyer, J. P. Understanding measurement: Reliability; Oxford University Press: Oxford, 2010.
7. Popham, J. W. Classroom Assessment: What Teachers Need to Know; Pearson: London, 2010.
8. Brandriet, A. R.; Bretz, S. L. J. Chem. Educ. 2014, 91, 1132–1144.
9. Linenberger, K. J.; Bretz, S. L. Biochem. Mol. Biol. Educ. 2014, 42, 203–212.
10. Kelley, T. L. J. Educ. Psychol. 1939, 30, 17–24.


Chapter 6

Putting the R in CER
How the Statistical Program R Transforms Research Capabilities

Jordan Harshman,*,1 Ellen Yezierski,2 and Sara Nielsen3

1Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska 68588, United States
2Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio 45056, United States
3Department of Chemistry, Hanover College, Hanover, Indiana 47243, United States
*E-mail: [email protected].

When researchers employ quantitative methods in their investigations, they have the choice of many programs to conduct their analyses. In this chapter, we argue that the statistical programming language called R demonstrates the greatest utility in these analyses. R is capable of conducting one of the largest varieties of statistical techniques compared to other programs and has the potential to transform how researchers analyze their data. Throughout the chapter, we will discuss the significant benefits of using R to more efficiently and effectively analyze data by re-conceptualizing data visualizations, defining custom functions, writing programmatic loops, and enhancing reproducibility and documentation.

Introduction

As scientists in chemistry education research (CER), we are tasked with presenting evidence of the effects of pedagogical interventions, students' knowledge, skills, affect, and other complex constructs that cannot be directly observed. When our

research questions warrant quantitative measurements, we are often held to far different standards than the r = 0.9999 set by our analytical chemistry colleagues due to the nature of our widely varying subjects, human beings. Considering this research context, it becomes crucially important to identify and effectively report all possible evidence in our investigations. By extension, we would also posit that, for a number of reasons discussed throughout this chapter, the statistical programs used to analyze data play an important role in what evidence can be identified and brought to bear throughout the analytic process. Throughout this chapter, we argue that one program, R, has the ability to serve the needs of researchers in CER better than existing alternatives. To make this argument, we need to first address a statement we've heard commonly: "The choice between R, SAS, Stata, Excel, SPSS, Matlab, jMetrik, and all other programs is really just about preference." We liken this with the belief that someone's choice of automobile is just about preference. To some extent, this is true, as you need to feel comfortable operating the vehicle. Some prefer manual versus automatic, and some like certain designs and colors. But foremost is the consideration of function. A small commuter car might be your favorite car for getting around town, but it is not a good option for someone who needs to haul a couple tons of gravel. Similarly, common programs such as SPSS and Excel are great programs for many quantitative needs, but often provide limited options, techniques, and efficiency in comparison to R. Additionally, and perhaps more importantly, we contend that the choice of statistical program plays a role in a researcher's selection of statistical techniques and visualizations. The choice of one may seem unrelated to the choice of the other, but there are plenty of examples of two things that theoretically should not affect each other, but do. For example, consider three decades worth of evidence suggesting that use of a credit card versus cash increases propensity to spend (1, 2), decouples reward (purchase) from pain (cost) (3), and can affect the way that consumers perceive products (4). With statistical programs, it is possible that defaults and frustrating procedures can actually affect the methods used in research. Consider the 46-year-old problem (5) of confusing principal components analysis (PCA) with exploratory factor analysis (EFA). It is possible that this problem has been exacerbated by the fact that the default extraction method in SPSS for a "Factor Analysis" actually conducts a PCA. In a similar line of thought, researchers who exclusively used Excel to produce their visualizations may have been less likely to use boxplots (one of the most fundamental displays of data) because in prior Excel versions it was very time consuming to manipulate a stacked bar chart to look like a boxplot (6). R is not excluded from these problems, as it comes with defaults and frustrations just like other programs. The point is to recognize that a researcher's choice of program can impact what analyses are and are not conducted. While this should not be the case, many of the researchers who use applied statistics are not statisticians by training. Because researchers have limited training, they may be more likely to accept the defaults of many programs simply because they do not know the consequences of each choice.
It is worth stressing again that users can willingly copy/paste code in R from another resource without fully realizing the ramifications, but we contend that many features that we discuss here will help facilitate thoughtful consideration of analysis procedures.

It is in light of this recognition that we unveil the thesis for this chapter: Using R does not just allow researchers to perform techniques not available in other programs. Rather, we hope to convince the reader that R has the ability to transform how researchers view data analysis. But before we can present the argument, we will first summarize what R is, list its advantages and disadvantages, and then include four sections that describe how use of R can transform how researchers see visualizations of data via graphing packages, analyses via custom functions, analyses via programmatic loops, and documentation via interactive notebooks.

What is R?

While there exists a program that you can download called "R," R is technically a programming language, not a program. When you download R (https://www.r-project.org/), the program is simply a text editing program with built-in menus and the ability to display graphs produced by the R language. Many users currently run R in a front-end program called RStudio® (https://www.rstudio.com/home/), which contains a number of features that make writing code and managing results more efficient while opening up many additional features not available in R. R is open-source software, meaning that the source code is public, which further means anyone can build, change, or remove features and functions on their own local copies. The base R language is copyrighted and governed by a non-profit organization. Throughout the years, researchers and programmers have added 9,886 (as of January 11, 2017) packages, all of which contain additional functions and capabilities. These are also open-source contributions available on existing mirrors of the Comprehensive R Archive Network (CRAN).

Advantages and Disadvantages of Programming in R

Broadly speaking, many will be quick to point out the biggest advantage of R, which is that because it is open-source, it is free to use and modify anywhere around the world. While certainly a huge advantage, free does not necessarily mean good. It wouldn't be surprising if researchers favored paying for a statistical program over taking a free one if the paid program had enhanced features and capabilities. However, in this case, it is rare to find a statistical technique or visualization another program can do that R cannot. Basic statistical functions, data manipulation, and graphics are part of the base R package, and more advanced techniques are found in the nearly 10,000 additional packages (think of these as add-ons). Additionally, users can write their own functions in R. This means that if there doesn't exist a defined function to carry out something the user wants to do, they can write one themselves. It sounds daunting to write your own function, but in a later section, we'll demonstrate that it is not nearly as difficult as it sounds to customize R to give a researcher exactly the output that the researcher wants to examine. This is made considerably easier by the propensity of coders to make their code available in order to prevent individual users from having to

reinvent the wheel, which is a core philosophy of the open source movement. Lastly, a relatively recent overhaul of the graphics system has made R a top contender in the production of quality, completely customizable graphics, which provides researchers a huge advantage in effectively telling their stories with data. For its use in CER, there are two main disadvantages to R. The first is the generally steep learning curve associated with R. It is not "like learning a new language," it is learning a new language, which takes time and practice. If a user has previous coding experience, this process likely won't take as long as it would for a beginning coder. To overcome this barrier, it is recommended that R users learn one function at a time and eventually start combining those functions for enhanced capabilities. However, we believe that R is the last statistical program a researcher will have to learn because of its very wide array of capabilities. The other main disadvantage is in the eye of the beholder: if anyone can develop and publish new functions and features, how can researchers trust that functions do as they advertise and that results are accurate? First, while anyone can build a package, to develop and release one on the CRAN, many standards of design and documentation must be met. Second, because R is open-source, everyone has access to what these functions do, down to the source code that defines each function. Therefore, with proper expertise, anyone can read the code and find out exactly what a function does and compare it to what the authors of the function claim it does. This is something researchers usually cannot do with commercial programs because the source code is owned and generally not released to the public. Lastly, many of the packages in R are written by statistical experts in academia and lead to publications that go through a peer review process. There are great incentives to produce accurate packages because mistakes could be damaging to the authors' reputations.

Data and R Code Presented

Throughout this chapter, we will primarily be referencing hypothetical data. In the spirit of the open source philosophy, we have included all of the code used to produce the various figures and analyses discussed in this chapter. Readers are encouraged to download R and RStudio® to conduct these analyses themselves because there is simply not enough space in this chapter to display all of the code. Therefore, we are encouraging an interactive reading experience that will give the reader a "feel" for working in R. Whenever the bolded and italicized phrase TryThis(#) appears in the text, there will be a section in the supplemental R files containing the code relating to that section. Supplemental files can be accessed at http://bit.ly/2jGmIfy and include the following files (unzip the folder prior to opening):

1. Benefits of R Supplemental.R – contains all TryThis examples
2. kmeans example.Rmd – produces an interactive notebook when run in RStudio®
3. JACS.csv – data file containing the most common word pairs of JACS titles

Transforming Data Visualization

Data visualizations are too often seen as substitutes for long tables as opposed to "instruments for reasoning about quantitative information" (7). Visualizations can reveal (and conceal) information, which greatly affects the evidence authors present, for better or for worse. A frequently used example of this is the Anscombe quartet (8), TryThis(#1). In these four sets of data (11 observations each), every set of x values has μ = 9.0 and σ = 3.3, and every set of y values has μ = 7.5 and σ = 2.0. Simply reporting these means and standard deviations, however, fails to reveal the clear patterns shown in Figure 1. As this exemplifies, reporting only means and standard deviations runs the risk of concealing additional evidence that may support or refute a researcher's conclusions. We also encourage readers to look into world-renowned data visualization expert Edward Tufte's famous Challenger rocket example of how tragedy might have been avoided if more effective displays of information had been available to key decision makers (9).
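The chapter's own code for this example is in TryThis(#1) in the supplemental files; as a stand-in, the summary statistics and plots can be checked quickly with R's built-in anscombe data set.

# Verify the identical summaries and very different patterns of Anscombe's quartet
sapply(anscombe[, 1:4], mean)   # means of x1-x4 (all about 9.0)
sapply(anscombe[, 5:8], sd)     # standard deviations of y1-y4 (all about 2.0)

op <- par(mfrow = c(2, 2))
for (i in 1:4) plot(anscombe[[i]], anscombe[[i + 4]],
                    xlab = paste0("x", i), ylab = paste0("y", i))
par(op)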

Figure 1. Anscombe’s quartet. So how does R incorporate graphical design principles and transform how researchers view data visualizations compared to other programs? Many programs give the user the option to create one of several types of graphs, such as line graphs, bar charts, scatter plots, etc. Similarly, perhaps in earlier education you learned about these types of graphs. However, the goal of data visualization is not to force data into a limited number of types of graphs and instead, it is about presenting the evidence to tell the story represented by the data. For example, imagine 100 students taking a pre and posttest. We have created such a data set in TryThis(#2). To measure change from pre to post, researchers commonly report a change in means, portrayed in text, table, or graph such as a so-called “dynamite plot” (shown in Figure 2A). Here, we can see that students increased their scores from the pretest (μ = 75.30%, σ = 17.85%) to the post test (μ = 77.55%, σ = 17.80%). The story told by Figure 2A is one of indifference or if there is a difference, it is small. Dynamite plots such as this have been heavily scrutinized as a poor representation of data (10). Now consider the graph Figure 2B. It is difficult to label this as a particular “type” of graph, but if it is a type, most programs do not have built in function to produce it. This illustration presents a story that is much more descriptive of the data itself (not just summaries of it). Two boxplots, one for pre and one for 69

post scores, show aggregated results on top of the individual 100 students' scores. Each line connects an individual student's pre score to that same student's post score. This graphic shows that (1) all but 8 students either gained or remained stagnant from pre to post, (2) these 8 students seem to be outliers, declining in performance by 30-40 points, and (3) there may be a significant ceiling effect for this test. None of these observations are apparent in the dynamite plot on the left, but they are revealed in the plot on the right.

Figure 2. (A) Dynamite plot of pre/post outcomes versus (B) lines representing individual student scores from pre to post.

While much, much more could be said about the advantages and disadvantages of certain visualizations over others, we want to discuss how R’s graphic system can actually encourage researchers to see graphics as a means of displaying data and telling stories. To accomplish this, we need to investigate the syntax that is used in plotting. Figure 2A and B were produced in a package called ggplot2 (11), which is now widely used in the R community. The “gg” stands for “grammar of graphics,” indicating that it, like R, is not a series of options and functions, but a language. Code 1 shows a generic syntax for ggplot. In the first line, the ggplot function is called to indicate the start of a graphic. The name of the data set, data in this case, is then provided along with aesthetics, aes. These aesthetics map the Var1 variable to the x-axis and the Var2 variable to the y-axis. Line 2 then maps different geometries onto those aesthetics, such as points (geom_point), lines (geom_line), boxplots (geom_boxplot), and many others. In other words, the syntax requires that the user first consider the data to be displayed as opposed to a type of graph.
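Code 1 itself is presented as an image in the original chapter and is not reproduced in this text; the following sketch of the generic ggplot2 syntax follows the description above, using the same placeholder names (data, Var1, Var2).

# Line 1: name the data set and map variables to aesthetics; Line 2: add a geometry
library(ggplot2)
ggplot(data, aes(x = Var1, y = Var2)) +
  geom_point()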


This is in contrast to many other programs that require a user to first choose pre-determined geometries (a type of graph) and then map those geometries onto the data second. The ggplot package applies exactly the reverse process, where the data are identified first (line 1) and then mapped onto one or more geometries second (line 2). Defining the data first, followed by the geometries, might seem like a small difference, but it is more aligned with the fundamental reason researchers show visualizations of data: it allows researchers to think about the best visual display rather than the best of the available options. As an analogy, consider being asked "how was your day?" at the end of a work day. One option is to represent an aggregate summary of the events of the day by choosing a "type" of day: Was it a good day? Bad? Unproductive? Busy? None of these representations depict the actual interactions and events of the day. As a result, if someone only reports that they've had a bad day, there are many important details that are not accessible, and the consumer of this limited information may conclude things that were not actually observed. Thus, instead of choosing a "type of day," a more effective way of communicating is to find a way to describe some of the individual events of the day and let the consumer determine their own labels for the day. As with the ggplot syntax, it is more informative to first define the values (events of the day) and then represent them in some way (using various parts of speech in a sentence). The same is true with data and the ggplot2 syntax, which encourages researchers to remember that the goal is to display raw data when possible. So often, as in the dynamite plot shown previously, what is displayed is a visual representation of a parameter of the whole data set, not the individual data themselves. Advanced capabilities in R, such as jittering (adding small random noise to prevent over-plotting), transparencies, and facet plotting (creating multiples of the same graph broken down by group), all help researchers effectively display hundreds or even thousands of points of data without inducing cognitive overload. We will almost exclusively be focusing on quantitative information, but data visualization is important in qualitative settings as well. There is a text mining package called tm (12, 13) that has gained popularity recently. While R is not the program of choice to code qualitative data, this package provides a number of useful tools to explore large sets of short-answer responses. R also allows for word clouds in the wordcloud package (14), and we will discuss a unique visualization called a chord diagram (15). Originally, this visualization was used in bioinformatics to visualize genome sequences, but it can be used with text to show not just how frequently words appear (word cloud), but also how frequently words appear next to each other. To illustrate the potential power of this visualization, we used R to visualize the titles for every issue of the Journal of the American Chemical Society (JACS) printed from January 1, 2016 through December 31, 2016 (volume 138, issues 1-51), TryThis(#3). In this example, we used a package that does web scraping (automated access and import of information from servers into R) to easily create a list containing all the article titles. Then, we reformatted the data so that we could accurately count how many times each unique word pair showed up in an article title.
For space concerns, this part is not included in the supplemental file, and instead we have also included a raw data file

that can be imported. As of issue 51, a total of 2,478 articles were published in 2016 (about 48 articles per issue). We then made a chord diagram to show the common word pairings that appeared in titles. The chord diagram (Figure 3) is read by finding the side of a particular link that is furthest from the circle's edge, which represents the first word in a pair; following the link to the side that is closer to the circle's edge gives the second word in the pair. You can gauge how many times that word pair was mentioned in that order by reading the axis. For example, the thickest line stretches from "metal-organic" to "framework", indicating that this word pair is the most common pair in 2016 JACS titles; the axis indicates that 64 titles contain it. We've highlighted the most common word pairs/phrases appearing in JACS titles in a darker color. Doing so reveals that the most common topics explicitly mentioned in article titles were metal-organic frameworks (64 articles) and total (enantioselective) synthesis (51 articles). This visualization is particularly effective at displaying short phrases of three or more words, as can be seen in the phrase "covalent organic framework" (19 articles), which we've also highlighted in a darker color. While this particular example of JACS article titles is limited because we cannot always infer much from an article title, this technique would prove useful in a variety of settings, such as analyzing open-ended responses where phrases are expected to be commonly used. For example, if students are asked to explain why a balloon's volume increases at higher temperature, the proportion of students who include "increase temperature, increase volume" versus "increase kinetic energy of molecules" in their explanations could be meaningful.

Figure 3. Chord diagram of 2016 JACS article titles.
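The chapter's plotting code is in the supplemental files rather than reproduced here. As an assumption-laden sketch, a similar diagram can be drawn with the circlize package (one option among several; the chapter does not name the package it used), reading the supplemental JACS.csv word-pair file. The column names word1, word2, and count are guesses about that file's layout.

# A minimal sketch of a chord diagram of word pairs using the circlize package
library(circlize)

pairs <- read.csv("JACS.csv", stringsAsFactors = FALSE)
top <- pairs[order(-pairs$count), ][1:30, ]     # keep the 30 most frequent word pairs

chordDiagram(top[, c("word1", "word2", "count")])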

Creating custom visualizations in R could be an entire book’s worth of content, so instead of showing additional visualizations here, we encourage readers to visit the following resources to browse some unique visualizations produced in R.



• The R Graph Gallery (http://www.r-graph-gallery.com/) – Each graph on this page comes with the code used to build it; some are purely artistic, others offer unique ways to visualize actual data
• ggplot2 Help Topics (http://docs.ggplot2.org/current/) – Lists current geometries available in the ggplot package and how to use them
• Plotly Integration with R (https://plot.ly/r/) – Plotly is a web-based graphing site that offers full integration with R
• Quick-R Graphs (http://www.statmethods.net/graphs/) – Useful for making plots using base R as opposed to ggplot2

Functions and Programmatic Loops

The best way to show the benefits of R for chemistry education research is to actually walk through an analytic problem and show how R leads to efficient and robust research via functions, loops, and reproducibility. Before we introduce and investigate a problem, we'll first provide a brief tutorial on functions and loops in R, as these will be used throughout the example research investigation. The basics of how a function works are shown in Code 2. A function we've called myfun is defined to take two arguments called argument1 and argument2. This meaningless function starts with the { in Line 1 and ends with the } in Line 4. Line 2 instructs R to define an object, x, as the sum of whatever is entered in argument1 divided by the mean of whatever is in argument2. Line 3 simply tells R to return (print) the value of x at the end of the function. This function is exemplified by defining two sets of numbers, a and b, in Lines 5 and 6, and running the function in two different orders (Lines 7 and 9). Line 7 essentially tells R to replace every instance of argument1 in the function with the object a, which resolves to a set of 3 numbers, and to replace every instance of argument2 with the object b, which resolves to a different set of 3 numbers. Therefore, Line 2 starts as the object x being defined as sum(argument1)/mean(argument2), which evaluates to sum(a)/mean(b), which further evaluates to R computing the sum of 5, 2, and 8 (15) divided by the mean of 7, 3, and 0 (3.33), which equals 4.5, and this is displayed as a result of the return function in Line 3. The exact same thing happens in reverse if the user enters b for argument1 and a for argument2, as shown in Line 9. You can try this for yourself in TryThis(#4).
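Code 2 appears as an image in the original chapter; the following reconstruction is based on the description above (the values of a and b are those given in the text, and the Line numbers in the comments refer to the original Code 2).

myfun <- function(argument1, argument2) {  # Line 1: define a function of two arguments
  x <- sum(argument1) / mean(argument2)    # Line 2: x = sum of argument1 / mean of argument2
  return(x)                                # Line 3: return (print) the value of x
}                                          # Line 4: end of the function
a <- c(5, 2, 8)                            # Line 5
b <- c(7, 3, 0)                            # Line 6
myfun(a, b)                                # Line 7: sum(a)/mean(b) = 15/3.33 = 4.5
myfun(b, a)                                # Line 9: sum(b)/mean(a) = 10/5 = 2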


Programmatic loops can add greater functionality to existing and custom functions in R. The basic concept behind a programmatic loop, called a for loop, is that some chunk of code is re-run multiple times, each time changing something. A simple example of a for loop is shown in Code 3. First, the object c is defined as 10 numbers (Line 1). The boundaries of the loop are defined in Line 2, which tells R that it should run the code inside the curly brackets a total of 10 times (1:10 is shorthand for "every integer between and including 1 through 10"). The i represents an index: the first time the code is run, i is equal to the first element of the vector given. In this case, the vector given is 1:10, so for the first run through, i = 1. The second time the code is run, i is equal to the second element of the vector, and so on; the second run through evaluates to i = 2, the third run through to i = 3, and so on. In the first run through, i = 1, so Line 3 evaluates using the first element of c.

• Console, where the > symbol indicates that R is waiting for a command to be typed
• Environment (upper right, first tab), where objects stored in R will appear; the Import Dataset button is located here
• Help (upper right, third tab), where functions are explained
• Plots (lower right, second tab), where graphics will appear, along with an Export button
• Packages (lower right, third tab), where packages are installed, with the Install button, and managed

Packages can be searched for and installed by clicking on the Install button in the upper left corner of the Packages pane. Type the name of the desired package and click Install to install from CRAN (Figure 2). Connecting to CRAN requires an internet connection. The Console pane will show output confirming installation (Figure 3). The Console output in Figure 3 shows that packages can also be installed by using the function install.packages() with the name of the package in quotes. The convention in this chapter will be to use the Courier New font to distinguish commands to be entered into the Console, and all function names will be followed by parentheses. Any text contained inside the parentheses of a function is an argument. More information about the arguments a function takes can be found by searching the Help pane for that function, shown in Figure 4. Once a package is installed, it is listed in the Packages pane, but must be activated either by selecting the checkbox next to it (Figure 5), or by using the function library().
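As a minimal example of the commands described above (shown for the likert package used later in this chapter):

    install.packages("likert")   # downloads the package from CRAN (internet connection required)
    library(likert)              # loads (activates) the installed package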


Figure 1. RStudio pane layout. RStudio is a trademark of RStudio, Inc.

Figure 2. Installing a package in RStudio.

Figure 3. Console output showing the command that installed the likert package along with output confirming download.

Figure 4. The help documentation for the install.packages() function showing the arguments of the function.

Figure 5. Packages pane showing the likert package has been installed and loaded while the lavaan package has been installed but not loaded.

All analyses in this chapter were conducted using the most current versions of R (3.3.2), RStudio (1.0.44), likert (1.3.3), psych (1.6.9), and lavaan (0.5-22) available at the time of writing. RStudio is a trademark of RStudio, Inc.

Visualizing Response Patterns with Likert

Importing, Viewing, and Summarizing Data

Data can be imported into R using a function such as read.csv(), but users may prefer to use the Import Dataset button available in the Environment pane. R can work with data imported from many different formats including proprietary formats from software such as Excel, SPSS, SAS, and Stata (Figure 6).

Figure 6. The Import Dataset menu options.

The likert package contains a built-in dataset that can be used to demonstrate its functionality. This dataset contains responses from the 2009 administration of the Programme for International Student Assessment (PISA). More information about the PISA data can be accessed by searching for pisaitems in the Help pane or typing ?pisaitems in the Console. The command data(pisaitems) loads the dataset into the R environment. Figure 7 shows that the result of this command is that a dataset named pisaitems appears in the Environment pane with 66,690 observations of 81 variables.

Figure 7. Environment pane showing pisaitems dataset.

Clicking on the table icon to the right of the data or using the command View(pisaitems) will bring up a spreadsheet-like view of the data (Figure 8). The dataset can be searched and filtered as in most spreadsheet programs, but not edited. The likert package does not use numeric response data for analysis, but the actual words of the response options, which are stored in a specific format called a factor that R associates with underlying numeric values. This format can be seen by clicking on the blue arrow next to the pisaitems dataset in the Environment pane. Figure 9 shows that all 81 variables in this dataset are stored as factors with different numbers of levels. For example, the second column, variable ST24Q01, contains response data with four levels, one of which is “Strongly disagree”. The numbers to the right show that R has numeric values associated with these factors that will be useful for later analyses.

Figure 8. Spreadsheet view of pisaitems dataset.

Figure 9. Environment pane with pisaitems information visible showing that variables are stored as factors with levels and associated numeric values.

To get a count of how many responses are recorded for each factor, enter the following code (after the >) in the Console.
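The code block itself does not appear in this excerpt; based on the paragraph that follows, the command was most likely of this form (using the ST24Q01 column as the example item):

    summary(pisaitems$ST24Q01)   # counts of responses in each category, including NA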

This code calls the summary() function and the $ operator indicates which column of the dataset to summarize. Since this is the second column in the dataset, the command summary(pisaitems[,2]) produces equivalent results (Figure 10). Depending on the situation and function being used, both selection methods will be used in this chapter. Values of NA refer to missing data in R.

Figure 10. Console output showing two different ways to execute the summary() function on the second column of data in the pisaitems dataset.

Visualizing Response Distributions

The likert() function provides a summary of datasets formatted like pisaitems. The likert summary can be easily passed to a plotting function to visualize response distributions. Bar charts are typically used in the CER literature to visualize the number of individual survey responses that fall into each category (15–17). Plotting all 81 variables in the pisaitems dataset would be unwieldy, so the code below is used to create a smaller dataset called minipisa that contains only the country information and the first five PISA items (the first six columns of the pisaitems dataset).
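The code creating minipisa is not reproduced in this excerpt; a sketch consistent with the description (the country column plus the first five items) would be:

    minipisa <- pisaitems[, 1:6]   # keep the country variable and the first five PISA items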

By default, the column names will be used as labels for the plot. The names() function can be used to replace the generic column names with more descriptive names, including the actual question text, as shown in the following code which renames the columns in the minipisa dataset.
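The renaming code is likewise omitted here; a sketch is shown below, with placeholder item labels standing in for the actual question text used in the original chapter:

    # The label strings below are placeholders, not the wording used in the original chapter.
    names(minipisa) <- c("Country",
                         "Item 1 wording", "Item 2 wording", "Item 3 wording",
                         "Item 4 wording", "Item 5 wording")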

Next, the five pisa items in columns 2 through 6 can be summarized with the likert() function and saved into a new variable called likert.out which will be used for creating four different plots that will appear in the Plots pane when the code is executed.
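The omitted code was likely similar to the sketch below; the plotting arguments used for Figures 12 and 13 are assumptions based on the plot descriptions that follow and on the likert package documentation, so they may differ slightly from the original:

    likert.out <- likert(minipisa[, 2:6])        # summarize the five items
    plot(likert.out)                             # Figure 11: default centered bar chart
    plot(likert.out, ordered = FALSE,
         centered = FALSE, text.size = 4)        # Figure 12: column-name order, no centering, larger text
    plot(likert.out, low.color = "maroon",
         high.color = "darkgreen")               # Figure 13: user-specified color scheme (colors are placeholders)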


The first plot (Figure 11) uses default settings to plot a centered bar chart showing the percent responses for each item. Notice that the items are plotted in order from most to least agreement. The second plot (Figure 12) forces the plot to be in the order of the column names, removes the centering, and increases the text size of the axis labels. The third plot (Figure 13) shows how the default colors can be changed using the color arguments.

Figure 11. Default centered bar plot generated with the likert package.

Figure 12. Filled bar plot generated with the likert package ordered by question.

Figure 13. Filled bar plot generated with the likert package ordered by question using user-specified color scheme.

Figure 14. Centered bar plot generated with the likert package with response distributions grouped by country.

Since this dataset also contains a grouping variable, Country, the response distributions can also be plotted by Country. First, a new likert summary that incorporates the grouping variable must be created, then the new likert object can be plotted. The group.order argument is used to change the order in which the groups appear (Figure 14) in grouped plots instead of changing the item order. These types of plots are most useful for comparing response distributions for groups under different experimental conditions (18) or with different levels of training in chemistry (19). The final plot (Figure 15) is a heat plot that shows the mean, standard deviation, and percent selection of responses for each item. However, this type of plot does not allow for the specification of item order or plotting by groups. To plot by groups, the dataset would need to be split into smaller datasets by group and a likert summary created and plotted separately for each group.
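A sketch of the grouped summary and the heat plot described above; the grouping call follows the likert package interface, and the country names passed to group.order are placeholders for whichever groups are present in the dataset:

    likert.grouped <- likert(minipisa[, 2:6], grouping = minipisa$Country)
    plot(likert.grouped,
         group.order = c("Canada", "Mexico", "United States"))   # Figure 14: placeholder group order
    plot(likert.out, type = "heat")                               # Figure 15: means, SDs, and percent selection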

Figure 15. Heat plot generated with the likert package.

Further modifications of the plots produced by the likert package are possible, including adding titles, plotting item response densities, and adding histograms. Additional information about the capabilities of the likert package can be found at http://jason.bryer.org/likert/. The plots produced with likert, as with all plots produced with R, can be exported or copied to the clipboard using the Export button in the Plots pane. These customizable, publication-ready, professional looking plots provide a simple way to visualize response distributions for survey items.

Computing Descriptive Statistics and Reliabilities with Psych

While the likert package computes some descriptive statistics for the heat plot, the psych package computes more comprehensive descriptive statistics, including skewness and kurtosis values, that can be applied to other data types beyond those collected using Likert-type scales. Additionally, the psych package provides a simple way to compute descriptive statistics broken down by grouping variables. These data summaries can then be exported into spreadsheets for later use. The psych package also provides multiple methods for calculating scale reliability, such as Cronbach’s α for internal consistency. Prior to using the functions in the psych package, the package must be installed and loaded in the same way as the likert package. Once the psych package is loaded, new functions will be available including describe(), describeBy(), and alpha().

Computing and Exporting Descriptive Statistics

In contrast to the summary() function discussed earlier, which simply provided a count of responses in each category (Figure 10), the describe() and describeBy() functions in the psych package understand the underlying numeric representation of the factors denoting the survey item response categories. These functions will compute means, standard deviations, medians, and other descriptive statistics for the item responses stored as factor variables (Figure 16). These descriptive statistics are the most frequently reported in the CER literature (20–22), and the ability to export the values to spreadsheets aids in the preparation of tables for publication.
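A minimal example of the describe() call that produces output like Figure 16 (assuming the psych package has already been installed):

    library(psych)
    describe(minipisa)   # means, SDs, medians, skew, kurtosis, etc. for all six variables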

Figure 16. Descriptive statistics produced by the describe() function.

In Figure 16 all six variables are marked with an asterisk to denote their original form as factor variables. While the descriptive statistics may be meaningful for the five items related to reading because of the ordinal nature of their response scale, the descriptive statistics for the nominal Country variable are less meaningful. When performing calculations on factor variables it is prudent to check that the numeric assignment of the factor labels is as expected. The function table() is helpful in constructing a cross tabulation of a variable and its underlying numeric representation as defined by the function as.numeric().
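The command producing Figure 17 is not shown in this excerpt; a sketch of such a cross tabulation for one item (selected by column position) would be:

    table(minipisa[, 2], as.numeric(minipisa[, 2]))   # response labels versus their underlying numeric codes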

Figure 17. Cross tabulation of factor variables in the minipisa dataset and their underlying numeric representation.

The results of this command are shown in Figure 17 and confirm that higher numeric values have been assigned to responses with stronger agreement. The describeBy() function computes the same descriptive statistics as the describe() function but calculates the statistics separately for each group of responses. This is accomplished by providing describeBy() two arguments: the name of the dataset for which to compute descriptive statistics and the name of a grouping variable.
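For this dataset, the call described above would look something like:

    describeBy(minipisa, group = minipisa$Country)   # descriptive statistics computed separately for each country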

Figure 18 shows that when using the entire minipisa dataset describeBy() also computes descriptive statistics for the Country variable. The values of NaN for the skew and kurtosis stand for Not a Number and indicate an undefined calculation result since there is only a single value for the Country variable once split into groups.

Figure 18. Result of using the describeBy() function to compute summary statistics for the minipisa dataset by country.

The result of both describe() and describeBy() can be exported as spreadsheets, with a slightly different process for each function. To save the results of describe(), assign the output of the function to a variable instead of letting it print to the Console, then use the function write.csv() to tell R what to name the resulting .csv file.
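A sketch of this export (the file name is an arbitrary placeholder):

    desc <- describe(minipisa)                  # store the output instead of printing it
    write.csv(desc, file = "descriptives.csv")  # written to the current working directory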


The resulting .csv file will appear in the same directory of your computer that RStudio is using as your working directory (this can be configured in RStudio’s preferences menu). To save the results of describeBy() use the mat = TRUE argument to store the output as a matrix that can be easily written to a spreadsheet. The write.csv() command does not change.
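A corresponding sketch for describeBy(), again with a placeholder file name:

    desc.by <- describeBy(minipisa, group = minipisa$Country, mat = TRUE)   # matrix output for easy export
    write.csv(desc.by, file = "descriptives_by_country.csv")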

Computing Evidence for Internal Consistency (Cronbach’s α)

Cronbach’s α is frequently used in CER to provide reliability evidence based on the internal consistency of various instruments including surveys, concept inventories, and other assessments (3, 4, 16, 22–36), but other methods of demonstrating reliability are also available and may be better suited for different situations (37). The psych package provides a few types of reliability calculations, and many others are available through different packages. Searching the Help pane for the terms “alpha” or “reliability” will generate a list of the various functions available to meet different needs. The most straightforward calculation of Cronbach’s α (and Guttman’s λ6) can be performed with the alpha() function in the psych package. In previous analyses in this chapter, some functions recognized the numeric values underlying the factors associated with the minipisa dataset. However, using the command alpha(minipisa[,2:6]) produces an error because the data are stored as factors, not as numeric responses. The simplest way to convert the variables to the correct format is to wrap the dataset in the function data.matrix(), which forces the factors into their numeric equivalent.
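Based on that description, the working command is likely:

    alpha(data.matrix(minipisa[, 2:6]))   # Cronbach's alpha and Guttman's lambda-6 for the five items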

Figure 19 shows the result of executing this code which produces values for α (both raw and standardized along with the associated 95% confidence interval) and λ6 (G6) in addition to item level statistics (not shown in Figure 19). However, a message will appear and the output will indicate that some items were negatively correlated with the scale and should likely be reverse coded. Helpfully, a suggestion is provided to rerun the function with the argument check.keys=TRUE.

Figure 19. Output from the alpha() function.

As shown in Figure 20, adding this argument produces a much larger value for alpha and the output identifies the reverse coded items with a negative sign. It seems reasonable for the first and fourth items to be reverse coded since they describe negative associations with reading while the other three items are positively worded.

Figure 20. Output from the alpha() function after letting the function reverse code items with a negative correlation to the scale total.

It is also possible to specify ahead of time which items should be reverse coded instead of relying on alpha() to identify them. The keys argument is used to indicate which items to reverse code. This argument accepts a list equal in length to the number of items, with either a positive or negative 1 in the position representing each item. Items to be reverse coded are indicated with negative 1. The function c() combines the numbers into a single list.
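A sketch of the keys specification, assuming the first and fourth items are the reverse-coded ones as discussed above:

    alpha(data.matrix(minipisa[, 2:6]), keys = c(-1, 1, 1, -1, 1))   # -1 marks items to reverse code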


The ability to compute descriptive statistics and reliability values makes the psych package a valuable tool for analysis of quantitative data from various types of instruments including surveys, concept inventories, and other assessments. The psych package provides other functions for computing estimates of reliability. The splitHalf() function computes numerous split-half reliability values using λ4, λ6, λ3 (equivalent to α), and β. The omega() function uses a factor analysis approach to calculating reliability that is applicable to situations where an instrument is not unidimensional. The next section will discuss how to assess the dimensionality of an instrument.

Factor Analysis with Psych and Lavaan

Exploratory Factor Analysis with Psych

In addition to reliability evidence for instruments based on internal consistency, evidence for the validity of an instrument based on its internal structure is frequently reported in the CER literature in the form of factor analysis results (23). Exploratory factor analysis (EFA) is typically used during instrument development when there is no hypothesized structure for how individual items may be related to each other and to underlying latent variables (factors). In CER, EFA has been used in the development of the Meaningful Learning in the Laboratory Instrument (MLLI) (30) and to investigate the structure of the Chemistry Self-Concept Inventory (CSCI) when administered to high school students (3). This chapter will focus on the mechanics of performing factor analysis in R, not the associated analysis decisions; interested readers are directed to the examples in the CER literature as well as statistical and psychometric references for those considerations (38–41). While the term EFA more specifically describes a technique that models variance shared among items, principal components analysis (PCA) is frequently categorized as a type of exploratory factor analysis since its results can be similar to EFA (42–44). However, PCA is more specifically a data reduction technique that analyzes all the variance in a dataset, including variance due to measurement error. The psych package provides useful functions for both PCA and EFA. As with alpha(), the functions principal() for PCA and fa() for EFA require the data to be in a numeric format, accomplished as before with data.matrix(). This numeric format is also required for other functions that may be useful in examining the suitability of the data for factor analysis: cortest.bartlett() for Bartlett’s test of sphericity and KMO() for the Kaiser–Meyer–Olkin Measure of Sampling Adequacy (MSA). Figure 21 shows the output from calling these two functions with the five reading items in the minipisa dataset. The use and interpretation of these values in exploratory factor analysis are described in statistical references (38) in addition to the CER literature utilizing factor analysis techniques (3, 4). Note that Bartlett’s test gives a message that the data (R) was not square because the raw responses were provided instead of a correlation matrix.
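The commands producing Figure 21 were likely of this form:

    cortest.bartlett(data.matrix(minipisa[, 2:6]))   # Bartlett's test of sphericity (raw data, hence the "not square" message)
    KMO(data.matrix(minipisa[, 2:6]))                # Kaiser-Meyer-Olkin measure of sampling adequacy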

Figure 21. Output from Bartlett’s and Kaiser–Meyer–Olkin tests.

The function principal() takes the dataset as the first argument and arguments are used to set the number of components to extract, select the type of rotation to use (default is varimax), and determine whether or not to compute component scores. As with all functions, detailed information about these arguments can be viewed by searching in the Help pane or by typing the name of the function in the Console with a question mark in front, ?principal. The code below runs a PCA extracting two factors with no rotation and saves the results as pca2.
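Based on the description, the omitted code was probably:

    pca2 <- principal(data.matrix(minipisa[, 2:6]), nfactors = 2, rotate = "none")   # two components, no rotation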

Typing either pca2 or print(pca2) into the Console will print a summary of the PCA (Figure 22). The standardized loading matrix is equivalent to the Component Matrix produced in SPSS when using principal components analysis as the extraction method. The values for h2 are equivalent to the communalities given by SPSS. The second table provides information about the sums of squared (SS) loadings and total variance explained by the components, similar to SPSS. EFA is conducted using the fa() function and takes similar arguments to the principal() function including a numeric dataset, the number of factors to extract, the rotation to use (default is oblimin), and the factor method to use. The fa() default factor method is a minimum residual solution but principal axis factoring and maximum likelihood (ml; used in this example) are also available. Additionally, fa() provides an argument to use a polychoric correlation matrix for polytomous items rather than the default Pearson correlation. As with the principal() output, the standardized loadings (the pattern/factor matrix) are provided along with the communalities and sum of squared loadings. The loadings and correlations can also be written to .csv files using the method previously discussed for descriptive statistics. A plot can be created to show how the items load on each factor, using the item names as the factor labels (Figure 23).
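A sketch of the corresponding EFA call and loading plot; the object name efa2 is taken from the scree plot command quoted later, while the plotting call itself is an assumption:

    efa2 <- fa(data.matrix(minipisa[, 2:6]), nfactors = 2, fm = "ml")   # maximum likelihood, default oblimin rotation
    fa.plot(efa2, labels = names(minipisa)[2:6])                        # Figure 23: item loadings in factor space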

Figure 22. Summary of PCA results.

Figure 23. Plot of item loadings in factor space labeled with item wordings.

To help determine the number of factors to extract, a scree plot can be made by plotting the eigenvalues with the command plot(efa2$e.values, type="b"). Figure 24 shows that eigenvalues for all five factors were computed and can be plotted even though only information about two factors was requested in the output of the original function call. An identical plot can be made for the results of principal() with plot(pca2$values, type="b").

Figure 24. Scree plot produced by plotting eigenvalue results of the fa() function.

In addition to using a scree plot to determine the number of factors (or components) to extract, as was done in research with the MLLI (32), the psych package also contains functions that provide multiple alternate criteria for determining the number of factors to extract: fa.parallel() for parallel analysis and vss() for Very Simple Structure (VSS) and Velicer’s Minimum Average Partial Correlation (MAP). Parallel analysis was used in determining the number of factors to extract for research with the CSCI (3).
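Minimal sketches of these two calls:

    fa.parallel(data.matrix(minipisa[, 2:6]))   # parallel analysis scree plot
    vss(data.matrix(minipisa[, 2:6]))           # Very Simple Structure and Velicer's MAP criteria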

Confirmatory Factor Analysis with Lavaan

Confirmatory factor analysis (CFA) is used to test the internal structure of an instrument against a previously hypothesized structure (23). The lavaan (latent variable analysis) package contains functions necessary to perform CFA as well as other latent variable analysis techniques including multi-group models, path analysis, structural equation modeling, and latent growth modeling (13). This chapter focuses on the use of CFA, but resources are available to explore the advanced capabilities of the lavaan package (45, 46). Readers interested in testing the capabilities of lavaan without installing R may be interested in a recently developed fully online deployment of the lavaan package that generates both statistical output and plots (47). As with the previously discussed factor analysis functions, the data must be in numeric format to use cfa(). It is also helpful to have short variable names to identify items. For the minipisa dataset, this can be accomplished by subsetting the original pisaitems dataset again to use the original column names and using data.matrix() to convert the factors to their numeric representation.
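A sketch of that preparation step (the object name pisa.num is an assumption carried through the following examples):

    pisa.num <- data.frame(data.matrix(pisaitems[, 1:6]))   # original column names, factors converted to numeric codes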

CFA is used to test how well data fit a hypothesized model of the internal structure of an instrument. The models to be tested are typically based on a theory of how the instrument items are related to one or more underlying latent variables (factors). CFA is often used to test newly collected data against an existing structure specified by the instrument developers. This use of CFA in CER can be seen in work with the Approaches to Teaching Inventory (ATI) (16), the Attitude towards the Subject of Chemistry Inventory (ASCI/ASCIv2) (48, 49), and other instruments measuring motivation in chemistry (50, 51). One hypothesized model for the five items in the minipisa dataset might be that all items are associated with a single factor representing Reading Enjoyment. A visual representation of that model is shown in Figure 25 where the oval represents the latent variable (factor) of Reading Enjoyment and the rectangles represent the measured variables (items). Arrows pointing from the latent variable to the measured variables indicate that the latent variable has a causal influence on responses to the items, typically referred to as the loading of each item on the factor. Each item also has a shorter arrow pointing to it representing the measurement error associated with responses to that item.

Figure 25. Simplified one-factor CFA model for the minipisa items.

This model must be translated into lavaan-specific model syntax to be used with cfa(). In lavaan, latent variables are defined by their measured variables with the =~ operator. Using the name READ for the latent variable and the column names inherited from the original pisaitems dataset, the syntax for the one-factor model is shown below, saved as the variable pisa.CFA.1F. This model can then be passed to cfa() along with the dataset to be used in the analysis.
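The model syntax and cfa() call are not reproduced in this excerpt; based on the description, they likely resembled the sketch below (the names of the second, third, and fifth items are assumed to follow the ST24Q01–ST24Q05 pattern, and fit.1F is an assumed object name):

    library(lavaan)
    pisa.CFA.1F <- 'READ =~ ST24Q01 + ST24Q02 + ST24Q03 + ST24Q04 + ST24Q05'   # one-factor model
    fit.1F <- cfa(pisa.CFA.1F, data = pisa.num)                                 # pisa.num defined above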


Calling cfa() with only the model and data arguments uses the default settings of maximum likelihood estimation, setting the scale of the latent variable by fixing the loading of the first variable to 1, and dealing with missing data by listwise deletion. Additional arguments can be passed to cfa() in order to standardize the latent variable, use a full information maximum likelihood approach to dealing with missing data (52), define a group variable for multi-group analysis, use an estimator such as Satorra-Bentler or weighted least squares to address non-normal or categorical data (53, 54), enable bootstrapping, and mimic the Mplus program for calculations. These arguments and others are described in the help page for cfa().

Figure 26. Fit statistics provided by lavaan.

Initially, printing the results of the cfa() will only display the model chi-square (χ2) value and degrees of freedom. More detailed data-model fit information can be requested with summary(). The code below requests detailed fit statistics (Figure 26), standardized model parameters, and R2 values (Figure 27). Modification indices can also be requested to determine if changes should be made to the model such as adding error covariance terms or removing poor performing items (50); the code provided sorts the modification indices in order from highest to lowest before printing the first ten (Figure 28). The fit statistics provided by lavaan, shown in Figure 26, are similar to what other latent variable analysis programs such as SPSS Amos, Mplus, and LISREL provide. This includes the χ2 value for the model, labeled as the Minimum Function Test Statistic, the degrees of freedom for the model and the significance value for the χ2 value, all provided in the first section of the output. Other frequently reported fit indices include the comparative fit index (CFI), root mean square error of approximation (RMSEA) and its associated 90% confidence interval, and the standardized root mean square residual (SRMR). These fit indices are reported in the CER literature when CFA is conducted (16, 48–51) and recommendations for acceptable values can be found in the statistics literature (55, 56).
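Sketches of the calls described here (fit.1F is the assumed name of the fitted model from the previous section):

    summary(fit.1F, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)   # Figures 26 and 27
    mi <- modindices(fit.1F)                          # all modification indices
    head(mi[order(mi$mi, decreasing = TRUE), ], 10)   # Figure 28: ten largest, sorted from highest to lowest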

Figure 27. Parameter estimates provided by lavaan.

Model parameters are provided by lavaan in multiple formats. As shown in Figure 27, the first set of values are the loadings of the items on the latent variable of Reading Enjoyment and the second set of values are the variances for the measured and latent variables. The Estimate column in Figure 27 shows the unstandardized model parameters. This is followed by columns for the standard error of the estimates, z scores, and significance values. Two types of standardized estimates are provided: those with only the latent variable standardized (Std.lv) and those with both the measured variables and latent variables standardized (Std.all), the latter being the typically reported standardized parameter estimates. If the argument to standardize the latent variable was provided to cfa(), then the Estimate and Std.lv columns will be equivalent.

Figure 28. Modification indices provided by lavaan.

The modification indices provided by lavaan (Figure 28) show how the model χ2 can be lowered by changing aspects of the model (mi). The ~~ notation indicates that adding a covariance term between ST24Q01 and ST24Q04 (the two negatively worded items) will lower the χ2 value of the model by approximately 2000. The expected parameter values for the suggested modifications are provided in both unstandardized (epc) and standardized (sepc.lv, sepc.all, sepc.nox) versions. If desired, the new parameters identified by the modification indices can be added as additional lines in the model specification syntax. Best practices in latent variable analysis recommend having a theoretical reason for making the suggested modification, not just improving data-model fit (50). The command capture.output() is useful for capturing the lavaan output and saving it as a text file for later reference. The code below can be used to write both the summary information and modification indices to a single file. As with write.csv() the text file will appear in the current working directory.
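A sketch of such an export, with a placeholder file name:

    capture.output(summary(fit.1F, fit.measures = TRUE, standardized = TRUE),
                   modindices(fit.1F),
                   file = "cfa_results.txt")   # written to the current working directory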

The functionality of the lavaan package is on par with commercially available latent variable analysis software (57). Additional packages such as semPlot (58) provide the ability to generate plots from lavaan output. A benefit to using R for latent variable analysis is that initial data processing, cleaning, and summary steps can all be completed within the same program.

Summary

This chapter explored the use of three R packages for quantitative survey data analysis using a dataset available within R. Response patterns were visualized using functions from the likert package. The psych package was used to compute descriptive statistics, calculate internal consistency reliability values such as Cronbach’s α, and perform both principal components and exploratory factor analysis. Finally, confirmatory factor analysis was conducted with the lavaan package. The purpose of this chapter was to demonstrate that, while free and open-source, R is a robust and accurate data analysis tool that contains the functionality necessary to replace many commercially available software packages currently utilized in the CER community for data analysis, including SPSS, Excel, and Stata. For each type of analysis conducted, examples from the CER literature were provided to highlight the applicability of R to current research topics and analyses. Additional benefits of R include the capability to complete all stages of analysis within the same program and the ability to save and reuse code to reproduce and share analyses. Readers are encouraged to join the growing community of R users by downloading R and RStudio, following along with the examples in this chapter, modifying the provided code to analyze their own data, and exploring the numerous resources available both in print and online.

Recommended Resources

For readers interested in using R instead of commercially available software packages, the following resources provide support for the transition to using R.
Beaujean, A. A. Latent Variable Modeling Using R: A Step-by-Step Guide; Routledge: New York, 2014.
Dalgaard, P. Introductory Statistics with R, 2nd ed.; Springer: New York, 2008.
Field, A.; Miles, J.; Field, Z. Discovering Statistics Using R; SAGE: Los Angeles, 2012.
Kabacoff, R. I. Quick-R. http://www.statmethods.net/index.html (accessed Nov 25, 2016).
UCLA: Statistical Consulting Group. Resources to Help You Learn and Use R. http://www.ats.ucla.edu/stat/r/ (accessed Nov 25, 2016).

References

1. Tang, H.; Ji, P. In Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 1166; American Chemical Society: Washington, DC, 2014; pp 135–151.
2. Fox, L. J.; Roehrig, G. H. J. Chem. Educ. 2015, 92, 1456–1465.
3. Nielsen, S. E.; Yezierski, E. J. Chem. Educ. 2015, 92, 1782–1789.
4. Tang, H.; Abraham, M. R. J. Chem. Educ. 2016, 93, 31–38.
5. Warfa, A.-R. M. J. Chem. Educ. 2016, 93, 248–255.
6. Harshman, J.; Yezierski, E. J. Chem. Educ. 2016, 93, 239–247.
7. Bodé, N. E.; Flynn, A. B. J. Chem. Educ. 2016, 93, 593–604.
8. Velasco, J. B.; Knedeisen, A.; Xue, D.; Vickrey, T. L.; Abebe, M.; Stains, M. J. Chem. Educ. 2016, 93, 1191–1203.
9. Tippmann, S. Nature 2015, 109–110.
10. The R Foundation. The Comprehensive R Archive Network; https://cran.r-project.org/ (accessed Nov 20, 2016).
11. Bryer, J.; Speerschneider, K. likert: Functions to Analyze and Visualize Likert Type Items. R package version 1.3.3; 2015.
12. Revelle, W. psych: Procedures for Psychological, Psychometric, and Personality Research. R package version 1.6.9; Northwestern University: Evanston, IL, 2016.
13. Rosseel, Y. J. Stat. Softw. 2012, 48, 1–36.
14. RStudio. Download RStudio; https://www.rstudio.com/products/rstudio/download/ (accessed Nov 20, 2016).
15. Hall, D. M.; Curtin-Soydan, A. J.; Canelas, D. A. J. Chem. Educ. 2014, 91, 37–47.
16. Stains, M.; Pilarz, M.; Chakraverty, D. J. Chem. Educ. 2015, 92, 1466–1476.
17. Zane, M.; Tucci, V. K. J. Chem. Educ. 2016, 93, 406–412.
18. Priest, S. J.; Pyke, S. M.; Williamson, N. M. J. Chem. Educ. 2014, 91, 1787–1795.
19. Talanquer, V. J. Chem. Educ. 2013, 90, 1419–1424.
20. Walker, J. P.; Sampson, V. J. Chem. Educ. 2013, 90, 1269–1274.
21. Weaver, G. C.; Sturtevant, H. G. J. Chem. Educ. 2015, 92, 1437–1448.
22. Ryan, M. D.; Reid, S. A. J. Chem. Educ. 2016, 93, 13–23.
23. Arjoon, J. A.; Xu, X.; Lewis, J. E. J. Chem. Educ. 2013, 90, 536–545.
24. Southam, D. C.; Lewis, J. E. J. Chem. Educ. 2013, 90, 1425–1432.
25. Luxford, C. J.; Bretz, S. L. J. Chem. Educ. 2014, 91, 312–320.
26. Springer, M. T. J. Chem. Educ. 2014, 91, 1162–1168.
27. Milenković, D. D.; Segedinac, M. D.; Hrin, T. N. J. Chem. Educ. 2014, 91, 1409–1416.
28. Bretz, S. L.; McClary, L. J. Chem. Educ. 2015, 92, 212–219.
29. Shedlosky-Shoemaker, R.; Fautch, J. M. J. Chem. Educ. 2015, 92, 408–414.
30. Galloway, K. R.; Bretz, S. L. J. Chem. Educ. 2015, 92, 1149–1158.
31. O’Dwyer, A.; Childs, P. J. Chem. Educ. 2015, 92, 1159–1170.
32. Galloway, K. R.; Bretz, S. L. J. Chem. Educ. 2015, 92, 2006–2018.
33. Hibbard, L.; Sung, S.; Wells, B. J. Chem. Educ. 2016, 93, 24–30.
34. Cicuto, C. A. T.; Torres, B. B. J. Chem. Educ. 2016, 93, 1020–1026.
35. Franco-Mariscal, A. J.; Oliva-Martínez, J. M.; Blanco-López, Á.; España-Ramos, E. J. Chem. Educ. 2016, 93, 1173–1190.
36. Hensiek, S.; DeKorver, B. K.; Harwood, C. J.; Fish, J.; O’Shea, K.; Towns, M. J. Chem. Educ. 2016, 93, 1847–1854.
37. Zinbarg, R. E.; Revelle, W.; Yovel, I.; Li, W. Psychometrika 2005, 70, 123–133.
38. Field, A.; Miles, J.; Field, Z. Discovering Statistics Using R; SAGE: Los Angeles, 2012.
39. Henson, R. K. Educ. Psychol. Meas. 2006, 66, 393–416.

40. Jackson, D. L.; Gillaspy, J. A.; Purc-Stephenson, R. Psychol. Methods 2009, 14, 6–23.
41. Velicer, W. F.; Eaton, C. A.; Fava, J. L. In Problems and Solutions in Human Assessment; Goffin, R. D., Helmes, E., Eds.; Kluwer: Boston, MA, 2000; pp 41–71.
42. Velicer, W. F.; Jackson, D. N. Multivariate Behav. Res. 1990, 25, 1–28.
43. Widaman, K. F. Multivariate Behav. Res. 1993, 28, 263–311.
44. Towns, M.; Harwood, C. J.; Robertshaw, M. B.; Fish, J.; O’Shea, K. J. Chem. Educ. 2015, 92, 2038–2044.
45. Rosseel, Y. The lavaan Project; http://lavaan.ugent.be/ (accessed Nov 20, 2016).
46. Beaujean, A. A. Latent Variable Modeling Using R: A Step-by-Step Guide; Routledge: New York, 2014.
47. Moon, K.-W. Shiny Application: Structural Equation Modeling with R; http://198.74.50.54:3838/r-sem/ (accessed Nov 25, 2016).
48. Xu, X.; Lewis, J. E. J. Chem. Educ. 2011, 88, 561–568.
49. Brandriet, A. R.; Xu, X.; Bretz, S. L.; Lewis, J. E. Chem. Educ. Res. Pract. 2011, 12, 271.
50. Ferrell, B.; Barbera, J. Chem. Educ. Res. Pract. 2015, 16, 318–337.
51. Salta, K.; Koulougliotis, D. Chem. Educ. Res. Pract. 2015, 16, 237–250.
52. Enders, C. K. In Structural Equation Modeling: A Second Course; Hancock, G. R., Mueller, R. O., Eds.; Quantitative Methods in Education and the Behavioral Sciences: Issues, Research, and Teaching; Information Age Publishing: Charlotte, NC, 2013; pp 493–519.
53. Finney, S. J.; DiStefano, C. In Structural Equation Modeling: A Second Course; Hancock, G. R., Mueller, R. O., Eds.; Quantitative Methods in Education and the Behavioral Sciences: Issues, Research, and Teaching; Information Age Publishing: Charlotte, NC, 2013; pp 439–492.
54. Flora, D. B.; Curran, P. J. Psychol. Methods 2004, 9, 466–491.
55. Mueller, R. O.; Hancock, G. R. In Best Practices in Quantitative Methods; Osborne, J. W., Ed.; Sage Publications, Inc.: Thousand Oaks, CA, 2008; pp 488–508.
56. Hu, L.; Bentler, P. M. Struct. Equ. Model. A Multidiscip. J. 1999, 6, 1–55.
57. Narayanan, A. Am. Stat. 2012, 66, 129–138.
58. Epskamp, S. semPlot: Path Diagrams and Visual Analysis of Various SEM Packages’ Output. R package version 1.0.1; 2014.


Chapter 8

Assessment of the Effectiveness of Instructional Interventions Using a Comprehensive Meta-Analysis Package Alexey Leontyev,*,1 Anthony Chase,2 Steven Pulos,3 and Pratibha Varma-Nelson2,4 1Department of Chemistry, Computer Science and Mathematics, Adams State University, Alamosa, Colorado 81101, United States 2STEM Education Innovation and Research Institute (SEIRI), Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States 3School of Psychological Sciences, University of Northern Colorado, Greeley, Colorado 80639, United States 4Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States *E-mail: [email protected].

The purpose of this manuscript is to introduce a software solution for a meta-analysis and to provide information on some steps of conducting a meta-analysis. We have used a set of papers that investigate the effectiveness of peer-led team learning strategies as an example for conducting a meta-analysis. Comprehensive Meta-Analysis software was used to calculate effect sizes, produce a forest plot, conduct moderator analysis, and assess the presence of publication bias.

Introduction

When multiple studies addressing the same topic exist, a meta-analysis can be conducted to summarize, integrate, and interpret the results of these studies. One can view a meta-analysis as conducting research on research. In meta-analyses, instead of surveying people, which is a common way of conducting research in the social sciences, research reports are surveyed, essential information is extracted from them, and the resulting information is analyzed using adapted statistical techniques. Meta-analysis is conducted to investigate the dispersion of effects found in multiple studies and to assess what factors influence the direction and the magnitude of these effects. Another advantage of a meta-analysis is that instead of looking at p-values, we work directly with effect sizes and interpret them in our given context. Effect size is a measure of the difference between two groups that emphasizes the size of that difference rather than confounding it with sample size.
The purpose of this manuscript is to demonstrate how some steps of meta-analysis can be performed in the Comprehensive Meta-Analysis (CMA) package. In our meta-analysis, we included studies that were reported in the review by Wilson and Varma-Nelson (1) and satisfied the following criteria: i) the study design included two or more groups; ii) at least one of the groups used peer-led team learning (PLTL), and at least one was a comparison group that did not use PLTL; iii) enough statistical information about achievement outcomes was reported for both groups, either in the form of mean scores or success rates. The purpose of this chapter is not to conduct a comprehensive meta-analysis of the PLTL literature but rather to show the functionality of the software. It is most likely that the published papers included in this meta-analysis represent only a small sample of all possible studies. For example, it is recommended to include gray literature (conference proceedings, dissertations, and grant reports) in meta-analyses, but it was not included in this analysis because it was not included in the original review article by Wilson and Varma-Nelson (1). One advantage of using this real data set for the analysis is that it allows readers to see obstacles of meta-analysis such as multiple outcomes and missing data.
From the review by Wilson and Varma-Nelson (1), we found 16 studies to be included in the following meta-analysis (2–17). All of these studies have in common that they investigate the effectiveness of PLTL by comparing it with alternative strategies. We had to exclude several studies, for example, the study (18) which compares traditional PLTL with PLTL in cyberspace (cPLTL). The included studies are summarized in Table 1. The analysis was performed in the Comprehensive Meta-Analysis (version 3.3.070) package developed by Biostat, Inc. This software was chosen for its rich functionality, ease of use, and high-quality graphics. The first author of the manuscript (AL) received an evaluation copy of the software for one year from Biostat, Inc. A trial version can be downloaded from https://www.meta-analysis.com. Note that the trial version will only work for 10 days or 10 runs. While it is possible to complete a meta-analysis in one run having all data ready to enter into the trial version, we felt obliged to inform our readers about alternatives for conducting a meta-analysis. A section describing alternatives can be found at the end of the chapter.


Table 1. Included Studies, Discipline, and Reported Outcomes

Study | Discipline | Outcome
Akinyele, 2010 (2) | General, Organic, and Biochemistry (GOB) | Achievement
Baez-Galib et al., 2005 (3) | General Chemistry | Passing
Chan & Bauer, 2015 (4) | General Chemistry | Achievement
Hockings et al., 2008 (5) | General Chemistry | Passing
Lewis & Lewis, 2005 (6) | General Chemistry | Achievement
Lewis & Lewis, 2008 (7) | General Chemistry | Achievement
Lewis, 2011 (8) | General Chemistry | Achievement and Passing
Lyon & Lagowski, 2008 (9) | General Chemistry | Achievement
McCreary et al., 2006 (10) | General Chemistry Labs | Achievement
Mitchell et al., 2012 (11) | General Chemistry | Passing
Rein & Brookes, 2015 (12) | Organic Chemistry | Achievement
Shields et al., 2012 (13) | General Chemistry | Achievement
Steward et al., 2007 (14) | General Chemistry | Passing
Tenney & Houck, 2003 (15) | General Chemistry | Passing
Tien et al., 2002 (16) | Organic Chemistry | Achievement
Wamser, 2006 (17) | Organic Chemistry | Passing

Figure 1. Logic of Comprehensive Meta-Analysis.


Entering Data into CMA

CMA includes two distinct parts of its interface. The first appears when you start the program: a table where you enter all studies and their statistical information. The second appears when you click the Run Analysis button: a window with the analysis of the entered studies and all computational functions of CMA. The logic of the program is shown in Figure 1. In this manuscript, we will refer to these parts of the program as STUDIES TAB and ANALYSIS TAB. When you open CMA, it automatically starts in the STUDIES TAB, where you need to enter information about the studies, the type of outcome, their statistical characteristics, and moderator variables. CMA displays the spreadsheet depicted in Figure 2 when you start it.

Figure 2. CMA at start.

At this point, the program does not recognize the type of information you enter in the columns. You need to insert columns for study names, outcomes, and moderators. To do this, go to Insert → Column for → Study names. If the data includes two or more outcomes, repeat the same procedure for Outcome names. For our dataset, we are going to treat the discipline as a moderator variable. The PLTL studies that we included in our analysis were done in various classes, such as general chemistry or organic chemistry. This suggests that the moderator is a categorical variable. To add a column for a moderator variable, click on Insert → Column for → Moderator variable. When you add a column for a moderator, you will see a dialog box (Figure 3), where you need to enter the name for the moderator and specify the data type of this moderator.

Figure 3. The dialog box for entering a moderator in CMA.

After you have entered the data, you can select an appropriate measure from an array of effect size indices. Since the studies we included in our dataset utilize experimental or quasi-experimental designs based on means, the appropriate choice for effect size would be Hedges’s g. Hedges’s g is similar to the well-known Cohen’s d effect size statistic, which is commonly used as a measure of effect size for studies that involve comparisons between groups.

However, Cohen’s d overestimates effect sizes for small samples. This bias can be eliminated by using Hedges’s g, which includes a correction factor (19).
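One common form of this correction (a standard formula, not shown in the chapter) expresses Hedges's g as Cohen's d multiplied by a small-sample correction factor J, where the X-bar terms are the group means, s_pooled is the pooled standard deviation, and n1 and n2 are the group sizes:

    d = \frac{\bar{X}_1 - \bar{X}_2}{s_\text{pooled}}, \qquad
    g = J \cdot d, \qquad
    J \approx 1 - \frac{3}{4(n_1 + n_2) - 9}

For all but very small samples, J is close to 1, so g and d differ only slightly.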

CMA allows data entry in more than 100 different formats, but for this dataset, we used only three. Included studies mainly use two types of outcomes: achievement scores and passing rates. These studies should be entered with the corresponding information necessary to compute effect sizes. Several studies (2, 4, 9) include multiple measures of the same outcomes for the same or different groups. In these cases, we averaged the effect sizes and entered them directly into the program. CMA also offers the option to merge data from multiple outcomes or groups automatically.
To enter the column for the studies that used direct comparisons of two groups, click Insert → Column for → Effect size data. In the dialog box, select Show common formats only → Comparison of two groups, time-points, or exposures (includes correlations) → Two groups or correlations → Continuous (means) → Unmatched groups, post data only → Mean, SD and sample size in each group. When you click on the last option, a dialog box will appear where you can specify the names for each group. It is convenient to go with the options offered by the program and name Group-A as Treated and Group-B as Control.
To enter the columns for studies that report the dichotomous outcome, click on the same options as in the previous paragraph, but select Dichotomous (number of events) under Two groups or correlation. Then select Unmatched groups, prospective (e.g., controlled trials, cohort studies) → Events and sample size in each group.
To add the last option for the effect sizes, click on Two groups or correlation → Continuous (means) → Computed effect sizes → Hedge’s g (standardized by pooled within-groups SD) and variance.
After all formats for entering effect size data are entered, you can see the tabs at the bottom of the screen (Figure 4). The highlighted tab indicates that you can enter the data in the corresponding format. For our meta-analysis, we also selected Hedges’s g as our primary index (located in the yellow column) at the very end of the data entry table.

Figure 4. One of the templates for entering the data in CMA.

Now we can begin entering studies. After a study is entered and the effect direction is specified as Auto, you should see effect sizes for each study in the yellow column. To switch between data entry formats, click on the tabs at the very bottom of the screen. After all studies are entered, the tab should look as in Figure 5. It is also possible to enter all data into an Excel spreadsheet first and then upload it into CMA, where you can specify the columns. This approach works well with existing datasets where all studies report data in the same format. Often, this is not the case with educational research studies.

Analysis of Data in CMA

After all data are entered, click on the Run Analysis button at the top of the screen, and that will bring you to the ANALYSIS TAB with the outcomes of the analysis. You will see a table with all effect size information and the so-called forest plot, a graphical interpretation of effect sizes explained later (Figure 6). From the ANALYSIS TAB, you have several computational options that will allow you to see different facets of a meta-analysis. Some of them are listed in the sections below.

Figure 5. STUDIES TAB after all studies are entered into CMA.

Fixed versus Random Effect

At the bottom of the screen, you can see the tabs Fixed, Random, and Both Models. These tabs indicate which model was used to produce the overall effect size. The fixed-effect model is based on the assumption that the observed effect is constant for all studies and varies only due to sampling error. The random-effects model assumes that the true effect varies from study to study. For our data, since participants come from different populations, the random-effects model is the most appropriate. However, with a small number of studies, the estimates that are used to calculate random effects are not stable, so it might be beneficial to look at estimates and confidence intervals for both models.

Figure 6. ANALYSIS TAB of CMA.

Figure 7. Sensitivity analysis.

Sensitivity Analysis

If you click on the tab One study removed at the very bottom of the screen, you will see the results of the so-called sensitivity analysis (Figure 7). In the sensitivity analysis, the overall effect is computed by removing one study at a time. In this case, the effect shown on each row is not the effect for that row’s study, but rather the summary effect for all studies with that study removed. Generally, this does not change the results for meta-analyses that contain more than 10 studies. As we can see from the results of the sensitivity analysis, the elimination of any single study does not lead to any change in the effect that would be of substantive import.

Analysis of Heterogeneity

Once you click the Next table button, it will toggle the display and present you with another computational outcome of the meta-analysis (Figure 8). You will see numerical overall effect sizes with their standard error, variance, and the lower and upper limits of the confidence intervals. The test of the null hypothesis tells us that the overall effect is significantly different from zero. You will also see the Q-value (along with corresponding df and p values) and I-squared, both of which are heterogeneity statistics. The Q-test tells us that there is dispersion across effect sizes. The I-squared statistic attempts to quantify how much of the study-to-study dispersion is due to real differences in the true effects. Tau-squared is an estimate of the between-studies variance, and Tau is an estimate of the between-studies standard deviation. Note that the values for the Q statistic, I-squared, and the tau estimates are reported on the line for the fixed-effect model, because these values are computed using weights derived under fixed-effect assumptions. However, these statistics apply to both statistical models.

Figure 8. Numerical outcomes of meta-analysis: overall effect size, test of null hypothesis, heterogeneity analysis, and tau-squared estimates.

Forest Plot

Once you click the High resolution plot button, the program takes you to the graphical interface, where you can work with the forest plot. The menu will allow you to customize graphic parameters according to aesthetic preferences. A customized forest plot for the meta-analysis of PLTL is presented in Figure 9. Note that we removed some duplicate information to declutter the forest plot. The forest plot can be viewed as the end game of a meta-analysis, so it is worth spending time and effort to better convey information about the studies and results. The forest plot gets its name because it allows one to see the “forest through the trees.” A forest plot is a graphical presentation of the results of a meta-analysis. Each square is centered on the value of the effect size for its study. The location of the square represents the direction and magnitude of the effect of an intervention. The length of the line crossing that square indicates the 95% confidence interval. In CMA, the size of the square is proportional to the sample size of the study. If this line crosses the g = 0 line, this indicates a non-significant study. In general, the confidence intervals are wider for studies with smaller numbers of participants. The diamond at the very bottom of the plot represents the overall effect size. The forest plot can be exported as a Word or PowerPoint file. To do this, select the corresponding option from the File menu.

Figure 9. Forest plot of meta-analysis.

Publication Bias Analysis

Publication bias analysis is based on the same premise as a sensitivity analysis. While multiple approaches exist for testing the presence of publication bias, most of them are based on the idea that studies with statistically non-significant p-values are more likely to be excluded from the analysis. This phenomenon is called the file drawer problem, because studies that fail to find statistical significance are less likely to get published and instead end up in file drawers. To perform this analysis, click on Analysis → Publication Bias. This will bring you to a funnel plot where effect sizes are plotted versus standard error. You can choose between two types of funnel plots: effect sizes plotted versus standard error or versus precision. To switch between these two, simply click on the corresponding button at the top of the screen: Plot precision or Plot standard error. Visual examination of the funnel plot for asymmetry may give some idea about the presence or absence of small non-significant studies. Small studies have a larger standard error, so they are located in the lower part of the funnel plot. Non-significant studies are located in the left part of the funnel plot because their effect size is very small or negative. If a funnel plot appears to be symmetrical, this indicates an absence of publication bias. The funnel plot for our analysis is shown in Figure 10.

Figure 10. Funnel plot for publication bias analysis.

If you click on the Next table button at the top of the screen, you will be shown a numerical analysis of publication bias. CMA allows the following types of analyses: classic fail-safe N, Orwin’s fail-safe N, Begg and Mazumdar rank correlation, Egger’s regression, and Duval and Tweedie’s trim and fill procedure. Fail-safe numbers represent how many non-significant studies are needed to nullify the observed effect of the intervention; Egger’s regression and the Begg and Mazumdar rank correlation investigate a relationship between study size and effect size; and Duval and Tweedie’s trim and fill procedure estimates the number of missing studies and corrects for the funnel plot asymmetry arising from omitting these studies. Out of all of them, Duval and Tweedie’s method gives the most reliable estimate of the presence of publication bias. However, this method does not work well with datasets with high heterogeneity. For our dataset, this method shows two imputed studies to the left of the mean. As we specified earlier, our analysis does not contain gray literature. Since gray literature is more likely to include non-significant results than published studies, it is quite likely that a careful search of conference proceedings would yield several small and non-significant studies that are similar to the imputed studies from Duval and Tweedie’s trim method. Figure 11 shows the funnel plot with imputed studies and the corrected overall effect. The theory underlying Duval and Tweedie’s trim method is quite simple. Publication bias leads to an asymmetry of the funnel plot. Studies that are small and failed to find significance are located to the left of the overall effect size. Duval and Tweedie’s method is an iterative process that trims the most extreme studies and fills in computed estimates to preserve the symmetry of the funnel plot (19). The imputed studies that are shown on the plot are needed to keep the funnel plot symmetrical across the unbiased estimate of the overall effect.

Figure 11. Funnel plot with imputed studies.

Moderator Analysis

The relationship between the independent variable (mode of instruction) and the dependent variable (achievement) can be affected by a moderator variable that can be either continuous or categorical. Examples of categorical moderators are the type of chemistry class or the geographical region where a study is conducted. Examples of continuous moderators are the duration of a study or attrition rates. To perform an analysis that investigates the impact of a moderator variable (discipline), go to Computational options → Group by, then select Discipline in the dialog box as a moderator variable and check both boxes for Also run analysis across levels of discipline and Compare effects at different levels of discipline as shown in Figure 12. Click Ok.

Figure 12. The dialog box for moderator analysis.

Now we can see studies grouped by their moderator variables in the ANALYSIS TAB. To see only overall effects for each level of a moderator, click on the Show individual studies icon in the toolbar (Figure 13).

Figure 13. Effect sizes by moderator variables.

Analyses for the GOB and General Lab categories do not produce any meaningful data because there is only one study for each of these levels of the discipline moderator. Comparison of PLTL studies that are done in General and Organic settings reveals a slightly higher effectiveness of PLTL for Organic classes. Click Next table for an analysis that reports a p-value for the between-group differences.

Conclusion

We provided an overview of entering data and performing meta-analyses in the CMA software. While we presented the core procedures for meta-analysis, this chapter is not a comprehensive guide to meta-analysis. For example, we did not cover meta-regression, because we did not have a continuous variable as a moderator, and we did not show cumulative analysis because of the nature of our data.

However, the results we showed illustrate the effectiveness of PLTL. Although it may not be possible to infer the effectiveness of PLTL from a single study with non-significant findings, aggregating the results of the several studies described above increases the power of the analysis. From the results of our meta-analysis, we can see that the overall effect size for studies that use PLTL is 0.364. If we look at effect sizes disaggregated by discipline, studies done in organic chemistry classes showed higher effectiveness (0.400) than studies done in general chemistry (0.331). There is also substantial variation between studies. The I-squared coefficient (84%) tells us that most of the variation in observed effects reflects variation in true effects rather than sampling error, and the standard deviation of true effects (T) is 0.198. There are variations in the implementation of the PLTL model, such as peer leader training and the length of the PLTL workshop, and both of these factors can affect the effectiveness of PLTL. Future work can expand on the differences identified between studies. The analysis did not reveal any substantial publication bias; however, this inference is not particularly strong due to the high variation and the small number of studies.
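For readers following along in R rather than CMA, the heterogeneity statistics quoted above are reported directly by metafor. A minimal sketch, assuming the random-effects model res from the earlier sketches:

summary(res)      # pooled estimate together with Q, tau^2, tau, and I^2
res$I2            # I-squared: share of observed variation attributable to true heterogeneity
sqrt(res$tau2)    # estimated standard deviation of true effects (tau; the quantity labeled T above)
confint(res)      # confidence intervals for the heterogeneity estimates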

Table 2. Selected Effect Sizes from Hattie's Work (20) and Their Categories

Category / Examples of factors and their influence

Zone of desired effects (d > 0.40): Providing formative evaluation (0.90), Feedback (0.73), Meta-cognitive strategies (0.69), Problem solving teaching (0.61), Mastery learning (0.58), Concept mapping (0.57), Cooperative learning (0.41)

Teacher effects (0.15 < d < 0.40): Computer assisted instruction (0.37), Simulations (0.33), Inquiry based teaching (0.31), Homework (0.29), Individualized instruction (0.23), Teaching test taking (0.22)

Developmental effects (d < 0.15): Gender (0.12), Distance education (0.09)

Reverse effects (d < 0.0): Summer vacation (−0.09), Television (−0.18)

One of the most important steps of a meta-analysis is putting its results into the context of other studies. Here is one possible interpretation of effect sizes suggested by Hattie (20): instead of labeling effect sizes as small, medium, or large, Hattie suggested using 0.40 as the hinge point because it corresponds to the level at which the effects of interventions are noticeable and are above the average of all possible factors. Hattie also suggested the categorization of effects presented in Table 2. The overall impact of PLTL instruction is generally positive and appears to be greater for upper-division classes such as organic chemistry. PLTL interventions for organic chemistry, as found in this meta-analysis, have an effect size similar to Hattie's aggregate score for cooperative learning.

Other Software Solutions for Meta-Analysis

The Comprehensive Meta-Analysis package is not the only software solution for meta-analysis. Technically, a meta-analysis can even be performed with hand calculations and standard graphing tools. Table 3 lists various software products that can be used for performing meta-analyses; a brief scripted sketch with one of them (R with the metafor package) follows the table.

Table 3. Possible Software Solutions for Meta-Analysis

Software: Description

R: There are more than 20 packages on CRAN for various stages of meta-analysis. The most common are rmeta, meta, and metafor.

MIX: Add-in to perform meta-analysis with Excel 2007.

RevMan: Software package used for preparing Cochrane Reviews.

OpenMeta[Analyst]: Open-source software for performing meta-analyses.

Stata: Software package used for running multilevel models that can be used for meta-analyses. Several macros are available for meta-analysis.
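To make the R entry in Table 3 concrete, the lines below give a minimal end-to-end sketch with the metafor package. The study names and summary statistics are made up purely for illustration; they are not the data analyzed in this chapter.

library(metafor)

# Hypothetical summary data: treatment and control group means, SDs, and sample sizes
dat <- data.frame(
  study = c("Study A", "Study B", "Study C"),
  m1i = c(75, 68, 81), sd1i = c(10, 12, 9),  n1i = c(120, 60, 45),
  m2i = c(71, 66, 74), sd2i = c(11, 13, 10), n2i = c(115, 58, 50)
)

# Standardized mean differences (Hedges' g) and their sampling variances
dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)

res <- rma(yi, vi, data = dat)   # random-effects model
summary(res)                     # pooled effect, tau^2, I^2, and related statistics
forest(res)                      # forest plot of the individual and pooled effects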

References

Note: * indicates those references that were included in the meta-analysis.

1. Wilson, S. B.; Varma-Nelson, P. J. Chem. Educ. 2016, 93, 1686–1702.
2. Akinyele, A. F. Chem. Educ. 2010, 15, 353–360*.
3. Báez-Galib, R.; Colón-Cruz, H.; Wilfredo, R.; Rubin, M. R. J. Chem. Educ. 2005, 82, 1859–1863*.
4. Chan, J. Y. K.; Bauer, C. F. J. Res. Sci. Teach. 2015, 52, 319–346*.
5. Hockings, S. J. Chem. Educ. 2008, 85, 990–996*.
6. Lewis, S. E.; Lewis, J. E. J. Chem. Educ. 2005, 82, 135*.
7. Lewis, S. E.; Lewis, J. E. J. Res. Sci. Teach. 2008, 45, 794–811*.
8. Lewis, S. E. J. Chem. Educ. 2011, 88, 703–707*.
9. Lyon, D. C.; Lagowski, J. J. J. Chem. Educ. 2008, 85, 1571–1576*.
10. McCreary, C. L.; Golde, M. F.; Koeske, R. J. Chem. Educ. 2006, 83, 804–810*.


11. Mitchell, Y. D.; Ippolito, J.; Lewis, S. E. Chem. Educ. Res. Pract. 2012, 13, 378–383*. 12. Rein, K. S.; Brookes, D. T. J. Chem. Educ. 2015, 92, 797–802*. 13. Shields, S. P.; Hogrebe, M. C.; Spees, W. M.; Handlin, L. B.; Noelken, G. P.; Riley, J. M.; Frey, R. F. J. Chem. Educ. 2012, 89, 995–1000*. 14. Steward, B. N.; Amar, F. G.; Bruce, M. R. M. Aust. J. Educ. Chem. 2007, 67, 31–36*. 15. Tenney, A.; Houck, B. J. Math. Sci. Collab. Explor. 2003, 6, 11–20*. 16. Tien, L. T.; Roth, V.; Kampmeier, J. A. J. Res. Sci. Teach. 2002, 39, 606–632*. 17. Wamser, C. C. J. Chem. Educ. 2006, 83, 1562–1566*. 18. Smith, J.; Wilson, S. B.; Banks, J.; Zhu, L.; Varma-Nelson, P. J. Res. Sci. Teach. 2014, 51, 714–740*. 19. Borenstein, M.; Hedges, L.; Higgins, J. P. T.; Rothstein, H. R. Introduction to Meta-Analysis; Wiley, 2013. 20. Hattie, J. Visible learning: A synthesis of over 800 meta-analyses related to achievement; Routledge, 2009.


Chapter 9

A Study of Problem Solving Strategies Using ATLAS.ti

Tanya Gupta*

Department of Chemistry and Biochemistry, South Dakota State University, Brookings, South Dakota 57007, United States
*E-mail: [email protected].

The emphasis of this chapter is on the use of ATLAS.ti as a tool for the analysis of qualitative data. A brief overview of the ATLAS.ti software is provided. The chapter also includes a detailed example of the application of ATLAS.ti software for a qualitative research study on problem solving behavior in stoichiometry.

Introduction

ATLAS.ti is a qualitative data analysis package. The history of qualitative data analysis goes back to the 1960s, when the mechanization of data analysis began with word sorting in the humanities, literature, and linguistics. By the 1980s, computers had become faster, more powerful, and more easily accessible, and the field of social science research evolved with the computer-aided research tools that became available in the late 1980s. Around 1995, commercially available software tools were developed and were being used successfully in research, handling data sets that would have been unmanageable by hand. These early software packages included HyperResearch, NVivo, Ethnograph, Hypersoft, and ATLAS.ti (1–3). In the present day and age, these tools find widespread application in developing research projects, conducting literature reviews, compiling and analyzing data, and developing reports across various disciplines (4).


This chapter is mainly focused on a qualitative research study of problem solving behavior and highlights the application of ATLAS.ti software in a chemical education research study. ATLAS.ti was first commercially released in 1993 and is in its eighth version for both Windows and Macintosh based computers. There are several approaches to qualitative data analysis, and the use of software does not by itself dictate the research methodology involved (5–7). In a qualitative research study, the researcher is the analyst who drives and determines the direction of the study and uses software to help with managing and organizing data, thinking about the data, and making sense of the data to generate findings and come to a conclusion, or to plan further studies (8). The present study involving the use of ATLAS.ti is on problem solving behavior in reaction stoichiometry. In this chapter, the author introduces a qualitative data analysis package by providing an example of how such a tool can be used to answer specific research questions by engaging in the process of data analysis and interpretation. This should be viewed as an example of using qualitative software to conduct analysis and not as a promotion of the specific software.

A Literature Review of Problem Solving in Chemistry

A primary goal of science education is to develop a deep and robust understanding and appreciation of scientific concepts and principles among students. Problem-solving skills are essential for students to be successful in their chemistry courses. Student understanding of these scientific concepts and principles can be evaluated through their problem-solving approaches. Problem solving is what you do when you do not know what to do (9). A problem is defined as a gap between where a person is and where he or she wants to be, without knowing how to cross that gap (10). By a careful examination of the student approach to solving chemistry problems, it is possible to know how students learn chemistry and the conceptual framework within which they operate (11). Students solve similar problems in multiple ways. While some students get hold of a concept and are capable of applying the knowledge to solve particular problems, others find the same concept alien when it is presented to them in a problem format. The conceptual framework of students determines which problem-solving strategies are used. Students are successful in defining concepts but have a hard time representing their conceptual understanding (12). Some studies have shown the extent to which students are successful in solving various types of chemistry problems. Research on problem solving has particularly focused on:

1. Successful and unsuccessful problem solving (13, 14)
2. Comparing experts and novices (15, 16)
3. Conceptual versus algorithmic problem solving (17–19)
4. Nature of problems posed – well defined versus ill defined problems (20, 21)
5. Group problem solving versus individual problem solving (22)

Learning in chemistry has been described in terms of problem solving, conceptual understanding, and the acquisition of science process skills (23–27). All chemists engage in problem solving behaviors irrespective of their sub-discipline or field of work, and individuals who excel in their chemistry courses have excellent problem-solving abilities. As made clear in the definition, a problem is not an exercise – exercises have known and relatively easy solutions, are routine in nature, or can be referred to as familiar problems (28). A few models of problem solving are well known. These models, reported by Polya (11) and Wheatley (9), describe problem solving as a series of steps undertaken by an individual to address a given problem. The steps often depend on the nature of the problem. For example, a well-structured mathematical problem involves understanding the problem statement, developing a strategy for solving the problem, carrying out the strategy, and reviewing/revising the strategy after the problem-solving effort. Bodner (28, 29) describes problem solving strategy in chemistry as a sequence of steps that involve reading and re-reading the problem; writing down a tentative strategy for solving the problem; representing the problem using a picture, equation, or formula to further understand it; applying the strategy, and trying again if it fails; re-thinking and revising the strategy and reviewing progress on the solution; writing an answer obtained from the solution strategy; reviewing the answer and seeing if it makes any sense with respect to the problem asked; and, if it does not make sense, starting all over again with a new strategy until the solution is reached. Understanding of the problem may happen at any stage during the problem-solving process (30, 31). Problem solving involves seven distinguishable steps:

1. Reading and comprehending the problem statement in ways that one can understand it, through rephrasing, simply restating, and using symbols and representations to visualize it
2. Transforming the parts of the problem statement into meaningful chunks
3. Setting goals and sub-goals for developing solutions
4. Being selective about information from the problem statement
5. Retrieving rules and facts from memory that seem to relate to the problem
6. Achieving goals and sub-goals by explicitly or implicitly linking information, facts, and formulas and the solution strategy
7. Rechecking the path of the solution (strategy) and reviewing the answer

Several problem-solving strategies have been reported (32–34). The steps in problem solving may be similar across the diverse strategies employed, depending on the type of problem, the context, and the problem domain. The study of problem solving strategies remains a challenging area, and the various models reported by Polya (11) and Bodner (28) help us to look deeper at the problem-solving strategies employed by individuals or groups of people. In a few studies, problem-solving success has been tied to the abilities of the problem solvers. Studies on the strategies used by experts (people trained in the content area) and novices (first-timers to the discipline) involved using problems that were challenging enough to require more than mere recall from the experts, yet simple enough for novices to arrive at the solution (13–15). These studies on experts and novices used different methods to compare problem-solving strategies. One study used a HyperCard method to compare balancing equations among high school honors students (experts) and students from a regular chemistry class (35). Another study used a paper and pencil test to compare differences in student performance on various chemistry problems in order to classify students as experts and novices (16). Students who demonstrated better strategies and higher problem-solving performance had fewer procedural errors and demonstrated a better conceptual understanding than the novices in these studies. In all these studies on problem solving, experts and novices primarily differ in their strategies based on their prior experiences (15, 16). Experts often classify problems according to the principles that govern the problem solution, whereas novices tend to classify problems based on their surface features. Experts use a knowledge development, or forward chaining, strategy to solve problems. In the forward chaining strategy, one begins with the information provided in the problem and works his/her way through all needed steps to arrive at the final goal (the problem solution). Novices deploy a means-end analysis strategy, which begins by identifying the problem goal and then finding differences between the goal and the information stated in the problem. The next step in the means-end analysis strategy is to seek an equation or a formula that would help eliminate the gap between the problem information and the supposed solution (15, 16). The novice approach is akin to going through several different pieces of a puzzle to find a missing piece and then trying to fit in the very first piece that matches the shape and size of the incomplete puzzle, with little consideration of whether the piece really belongs to the puzzle being solved. In the expert/novice studies, the experts are able to solve problems successfully using the forward-chaining strategy. It is possible that the expert views the problems presented as routine exercises or has more familiarity with the problems. For novices the path to the problem solution is not obvious and immediately recognizable, which leads to the slower rate at which novices solve these problems, their use of incorrect formulas and equations, and the more frequent pauses they take when asked to present their thoughts in words (during interviews).

Unraveling problem solving strategies is, however, much more complex than the expert-novice paradigm presented by early researchers. Some researchers focused on successful and unsuccessful problem solvers in chemistry and identified several traits of successful problem solvers (36, 37). According to these studies, successful problem solvers tend to read the problem completely and understand the problem objectives clearly before engaging in any problem solving strategy. Once there is an understanding of the problem, these problem solvers then write any reaction equation early on. They try to grasp the problem based on the underlying reasoning and probable solution. Another trait of successful problem solvers is that they do not invoke a formula or an equation until they are certain of being able to solve the problem in chemical terms. They often develop representations and use symbols and formulas to represent the chemical species involved in the problem. These problem solvers perform all necessary steps in a specific order to solve the problem; they apply the information provided and infer any implied information (left out of the problem statement yet important to consider). While performing all these steps, the successful problem solvers frequently check their work for inconsistencies, make proper assumptions, and have an adequate understanding of the chemical concepts and principles involved in the problem. Most importantly, successful problem solvers can recognize the patterns of problems, use trial and error less frequently, think effortlessly and fluently about their strategy, are able to express their understanding of the problem, and deploy more than a single strategy to arrive at the problem solution (38–40). In general, successful problem solvers have advanced knowledge; they use declarative and procedural knowledge; construct appropriate representations; and apply general reasoning abilities that permit them to make logical connections among the various problem elements. Successful problem solvers apply more than one verification strategy to ensure that their problem representations are consistent with the facts provided. Their solution is logically bound, their computations are error free, and the problem solved is the problem presented (41).

Problem solving strategies among students develop based on the types of problems that students experience during their education. In chemistry courses students are presented with end-of-chapter textbook problems as practice problems, and similar problems are asked during exams and quizzes. These problems might be easy for experts because of their experience and familiarity with textbook problems, yet they may pose a reasonable challenge to students. Such problems invoke quantitative reasoning at the expense of qualitative explanations. Students use a rule-based strategy to solve such problems, which relies on the memorization of rules and algorithms that can be practiced until they can be applied directly to familiar problems (42). Some papers on problem-solving in chemistry have focused on the nature of problems as conceptual versus algorithmic or mathematical (12, 17, 19, 26, 43–45). These studies reported that even high achievers struggle when presented with conceptual problems. Successful problem solvers are good at solving both conceptual and algorithmic (formula based) problems and can transfer their skills from one sub-discipline of chemistry to another with ease. Assessment of student knowledge of the concept underlying a problem requires posing problems that are effective in eliciting student thinking, representations, and problem solving strategies. Multiple-choice exam problems do not help with assessing such understanding among students unless they are carefully designed; students with erroneous understanding can select a correct answer choice (46, 47). Typical multiple choice problems rely on rote memorization and short term memory of definitions of concepts rather than in-depth understanding and the long-term memory that builds on practice, reflection, and critical assessment of ideas. A few studies have also reported that students can learn to solve chemistry problems without understanding the underlying concept. These students often rely on their memory for terms and definitions and apply concepts without any real comprehension (48–50).
Such students perhaps go through several episodes during problem solving that involve reading, defining the problem, setting up a solution, and solving the problem. Specifically, algorithmic problems take more time and involve a greater number of transitions between these episodes than the paired conceptual problems (51–54). As evident from a comprehensive literature review on problem types and strategies, despite the numerous studies reported on student problem-solving strategies, there is a need to understand the strategies used by course instructors to solve chemistry problems and how their problem solving strategies compare with those of graduate students and first year college students. The chapter addresses the gap in this area by using a case-study approach to compare the problem-solving strategies used by an instructor, a graduate student, and a first year student. The study was conducted using the ATLAS.ti software. The next section covers the theoretical framework of constructivism and the Adaptive Control of Thought-Rational (ACT-R) theory that inform this qualitative case-based study.

Theoretical Framework

According to constructivist theory, the process of knowledge acquisition by an individual begins with input from the environment as detected by the senses. An individual actively constructs knowledge from the data obtained by the senses and by the further interaction of these data with existing knowledge (55, 56). Constructivism has great relevance for teaching and learning. The process of knowledge construction by an individual is often limited by a zone of proximal development and involves facilitation of knowledge construction by a more knowledgeable peer or an instructor (expert). Specifically, the knowledge constructed must fit reality, thus leading to common knowledge across a group of people. Constructivism focuses on both the building and the testing of knowledge that is viable and workable (57–59). Piaget outlined four stages of intellectual development - sensory-motor, pre-operational, concrete operational, and formal operational. Students reach the formal operational stage by the age of twelve, and complete intellectual development occurs by the age of fifteen. Students at the concrete operational level struggle to think about various possibilities and find it difficult to understand concepts and principles that depart from reality (abstract ideas such as atoms, electrons, and the nucleus). Such scientific ideas are counterintuitive and cannot be acquired by merely observing phenomena (60–62). The formal operational student has the capacity to think in terms of possibilities and can efficiently reason out what might happen, without any visible aid. The student at the concrete level can solve problems that require formal thinking, provided that the student gets an opportunity to deal with the formal concept using some concrete experience that leads to real observations as a special case of the possible (24). According to Herron (24), formal thinkers have some expertise in the subject and display an advanced level of comprehension of chemical concepts as compared to concrete operational thinkers. While the concrete operational thinkers can

think only in terms of the real and possible, formal thinkers are a step ahead and can think in terms of possibilities or abstractions. The constructivist approach offers an invaluable insight into the learning process, according to which, during the process of learning, the learner necessarily reconstructs any knowledge. The constructivist model plays a role in involving learners in learning a pre-determined body of agreed knowledge (i.e., consensually agreed scientific theories rather than personal theories about phenomena). The constructivist model is also helpful in explaining the misconceptions that students bring to chemistry and the resistance of these misconceptions to change toward meaningful learning (62, 63). Highly meaningful learning often relies on problem solving and creativity and is possible in the knowledge domains in which the learner has considerable, well-organized prior knowledge. The Adaptive Control of Thought-Rational (ACT-R) theory encompasses both declarative and procedural knowledge. It builds on student prior knowledge and states that both declarative and procedural knowledge are acquired by individuals from their prior knowledge of facts and processes. ACT-R explains three types of learning: the ability to generalize, to discriminate, and to strengthen knowledge by application and transfer. In terms of generalizability, one's understanding begins to expand and an individual gains a holistic perspective when he/she comes across the same idea or principles several times. Discrimination of understanding arises from the specific application of knowledge in certain areas, and strengthening of knowledge occurs by practice, retrieval, and application of knowledge when an individual applies his or her understanding to solving various problems in different contexts. All three – the ability to generalize, to discriminate, and to apply and thereby strengthen knowledge – lead one to make new connections and to think critically to discard those that are not relevant or needed during problem-solving (64–66). ACT-R theory explains the process of knowledge acquisition, organization, and application. According to ACT-R theory, problem solving takes place within a theoretical space or mental representation that includes an initial state, intermediate states, and a final state that satisfies the goal. A state could imply some external conditions or an internal coding of those external conditions. The process of problem-solving involves an operator, which can be understood as an action that transforms one state into another. The theory assumes that when a problem solver approaches a state for which there are no adequate problem-solving operators, the problem solver searches for an example of a similar problem solving state and tries to address the problem using an analogy or an example. The initial state of problem-solving is referred to as the interpretative stage. In this stage, an individual recalls specific examples related to the problem at hand and attempts to interpret these examples. This step invokes declarative memories without any involvement of long-term memory; reviewing worked text examples for end of chapter problems would be an example of this memory recall stage (67). The interpretative stage may involve considerable verbalization as the problem solver attempts to rehearse the key attributes of the examples used or from which an analogous problem is determined. With procedural encoding of the skills, the verbalization becomes less distinct due to the transition from the

interpretative stage to that of procedural encoding. This stage is referred to as the knowledge compilation stage, during which the problem solver transitions from the interpretative stage to the procedural stage. The procedural knowledge gets encoded in memory as production rules, which are condition-action pairs (68, 69). Anderson provides an example of such a pair: "if the two triangles are congruent then try to prove that their corresponding parts are congruent". The production rules are principally problem-solving operators in an abstract form that can apply across different situations. Using analogy for problem solving can help in extracting the problem-solving operators and establishing the production rules (70, 71). The amount of practice determines the strength of encoding, which leads to quick access to declarative knowledge and the application of procedural knowledge. The application of a particular production rule is a direct measure of the strength of the rule. Several production rules can be applied to a given problem at a particular time (correctly or incorrectly), and the probability of each production is indicative of its strength. The ACT-R theory thus explains the variability observed in problem-solving behavior, which is related to the strength of production encoding.

ATLAS.ti: An Introduction to Software

ATLAS.ti is a company based in Berlin, Germany. The first version of the software was commercially released in 1993 as a Qualitative Data Analysis (QDA) package. The software is currently in its 8th version and can be used on Macintosh desktop computers and Windows based computers, with a lighter version available for Android and Apple devices. ATLAS.ti is a versatile QDA package that can be used in any discipline. It facilitates the processes of data organization, data description, analysis and interpretation, and the development of research reports or literature summaries. The software supports multiple qualitative approaches and handles various file formats such as text, graphics and images, video data, and audio files. The intuitive interface provides proximity to the data, the participant view, and the context of the research. It finds several applications, such as comparative analysis, collaboration across teams (cloud based), and integrating qualitative findings with quantitative data in the case of mixed-methods research studies (5, 72, 73).

A Brief Description of the User Interface

The opening screen of the software has 13 dropdown buttons on the top panel (Figure 1). It gives information about the software version in use and the option to create a new ATLAS.ti project or to import an existing ATLAS.ti project from the computer (at the bottom left of the screen). The right panel of the screen lists any projects that exist or have been created; if there are no existing projects, this panel appears as a white screen (73). The dropdown buttons include:

1. ATLAS.ti: About ATLAS.ti, license information, updates, services, and the options to hide ATLAS.ti and quit the program when done using it.
2. Project: This is where a new project is created or uploaded. It includes options such as New, Open, Open Recent, Close, Save, Rename, Renumber Documents and Quotations, Info, Import Project, Import iPad Project, Import Survey, Export Project, Export Project to XML, and Print.
3. Edit: Used to edit a project; includes buttons to undo, redo, cut, copy, paste, paste and match style, delete, select all, deselect all, rename, format, find, spelling and grammar, substitutions, transformations, speech, start dictation, and emojis and symbols.
4. Document: This is for all the documents that belong to a project in ATLAS.ti. It includes options to import documents, show the Document Manager, show the Document Group Manager, and an Output button to generate a list of documents and associated groups or a list of document groups and their members.
5. Quotation: A quotation is a coded segment of text or phrase in a document. This menu includes buttons such as New from Selection, Add Coding, Quick Coding, Code in Vivo, Show Link Manager (links codes and quotations), Show Quotation Manager, and an Output button that generates a summary of Commented Quotations by Documents, Quotations by Code, Quotations by Code with Comments, or Quotations by Code (Alternative View).
6. Code: This dropdown button includes several options for qualitative coding and for output of codes, for example New Code(s), New Smart Code, Auto Coding, Show Link Manager, Show Relation Manager, Show Code Manager, Show Code Group Manager, and Output for the Codebook, List of Codes and Associated Groups, List of Codes by Documents, List of Code Groups and their Members, Tag Cloud, and Tag Cloud with Code Colors.
7. Memo: A memo is a brief description of a document or data, or a methodological note. The dropdown menu includes options for creating a New Memo, Show Memo Manager, Show Memo Group Manager, and Output for All Memos Including Content, List of Memos and Associated Groups, and Memos with Content and Linked Quotations.
8. Network: This dropdown button displays the links. Users have the choice of creating a New Network, showing the Network Manager, and generating Output that includes a List of Code-Code Links with Comments, a List of Hyperlinks with Comments, and List(s) of Memo Links.
9. Analysis: The Analysis dropdown menu is for performing a few quick analyses, by using the Word Cruncher to generate a summary of the frequency of words (a scripted analogue is sketched after Figure 1), the Code Co-occurrence Table, and the Code Document Table.
10. Tools: The Tools button gives qualitative researchers the ability to gain a quick overview of a project through the Project Explorer button and to manage users by selecting the option to Change User or displaying users via Show User Manager.
11. View: The View button contains various options that one would expect in a tools button. Here a researcher can Hide Toolbar, Hide Drop-Down Lists on ATLAS.ti, Hide the Navigator, Hide the Margin Area, Hide the (Document) Inspector, Sort (a document) by Number and Name, and Enter Full Screen Mode.
12. Window: The Window button performs all functions related to a window or a tab, for example Minimize, Zoom, Duplicate Tab, Move Tab to a New Window, New Project Window, Bring All to Front, etc.
13. Help: This button is for seeking help on ATLAS.ti via the ATLAS.ti Help button and for sending feedback on ATLAS.ti. On clicking, it opens the email function and gives the user an option to drag in the specific document for which the user has technical issues. It generates a brief report as email text that contains information about the user's computer (such as CPU model, old ATLAS.ti crash logs, free disk space, etc.) as a file named General.atlinfo. Another file, "Projects.atlinfo", contains information on the project database, including code names, etc., but excluding memo, comment, or document content. Such user data is used for technical improvement of the software.

Figure 1. Overview of the user interface for ATLAS.ti.
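The Word Cruncher mentioned under the Analysis menu reports word frequencies across documents. For readers who also want a quick scripted check outside ATLAS.ti, a rough analogue in R using only base functions is sketched below; the file name is hypothetical and not one of the study documents.

txt   <- tolower(readLines("interview_transcript.txt", warn = FALSE))   # hypothetical transcript file
words <- unlist(strsplit(txt, "[^a-z']+"))        # split on anything that is not a letter or apostrophe
words <- words[nchar(words) > 0]                  # drop empty strings
head(sort(table(words), decreasing = TRUE), 20)   # the 20 most frequent words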

Research Method: Data Collection and Analysis

The study on problem-solving strategies involved a comparative qualitative case study approach with purposeful, stratified sampling of three case units out of 51 interviews that were conducted at a mid-western university (73). Comparative case studies were used to study the similarities, differences, and patterns across two or more cases that share a common focus or goal in a way

that produces knowledge that is easier to generalize to fundamental questions (74). In this study the goal was to determine and compare the problem-solving strategies used by a college instructor, a graduate student, and an undergraduate student. The participants' identities were coded for anonymity during the entire research process: the instructor in this study was coded as Tim, the graduate student as Matt, and the undergraduate student as Zach. Overall, nine first-year graduate teaching assistants, four college professors, and 38 undergraduate students participated in this study. The sample case units were identified based on the proximity of their solutions during in-depth, think-aloud, semi-structured qualitative interviews, in order to compare the strategies used by each representative participant. These interviews involved general questions on the prior learning experiences of the participants and their views of science. During the interview each participant was asked specific questions on stoichiometry based problems and prompted in order to gather information regarding the strategies being used (a sample problem is shown below). Each participant was provided with a worksheet while engaging in problem solving during the interview. Each interview was about 45-60 minutes long, and the interviews were transcribed verbatim.

Stoichiometry problem asked: Octane (C8H18) is a component of gasoline. Complete combustion of octane yields H2O and CO2. Incomplete combustion produces H2O and CO, which not only reduces the efficiency of the engine using the fuel but is also toxic. In a certain test run, 1.000 gallon (gal) of octane is burned in an engine. The total mass of CO, CO2, and H2O produced is 11.53 kg. Calculate the efficiency of the process; that is, calculate the fraction of octane converted to CO2. The density of octane is 2.650 kg/gal (75).

In order to solve this problem one needs to: a) consider the products of complete and incomplete combustion, b) write equations for the two processes, and c) compare the mass of products from complete combustion to that from incomplete combustion to determine the fraction of octane converted to CO2, that is, the efficiency of the process.

Data preparation for using the software: A key requirement of the ATLAS.ti software is data preparation. Data input to the software must be in the form of a soft copy, placed in a single folder in a specific location on the computer, and saved with a unique filename. In this study the folder was named problem-solving strategies. After saving all data files in a single folder in Documents, the documents were uploaded into the ATLAS.ti software into a single hermeneutic unit (HU) as primary documents (p-docs). A hermeneutic unit refers to the analysis project, and it includes all the documents, in various file formats, that are a part of the research project. Both text and media files can be uploaded and qualitatively coded in ATLAS.ti. Text documents (.doc files) and scanned PDF worksheets were uploaded as primary documents. The primary documents are the various sources of data and information for a given project, in several formats. When the documents are uploaded, the default software file name for the project ends in .hpr8, which has all the documents and the

work that is conducted on ATLAS.ti. The filename indicates a single separate project with a specific focus or set of research questions. For simplicity and ease of use, the HU was saved in the same folder in which all the primary documents are located. ATLAS.ti provides the possibility of organizing the documents into groups using the Document Group Manager dropdown function under the Document button on the main menu. In this study, the files were grouped under the categories of instructor, student, and GTA.

Qualitative analysis: ATLAS.ti serves as a tool to engage in qualitative research in order to a) highlight the accounts of the study participants or of phenomena based on the data collected, b) analyze the content of the accounts by comparing, connecting, and examining the context, and c) produce a succinct, in-depth description of these accounts in terms of findings based on a comprehensive examination of the data. Qualitative research is researcher driven. This means that the researcher plans the complete study and its purpose and determines the methods of the study. The software is a tool for data organization and analysis, but it does not perform the analysis magically by any means (5). For each document a memo was prepared that contained a brief description of the document. For example, the memo for the undergraduate student contained information about the date of the interview, the participant background, and the duration of the interview. The process of coding in the problem-solving study involved assigning codes, or conceptual categories, to the data. The first round of coding involved reading each file carefully to assign categories (also called labels or codes) to segments of the data, also called quotations. Qualitative coding is an iterative process, which means there are several cycles of coding that involve careful review, revision, and merging of codes. Figure 2 displays qualitative coding in ATLAS.ti. The codes are on the right side of the window, with vertical bars displaying the quotations from the data that were coded. These codes correspond to the interview data that contains the specific stoichiometry problems. Four kinds of coding actions can be performed in ATLAS.ti: the basic function of adding a code, smart coding, auto coding, and quick coding. Auto coding can be used by searching for a phrase or keyword in all documents or in a specific document, selecting codes from the pre-generated code list in ATLAS.ti, and applying the code to the quotations that return the matched phrases. Recently used codes can also be added to chunks of data by clicking on "add last used codes" while coding. In-vivo coding is helpful when a specific statement or segment of data stands out and can form a conceptual category by itself. In this study open coding was done using the add-code function. Several codes were assigned during the first cycle of open coding (Figure 2). The initial coding process was open and descriptive. As the analysis progressed, the codes were refined and merged to generate the specific strategy for each participant. To achieve this goal, the Code Group Manager function of ATLAS.ti was used to categorize similar codes from the various coded documents and to develop a code hierarchy. The final coding was more pattern-based and thematic in nature. A total of 30 codes were left

after deleting duplicate codes, renaming codes, and merging codes. The data and the list of initial codes were shared separately with an independent coder. The inter-coder agreement was found to be about 87% in the first coding cycle (25 agreements/30 codes). The overall analysis settled on 24 codes, which led to the three major problem-solving strategies that were identified (discussed later).
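Percent agreement of the kind reported above (and chance-corrected alternatives such as Cohen's kappa) can also be computed outside ATLAS.ti as a quick check. Below is a minimal sketch in R with the irr package, using a made-up ratings matrix rather than the actual coding from this study; the code labels are hypothetical.

library(irr)   # install.packages("irr") if needed

# Hypothetical example: two coders' code assignments for ten quotations
ratings <- data.frame(
  coder1 = c("read", "plan", "plan", "calc", "check", "calc", "read", "plan", "calc", "check"),
  coder2 = c("read", "plan", "calc", "calc", "check", "calc", "read", "plan", "calc", "plan")
)

agree(ratings)    # simple percent agreement between the two coders
kappa2(ratings)   # Cohen's kappa for two coders (corrects for chance agreement)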

Figure 2. Initial coding.

During coding, analytic memos were developed for various codes and quotations using the memo function of the software. These memos are the researcher's descriptions of codes, quotations, and thought processes about the analysis. For example, a memo in this study involved journaling the patterns that were noticed for a problem-solving strategy and briefly described the quotations or data components that seemed to contribute to the observed pattern. In the final stage of analysis, network views were generated for codes and memos using the Network button (presented under the problem-solving strategies below). The network views involve links that are created by connecting codes and data segments. Strong links among codes helped determine the specific problem-solving strategy used by each participant through careful examination of the problem solving pathways. The codes were viewed in the network view as nodes. The nodes in a network are linked based on the strength of their relation and correspondence with other codes and with related quotations in the data. These interactions indicated whether a code is a part of another code, is associated with a code, is a cause of a code, or contradicts a code.

Findings: Problem-Solving Strategies Generated by Network Views

In ATLAS.ti the quotations, codes, and memos can be connected together as link objects. Nodes and links form the two basic components of network views. Nodes include all objects in an HU, such as codes, quotations, memos, primary documents, families, and also network views. Different icons can be used to visualize codes (for example, audio versus video based quotations). A network view of codes represents the relations between different codes in a project in the code family. A neighbor in a network view includes all objects that are directly linked to a node; one can use the Command+N keys to view all neighbors of the nodes for a code. Network views also include co-occurring nodes, which include all objects that are assigned to the same or neighboring quotations in the project. Network views can be created either from the existing network ties in the project or as a completely new network view from the beginning by clicking on the Networks drop-down menu and then clicking on New Network View. One can import the nodes one wants to add to this new network: by clicking on nodes, one can select the import nodes function and select as many nodes as one needs. The layout of these nodes can be adjusted by selecting the semantic layout to organize the network view. The objects can also be directly dragged and dropped into the network view window. One important aspect of creating network views is that, whether existing network ties are used or new networks are created, making changes by renaming, linking, or deleting the network nodes impacts the entire project. For example, if while creating a network view the researcher changes a code name, the change will occur for that particular code in the entire project. The network views can depict both weak and strong links (for example, whether a step is a part of, a cause of, or a sub-part of a concept or a problem-solving strategy). Strong links exist between codes and quotations and are created when a researcher links one code with another or hyperlinks data segments through codes. Strong links have independent properties and are definable. Weak links are adjacent segments or neighbors to codes in the data; these are not independent and are indirectly connected to strong links. The analysis of the problem-solving strategies of the participants led to the identification of three unique strategies, as demonstrated by the network views. Tim's (the instructor's) approach to problem solving involved reading the problem statement and seeking the problem goal. He made frequent use of labels to determine the process in the given problem. Tim proceeded to use symbolic representations for the problem once the nature of the process was known. He constructs a mathematical model based on the information provided in the problem, identifies the gap based on this information, and identifies any other crucial piece of information that may be helpful for crossing the problem gap.


So the way I would model this mathematically is….mass of H2O from …reaction A plus the mass of H2O from reaction B plus mass of CO from reaction B plus the mass of CO2 from reaction A and those all together, it will give us 11.53 kilogram (P1:8)

Tim explicitly identified his overall problem-solving strategy as a guess and test strategy.

So that's one piece of the puzzle that those products have the sum to 11.53 kilograms…. Well I will assume that the octane may react, lets say all of the octane reacts. We will just make that assumption. I will do a guess and test strategy here that's sort of zero in on it (P1-9).

Tim splits the equation for the reaction to determine the amount of products generated for both complete and incomplete combustion of octane in order to arrive at the solution. He performs lengthy calculations for the amount of products that would be formed in the two processes and hits a dead end (see Figure 3). Tim then improvises his strategy from here. He looks at the data he generated by performing the calculations and engages in logical reasoning to determine the solution. Tim continuously evaluated his solution by asking logical questions that helped him connect the various numbers he had generated while attempting to solve the problem. For example, when he finally gets stuck, he asks, "if 100% octane reacts to give 11.93 kilograms of products then how much of the octane reacts to give 11.53 kilograms?" The quote from Tim below shows his frequent questioning to evaluate his solution.

If there is 2.650 Kg how much a 2 kg react completely with equation A and then 0.650 kg react incompletely with equation B. So label this A and B …and then if I want to, I can determine the individual masses of water, and CO2 and CO and see if that sums up to a 11.53 kg as total. That wouldn't be an ineloquent solution but it is a solution that works where you could sort of work your way down to see if, what is the optimum amount. (P1-11)

Matt (graduate student) starts by reading the objective of the problem and then searches for key information provided in the problem. Matt identifies the problem as being related to stoichiometry. There is some uncertainty about the problem domain, as evident from the statement made by Matt: "The three things [in problem statement] are probably going to be a key to finding out is how much octane goes in and how much carbon-dioxide is made. I see it as a bit of stoichiometry" (P2-4). Matt takes a logical approach to problem-solving (Figure 4). He views this problem as being domain specific too, and argues that because it is a stoichiometry-based problem, he must be able to calculate the theoretical yield. He then develops a road map for problem solving that emphasizes the expected yield. Matt makes

a logical assumption, that in a perfect problem he should be able to determine the difference between the amount of products generated and the expected (theoretical) yield to calculate the actual mass of products for both complete and incomplete combustion of octane.

I will calculate the expected yield is and assume that this is a perfect problem. I will take the difference from that and subtract the difference and the difference in the mass of what came out can be used to calculate the CO and CO should be lighter than CO2 so it should get a numbers bigger than 11.53 (P2-9)

Figure 3. Instructor’s approach to problem-solving.

Matt reasons that since carbon dioxide is heavier than CO, the amount of carbon dioxide produced will relate to the efficiency. Matt then proceeds to set up the reaction equation for complete combustion and performs calculations for the moles of products (water and carbon dioxide). He then goes back to the problem statement and draws several comparisons for the masses obtained using the law of conservation of mass. According to Matt’s logical approach, the actual yield is a fraction of the theoretical yield of the products, and that should help him determine the final solution.

So I am just checking to see if these numbers I am pulling out make logical sense because my reaction from the start is only being containing this much weight so that in the end it should contain the same weight and its gaining a lot of weight and I am not sure if this weight really exists on the

left hand side. This might work out after all. I don’t have too much time to think about this, means I don’t have to try crazy things, I can actually logically reason it out. (P2-21)

Figure 4. Graduate student’s approach to problem-solving.

Zach (undergraduate student) mainly uses a rule-based strategy for problem solving. He starts with the reaction equation and writes two separate equations for the complete and incomplete combustion.

Because it says calculate the efficiency of the process. So it is out of zero to 100, but I know that if it converts all of it to CO2 that is 100%, if it does not do a complete combustion and only makes CO, is that 0% efficient, or what is that? (P3-5)

Zach focused on the surface features of the problem. He wondered what he needed to do with the numbers provided in the problem to find the efficiency of the reaction. His approach involved some deductive reasoning. For example, Zach thinks that he will get two numbers when solving for the amount of carbon dioxide – a bigger number and a smaller number (incomplete combustion) for the maximum and minimum efficiency of the process.

It would be the same as using the number of moles I’ll just substitute my number of moles of C8H18 that I found, and figure out how many moles of everything else that I would need…To get it to grams I have to multiply not divide. I can convert number of moles of each of those into grams and see how close it is to the original value and see about what an answer would be (P3-9).

Zach’s argument in solving this problem was that no matter what happens, the amount of water produced does not change. During the entire problem-solving process Zach fixated on the amounts of carbon dioxide and carbon monoxide, and his reasoning stemmed from the reaction equations he wrote. Each time Zach lost his path, he came back to thinking about the rules he had learned in his chemistry courses to solve stoichiometry problems (Figure 5).

Just looking at my numbers, trying to remember any equations. I’m thinking I can take the mass of the reactants that were actually produced over the amount it would have produced under fully complete combustion and that would give me an efficiency percentage. And that will tell me what percentage of CO2 to CO Yeah. So, 11.53 kg over 11.94 kg is 96% efficiency

Figure 5. Undergraduate student’s approach to problem solving.

Discussion and Conclusions

The study provides an insight into the problem-solving strategies used by three participants who are at completely different levels of experience and exposure to the domain principles and problems in chemistry. A summary of the problem-solving strategy of the three case units in this study is provided in Table 1.

Table 1. Comparison of Problem-Solving Strategy of Instructor, Graduate Student, and Undergraduate Student

Schema. Instructor: Well developed and strongly connected; helps in meaningful interpretation. GTA: Developed but not connected. Undergraduate Student: Loosely connected (formula of ethanol).

Domain Principles. Instructor: Frequently used during problem solving. GTA: Somewhat; needed probing. Undergraduate Student: Minimally used.

Strategies. Instructor: Flexible and adaptive; analogical guess, test, and revise. GTA: Logical reasoning strategy for connecting gaps. Undergraduate Student: Rule based and partly deductive strategy – from what is to what should be.

In this study it was found that the instructor, Tim, had a well-developed and strongly connected schema of the underlying stoichiometry concepts and principles. This helped Tim make meaningful interpretations of the problem statement, and he frequently referred to the domain principles during the problem-solving process. Tim was flexible and adapted his strategy on the go. His guess, test, and revise strategy involved frequent use of analogies, such as puzzles, bridges, and models, for problem-solving. In the case of Matt, his schema of concepts and knowledge of the domain seemed developed but not as strongly connected as Tim's, though he had a better understanding than Zach (the undergraduate student). His domain knowledge needed some probing during the interview process ("tell me what you are thinking here"; "you seem stuck on this, can you explain what you are applying?"). One would expect a graduate student to have a far better understanding of domain principles, but Matt did not display that. Matt used a logical reasoning strategy for connecting the gaps in his problem-solving process; he frequently used the words "logic" and "logical" while thinking aloud. Zach used a rule-based strategy for solving stoichiometry problems. It is common for novices to use rule-based strategies. Zach frequently mentioned rules and principles he had used in solving stoichiometry problems and relied on his memory to recall facts, formulas, and rules during the entire problem-solving process. Zach's mental schema of stoichiometry seemed rather weakly connected. He used the incorrect formula for ethanol in one of the stoichiometry problems, yet his rules and approaches to the solution were correctly applied. Perhaps because of this weak schema of the various components needed to solve stoichiometry problems, Zach often doubted his solutions and his final answer. He used domain principles minimally, and when he did, they were not correctly applied. In Zach's case, the rule-based approach was used to set up equations and to calculate the moles of products from the balanced chemical equations.


The three strategies identified in this study are a guess, test, and revise strategy, a logical reasoning strategy, and a rule-based strategy for a specific stoichiometry based problem. All three participants arrived at similar solutions, yet they took different approaches to solving the problem. This can be attributed to the development of sound schemas and an understanding of domain principles in the case of the instructor, as compared to the graduate and undergraduate students, who do not display the ability to generalize and discriminate the problem and their prior knowledge. The understanding of the undergraduate student seems very limited to rules, and he lacks the holistic perspective that comes from having experienced similar problems multiple times, as compared to the instructor.

Limitations, Implications, and Further Research

The study highlights the problem solving strategies used by diverse participants who have completely different levels of experience and exposure to chemistry, yet all three were successful in solving the chemistry problem presented in this study. The chapter describes several key functions of the ATLAS.ti software that were employed for this study, from data organization to data analysis and interpretation. The study is limited to three case units who were successful in solving a specific set of stoichiometry problems. It is also limited in terms of the use of the various features of the ATLAS.ti program. Despite its limitations, the study demonstrates the importance of the role of software in qualitative studies and suggests that problem solving in chemistry, and perhaps in other areas, relies on conceptual knowledge backed by the use of rules, analogies, and common sense. The software helped with identifying the trail of problem-solving strategies and with keeping track of the entire qualitative research process (codes, quotations, and memos). The software also facilitated the triangulation of interview transcripts with the worksheet data and the generation of inter-rater reliability by sharing files through the group project feature of ATLAS.ti. Throughout the analysis process there was a closeness of the researcher to the data and to the participant view of the problem solving strategy. The context of the study was always present because of easy access to the research questions, the first impressions of the researcher, and the data side-by-side within the software. Though the study focused on three participants who were successful in solving a stoichiometry based problem, it leads into further research on the successful problem-solving strategies employed by diverse groups in the sub-disciplines of organic and inorganic chemistry. The author is studying problem-solving behavior in organic and inorganic chemistry by using computer simulations developed by her research group (76).


References

1. Davidson, J.; Paulus, T.; Jackson, K. Qual. Inquiry 2016, 7, 606–610.
2. Woods, M.; Paulus, T.; Atkins, D. P.; Macklin, R. Soc. Sci. Comput. Rev. 2016, 34 (5), 597–617.
3. MacMillan, K.; Koenig, T. Soc. Sci. Comput. Rev. 2004, 22, 179–186.
4. Dohan, D.; Sanchez-Jankowski, M. Ann. Rev. Sociol. 2016, 24, 477–498.
5. Friese, S. Qualitative Data Analysis with ATLAS.ti; Sage: London, 2014.
6. Garcia-Horta, J. B.; Guerra-Ramos, M. T. Int. J. Res. Meth. Educ. 2009, 32, 151–165.
7. Bazeley, P.; Jackson, K. Qualitative Data Analysis with NVivo, 2nd ed.; Sage: London, 2013.
8. Bazeley, P. Qualitative Data Analysis: Practical Strategies; Sage: London, 2013.
9. Wheatley, G. H. MEPS Technical Report 84.01; School Mathematics and Science Center, Purdue University: West Lafayette, IN, 1984.
10. Hays, J. The Complete Problem-Solver; The Franklin Institute: Philadelphia, 1980.
11. Polya, G. How to Solve It: A New Aspect of Mathematical Method; Princeton University Press: Princeton, NJ, 1985.
12. Pickering, M. J. Chem. Educ. 1990, 67, 254.
13. Camacho, M.; Good, R. J. Res. Sci. Teach. 1989, 26, 251–272.
14. Carter, C. S.; LaRussa, M. A.; Bodner, G. M. J. Res. Sci. Teach. 1987, 24, 645–657.
15. Chi, M. T. H.; Feltovich, P. J.; Glaser, R. Cognit. Sci. 1981, 5, 121–152.
16. Heyworth, R. M. Int. J. Sci. Educ. 1990, 21, 195–211.
17. Sawrey, B. A. J. Chem. Educ. 1990, 67, 253–254.
18. BouJaoude, S.; Barakat, H. E. J. Sci. Educ. 7, 1–42.
19. Nurrenbern, S. C.; Pickering, M. J. Chem. Educ. 2003, 64, 508.
20. Bunce, D. M.; Gabel, D. L.; Samuel, J. V. J. Res. Sci. Teach. 1991, 28, 505–521.
21. Fasching, J. L.; Erickson, B. L. J. Chem. Educ. 1985, 62, 842–848.
22. Gupta, T.; Burke, K. A.; Mehta, A.; Greenbowe, T. J. J. Chem. Educ. 2015, 1, 32–38.
23. Gabel, D. S. In What Research Says to the Science Teacher; Gabel, D., Ed.; National Science Teachers Association: Washington, DC, 1989; Vol. 5, pp 5−11.
24. Gabel, D. S.; Bunce, D. M. In Handbook of Research on Science Teaching and Learning; Gabel, D., Ed.; NSTA, Macmillan Publishing Company: New York, 1994; pp 301−326.
25. Herron, J. D. J. Chem. Educ. 1975, 52, 146.
26. Herron, J. D. The Chemistry Classroom: Formulas for Successful Teaching; American Chemical Society: Washington, DC, 1996.
27. Yarroch, W. L. J. Res. Sci. Teach. 1985, 22, 449–459.
28. Bodner, G. M. In Chemistry Education: Best Practices, Opportunities and Trends; Martinez, J. G., Torregroa, E. S., Eds.; Wiley-VCH, 2015; Chapter 8, pp 181−200.

29. Bodner, G. M. In Toward a Unified Theory of Problem Solving: Views from the Content Domain; Smith, M. U., Ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, 1987.
30. Lee, K. L.; Goh, N. K.; Chia, L. S.; Chin, S. Sci. Educ. 1996, 6, 691–710.
31. Domin, D.; Bodner, G. M. J. Chem. Educ. 2012, 89, 837–843.
32. Asieba, F. O.; Egbugara, O. U. J. Chem. Educ. 1993, 70, 38–39.
33. Bodner, G. M.; Domin, D. S. Univ. Chem. Educ. Proc. 2000, 4, 24–30.
34. Anderson, J.; Boyle, C.; Farrell, R.; Reiser, B. In Modeling Cognition; Morris, P., Ed.; John Wiley: New York, 1987.
35. Kumar, D. D. J. Sci. Educ. Tech. 1993, 2, 481–485.
36. Smith, M. U.; Good, R. J. Res. Sci. Teach. 1984, 9, 895–912.
37. Smith, M. U. Toward a Unified Theory of Problem Solving: A View from Content Domains; Lawrence Erlbaum: New Jersey, 1991.
38. Gabel, D. L.; Sherwood, R. D.; Enochs, L. J. Res. Sci. Teach. 1984, 21, 221–233.
39. Gabel, D. L.; Samuel, K. V. J. Res. Sci. Teach. 1986, 23, 165–176.
40. Frank, D. V.; Baker, C. A.; Herron, J. D. J. Chem. Educ. 1987, 64, 514–515.
41. de Astudillo, L. R.; Niaz, M. J. Sci. Educ. Technol. 1986, 5, 131–140.
42. Herron, J. D.; Bodner, G. M. In Chemical Education: Towards Research Based Practice; Gilbert, J. K., Jong, O. D., Justi, R., Treagust, D. F., Van Driel, J. H., Eds.; Kluwer Academic Publishers: The Netherlands, 2002; pp 235−266.
43. Herron, J. D.; Greenbowe, T. J. J. Chem. Educ. 1986, 63, 526–531.
44. Ehrlich, E.; Flexner, S. B.; Carruth, G.; Hawkins, J. M. Oxford American Dictionary; Oxford University Press: Oxford, 1980.
45. Nakhleh, M. B.; Mitchell, R. C. J. Chem. Educ. 1993, 70, 190.
46. Bunce, D. M. In Chemists Guide to Effective Teaching; Pienta, N. J., Cooper, M. M., Greenbowe, T. J., Eds.; Prentice Hall: Upper Saddle River, NJ, 2005; pp 12−27.
47. Schmidt, H. J. Int. J. Sci. Educ. 1990, 12, 457–471.
48. Schmidt, H. J.; Beine, M. Educ. Chem. 1992, 28, 19–21.
49. Ashmore, A. D.; Frazer, M. J.; Casey, R. J. J. Chem. Educ. 1979, 56, 377–379.
50. Bowen, C. W. J. Res. Sci. Teach. 1990, 27, 351–370.
51. Bunce, D. M.; Heikkinen, H. J. Res. Sci. Teach. 1986, 23, 11–20.
52. Gupta, T. Guided-Inquiry Based Laboratory Instruction: Investigation of Critical Thinking Skills, Problem Solving Skills, and Implementing Student Roles in Chemistry; Graduate Theses and Dissertations, Paper 12336, 2012.
53. Friedl, A. W.; Gabel, D. L.; Samuel, J. Sch. Sci. Math. 1990, 90, 674–682.
54. Gabel, D. L.; Sherwood, R. D. J. Res. Sci. Teach. 1983, 20, 163–177.
55. Huffman, D. J. Res. Sci. Teach. 1997, 34, 551–570.
56. Bodner, G. M. J. Chem. Educ. 1986, 63, 873–878.
57. Britton, B. In The Psychology of Learning Science; Glynn, S. M., Yeany, R. H., Britton, B. K., Eds.; Lawrence Erlbaum Associates: Hillsdale, NJ, 1991; pp 3−19.


58. Cracolice, M. S. In Chemists Guide to Effective Teaching; Pienta, N. J., Cooper, M. M., Greenbowe, T. J., Eds.; Pearson Prentice Hall: NJ, 2007; pp 12−27.
59. Glynn, S.; Yeany, R.; Britton, B. In The Psychology of Learning Science; Glynn, S. M., Yeany, R. H., Britton, B. K., Eds.; Lawrence Erlbaum Associates: Hillsdale, NJ, 1991; pp 43−63.
60. Vygotsky, L. S. J. Gen. Psych. 1929, 36, 415–434.
61. Vygotsky, L. S. Mind in Society: The Development of the Higher Psychological Processes; Harvard University Press: Cambridge, MA, 1978.
62. Atwater, M. M.; Alick, B. J. Res. Sci. Teach. 1990, 27, 157–172.
63. Bodner, G. M.; McMillan, T. L. B. J. Res. Sci. Teach. 1986, 23, 727–737.
64. Osborne, R. J.; Cosgrove, M. M. J. Res. Sci. Teach. 1983, 20, 825–838.
65. Anderson, J. R. The Architecture of Cognition; Harvard University Press: Cambridge, MA, 1983.
66. Anderson, J. R. Amer. Psych. 1996, 4, 355–365.
67. Anderson, J. R. Cognitive Psychology and Its Implications, 6th ed.; Worth Publishers: New York, 2005.
68. Kearsley, G.; Seidel, R.; Park, D. K. Theory Into Practice: A Hypertext Database for Learning and Instruction; US Army Research Institute, 1993.
69. Anderson, J. R.; Bothell, D.; Byrne, M. D.; Douglass, S.; Lebiere, C.; Qin, Y. Psychol. Rev. 2004, 4, 1036–1060.
70. Anderson, J. R.; Lebiere, C. The Atomic Components of Thought; Lawrence Erlbaum Associates: Mahwah, NJ, 1998.
71. Yates, K. A. Towards a Taxonomy of Cognitive Task Analysis Methods: A Search for Cognition and Task Analysis Interactions; Unpublished Doctoral Dissertation, University of Southern California, Los Angeles, 2007.
72. http://atlasti.com/ (retrieved May 2017).
73. Patton, M. Q. Qualitative Research & Evaluation Methods, 3rd ed.; Sage: Thousand Oaks, CA, 2002.
74. Yin, R. K. Case Study Research: Design and Methods, 4th ed.; Sage: Los Angeles, 2009.
75. Burdge, J. Chemistry, 2nd ed.; McGraw Hill: New York, 2009; Chapter 3, p 114.
76. Gupta, T.; Ziolkowski, Z. P.; Albing, G.; Mehta, A. In Optimizing STEM Education With Advanced ICTs and Simulations; Levin, I., Tsybulsky, D., Eds.; IGI Global, 2017; Chapter 8, pp 186−218.


Editor’s Biography

Tanya Gupta

Dr. Tanya Gupta is an Assistant Professor in the Department of Chemistry and Biochemistry at South Dakota State University (SDSU). Her research interests focus on the development and integration of simulations and games, guided-inquiry-based curricula for interdisciplinary science instruction, and eye-tracking methodology to study student problem-solving behavior. She completed her postdoctoral work at Grand Valley State University on the Target Inquiry Program. Dr. Gupta also serves as a member of the American Chemical Society’s Committee on Computers in Chemical Education and the NSTA’s College Science Teaching Committee. Her teaching experience includes cheminformatics and general and inorganic chemistry courses for undergraduate and graduate students in STEM and non-STEM majors, as well as for pre-service and in-service teachers.

© 2017 American Chemical Society

Indexes

Author Index

Chase, A., 117
Elluri, S., 39
Graulich, N., 21
Gupta, T., ix, 1, 133
Harshman, J., 65
Hedtrich, S., 21
Hyslop, R., 49
Kalyvaki, M., 9
Komperda, R., 91
Leontyev, A., 49, 117
Mehta, A., 9
Nielsen, S., 65
Pulos, S., 49, 117
Varma-Nelson, P., 117
Yezierski, E., 65


Subject Index

A

Analytics, open source tools
    Apache tools, sample case study, 44
    open sources software tools, 45t
    data, type, 40
    data analysis processes, overview, 41
    data analysis, summary of the general process, 42f
    open source tools, 44
    right software for analysis, 43

ATLAS.ti, study of problem solving strategies
    ATLAS.ti, 140
    user interface for ATLAS.ti, overview, 142f
    data collection and analysis, research method, 142
    initial coding, 145f
    qualitative analysis, 144
    using software, data preparation, 143
    discussion and conclusions, 150
    problem-solving strategy, comparison, 151t
    limitations, implications, and further research, 152
    network views, problem-solving strategies generated, 146
    problem-solving, graduate student’s approach, 149f
    problem-solving, Instructor’s approach, 148f
    problem-solving, undergraduate student’s approach, 150f
    problem solving in chemistry, literature review, 134
    theoretical framework, 138

C

CER, putting the R
    data and R code presented, 68
    functions and programmatic loops, 73
    getting the best solution using loops, 81
    figure 5A and 5B, reproduction, 84f
    squares values, histogram of the between group sums, 83f
    getting what we want with functions, 77
    Chem vs Math subscales, 79f
    code additional visualizations, 80
    math and chemistry, 75
    issues, k-means clustering algorithm, 76
    mach data set, plot, 75f
    notebooks, transforming documentation, 84
    interactive portion, Analysis tab, 85
    reproducible and dependable analyses, importance, 86
    research notebook produced entirely in R, 87f
    programming in R, advantages and disadvantages, 67
    real life examples, 88
    transforming data visualization, 69
    Anscombe’s quartet, 69f
    data visualization, 71
    2016 JACS article titles, chord diagram, 72f
    pre/post outcomes, dynamite plot, 70f
    what is R?, 67

Comprehensive meta-analysis package
    CMA, analysis of data, 123
    CMA, ANALYSIS TAB, 124f
    imputed studies, funnel plot, 128f
    meta-analysis, forest plot, 126f
    meta-analysis, numerical outcomes, 125f
    moderator analysis, dialog box, 129f
    moderator variables, effect sizes, 129f
    publication bias analysis, funnel plot, 127f
    sensitivity analysis, 124f
    STUDIES TAB after all studies, 123f
    conclusion, 129
    Hattie’s work, selected effect sizes, 130t
    entering data into CMA, 120
    CMA, dialog box for entering a moderator, 121f
    CMA, one of the templates for entering the data, 122f
    CMA at start, 120f
    introduction, 117
    comprehensive meta-analysis, logic, 119f
    included studies, discipline, and reported outcomes, 119t
    meta-analysis, other software solutions, 131
    meta-analysis, possible software solutions, 131t

Computer-aided data analysis, introduction
    book, focus, 3
    book, organization, 4

E

Electronic learning, crossing boundaries, 21
    blended-learning, 22f
    blended-learning, different views of assessments, 23f
    conclusion and outlook, 37
    Learning Management System Analysis Kit, 26
    analysis tool, LMSA Kit, 29
    automatic generated criteria-based feedback, 33
    default feedback template, 35f
    estimation process, results, 32f
    feedback, overall accordance, 36t
    feedback evaluation, results, 35t
    LMSA Kit, usage, 28f
    online chess and student’s abilities, 30
    tasks describing one competency, combination, 27f
    text template editing tool, 34f
    try to make connections, 23
    data mining, ways, 24

L

Learning management system
    beginner researcher, struggles, 10
    chemical education research, 11
    discussion, 14
    D2L survey tool, 15f
    LMS, quiz function, 16
    exploring tools, 12
    discussion board, knowledge creation, 13f
    D2L discussion board, 13f
    D2L Dropbox folder system, 14f

R

R and RStudio, Likert-type survey data analysis, 91
    computing descriptive statistics and reliabilities with psych, 102
    alpha() function, output, 105f
    describeBy() function, result, 103f
    describe() function, descriptive statistics produced, 102f
    factor variables, cross tabulation, 102f
    internal consistency, computing evidence, 104
    negative correlation to the scale total, output, 105f
    introduction, 92
    Journal of Chemical Education, data analysis software, 92t
    psych and lavaan, factor analysis, 106
    Bartlett’s and Kaiser-Meyer-Olkin tests, output, 107f
    eigenvalue results, scree plot, 109f
    item loadings in factor space, plot, 108f
    lavaan, fit statistics, 111f
    lavaan, modification indices provided, 113f
    lavaan, parameter estimates provided, 112f
    minipisa items, simplified one-factor CFA model, 110f
    PCA results, summary, 108f
    RStudio, installing packages, 93
    install.packages, help documentation, 95f
    likert package, console output, 94f
    likert package, packages pane, 95f
    RStudio, installing a package, 94f
    RStudio pane layout, 94f
    visualizing response patterns with Likert, 95
    default centered bar plot generated with the likert package, 99f
    environment pane showing pisaitems dataset, 96f
    filled bar plot generated, 99f
    heat plot generated with the likert package, 101f
    Import Dataset menu options, 96f
    likert package with response distributions, 100f
    pisaitems dataset, console output, 98f
    pisaitems dataset, spreadsheet view, 97f
    pisaitems information, environment pane, 97f
    user-specified color scheme, 100f

T

Test data in jMetrik, analysis
    introduction, 49
    stereochemistry concept inventory, item 18, 50f
    jMetrik, analysis of data, 51
    advanced item scoring dialog box, 55f
    create New Database popup window, 52f
    import Data dialog box, 53f
    import Data file directory dialog box, 53f
    Item Analysis dialog box, 58f
    items q1 and q2, item analysis, 60f
    jMetrik, starting window, 52f
    Nonparametric Characteristic Curves dialog box, 61f
    nonparametric IRC, 62f
    scale level analysis and reliability estimates, output, 59f
    Stereochemistry Concept Inventory, basic item scoring of responses, 54f
    Test Scaling dialog box, 57f
    variables tab contains information, 56f
    jMetrik overview, 50
    jMetrik, data analysis, 51f