
Eye Tracking for the Chemistry Education Researcher

ACS SYMPOSIUM SERIES 1292

Eye Tracking for the Chemistry Education Researcher Jessica R. VandenPlas, Editor Department of Chemistry Grand Valley State University Allendale, Michigan

Sarah J. R. Hansen, Editor Department of Chemistry Columbia University New York, New York

Steven Cullipher, Editor Science and Mathematics Department Massachusetts Maritime Academy Buzzards Bay, Massachusetts

Sponsored by the ACS Division of Chemical Education

American Chemical Society, Washington, DC Distributed in print by Oxford University Press

Library of Congress Cataloging-in-Publication Data Names: VandenPlas, Jessica R., editor. | Hansen, Sarah J. R., editor. | Cullipher, Steven, editor. | American Chemical Society. Division of Chemical Education. Title: Eye tracking for the chemistry education researcher / Jessica R. VandenPlas, editor (Department of Chemistry, Grand Valley State University, Allendale, Michigan), Sarah J.R. Hansen, editor (Department of Chemistry, Columbia University, New York, New York), Steven Cullipher, editor (Science and Mathematics Department, Massachusetts Maritime Academy, Buzzards Bay, Massachusetts) ; sponsored by the ACS Division of Chemical Education. Description: Washington, DC : American Chemical Society, [2018] | Series: ACS symposium series ; 1292 | Includes bibliographical references and index. Identifiers: LCCN 2018038283 (print) | LCCN 2018041332 (ebook) | ISBN 9780841233409 (ebook) | ISBN 9780841233423 (alk. paper) Subjects: LCSH: Chemistry--Study and teaching--Research. | Eye tracking. | Tracking (Engineering) Classification: LCC QD40 (ebook) | LCC QD40 .E94 2018 (print) | DDC 540.71--dc23 LC record available at https://lccn.loc.gov/2018038283

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Copyright © 2018 American Chemical Society
Distributed in print by Oxford University Press
All Rights Reserved.

Reprographic copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Act is allowed for internal use only, provided that a per-chapter fee of $40.25 plus $0.75 per page is paid to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. Republication or reproduction for sale of pages in this book is permitted only under license from ACS. Direct these and other permission requests to ACS Copyright Office, Publications Division, 1155 16th Street, N.W., Washington, DC 20036. The citation of trade names and/or names of manufacturers in this publication is not to be construed as an endorsement or as approval by ACS of the commercial products or services referenced herein; nor should the mere reference herein to any drawing, specification, chemical process, or other data be regarded as a license or as a conveyance of any right or permission to the holder, reader, or any other person or corporation, to manufacture, reproduce, use, or sell any patented invention or copyrighted work that may in any way be related thereto. Registered names, trademarks, etc., used in this publication, even without specific indication thereof, are not to be considered unprotected by law.

PRINTED IN THE UNITED STATES OF AMERICA

Foreword

The ACS Symposium Series was first published in 1974 to provide a mechanism for publishing symposia quickly in book form. The purpose of the series is to publish timely, comprehensive books developed from ACS-sponsored symposia based on current scientific research. Occasionally, books are developed from symposia sponsored by other organizations when the topic is of keen interest to the chemistry audience. Before agreeing to publish a book, the proposed table of contents is reviewed for appropriate and comprehensive coverage and for interest to the audience. Some papers may be excluded to better focus the book; others may be added to provide comprehensiveness. When appropriate, overview or introductory chapters are added. Drafts of chapters are peer-reviewed prior to final acceptance or rejection, and manuscripts are prepared in camera-ready format. As a rule, only original research papers and original review papers are included in the volumes. Verbatim reproductions of previously published papers are not accepted.

ACS Books Department

Contents

Preface .......................................................................... ix

1. Eye Tracking as a Research Tool: An Introduction ........................ 1
   Steven Cullipher, Sarah J. R. Hansen, and Jessica R. VandenPlas

2. Eye Tracking in Chemistry Education Research: Study Logistics .......... 11
   Sarah J. R. Hansen and Jessica R. VandenPlas

3. What They See Impacts the Data You Get: Selection and Design of Visual Stimuli ... 25
   Katherine L. Havanki and Sarah J. R. Hansen

4. Using Fixations To Measure Attention .................................... 53
   Steven Cullipher and Jessica R. VandenPlas

5. Sequence Analysis: Use of Scanpath Patterns for Analysis of Students' Problem-Solving Strategies ... 73
   Elizabeth L. Day, Hui Tang, Lisa K. Kendhammer, and Norbert J. Pienta

6. Advanced Methods for Processing and Analyzing Eye-Tracking Data Using R ... 99
   Hui Tang and Norbert J. Pienta

7. Using Multiple Psychophysiological Techniques To Triangulate the Results of Eye-Tracking Data ... 119
   Kimberly Linenberger Cortes, Kimberly Kammerdiener, and Adriane Randolph

8. Beyond Gaze Data: Pupillometry as an Additional Data Source in Eye Tracking ... 145
   Jessica M. Karch

9. Coupling Eye Tracking with Verbal Articulation in the Evaluation of Assessment Materials Containing Visual Representations ... 165
   Jessica J. Reed, David G. Schreurs, Jeffrey R. Raker, and Kristen L. Murphy

10. Studying the Language of Organic Chemistry: Visual Processing and Practical Considerations for Eye-Tracking Research in Structural Notation ... 183
    Katherine L. Havanki

Editors' Biographies ...................................................... 205

Indexes

Author Index .............................................................. 209
Subject Index ............................................................. 211

Preface

The idea for an ACS symposium series volume on eye tracking started to form in the summer of 2015. Each of the editors had experience with eye tracking from our dissertation research and, upon further discussion, we discovered we all had the same questions when we started out: What type of eye tracker should we use? What types of research questions can eye tracking help us answer? What should we consider concerning experimental design? How could we analyze the results in a meaningful way? With no clear help from the existing chemistry education research (CER) literature, we had to expand our reading to the fields of psychology, consumer analysis, and product design to find suitable answers. When we spoke with chemistry education researchers who were interested in eye tracking, we discovered these same questions were prevalent among them as well; and often, they did not even know where to start. At that point, we decided that an eye-tracking guidebook aimed at chemistry education researchers could be a valuable resource. This book is not intended to be a review of eye tracking in the CER literature; rather, it is meant to be a tool to help answer the same questions we had when we started out in the field. We also hope that those with some experience in eye tracking will find among these pages ideas they had not considered applying to their research, but that will be valuable to them moving forward.

How This Book Is Organized

This book is split into three sections. In the first section (Chapters 1, 2, and 3), the reader will find information on how to get started with eye tracking, including how to set up an eye-tracking lab, how to design an appropriate stimulus, and how to develop your first eye-tracking study. The second section of this book (Chapters 4, 5, 6, 7, and 8) details a variety of data collection and analysis methods, including how to analyze fixations, how to apply sequence analysis to eye-tracking data, appropriate statistics for data analysis, pupillometry, and the use of additional biometric data to support eye-tracking results. Each chapter in this section begins with a general discussion of a method, and then uses a specific example from CER to demonstrate how this method might be applied to relevant research questions. Finally, the third section of the book (Chapters 9 and 10) details specific applications of eye tracking to chemistry education research, and demonstrates how to pull together all of the information included in this book to perform robust and beneficial studies.


Acknowledgments

In addition to all of the authors involved in this project, who worked diligently with us to produce a resource we hope will be beneficial to the CER community, we must also thank the many reviewers who devoted their time to ensuring that the work presented here is of the highest quality. Diane Bunce's encouragement was instrumental in making this book a reality. Finally, we thank Zac Stelling, Amada Tindall Koenig, and Arlene Furman of ACS Books for their extreme patience and kind support as we undertook this project.

Jessica R. VandenPlas Department of Chemistry Grand Valley State University 234 Padnos Hall Allendale, Michigan 49401, United States

Sarah J. R. Hansen Department of Chemistry Columbia University 3000 Broadway New York, New York 10026, United States

Steven Cullipher Science and Mathematics Department Massachusetts Maritime Academy 101 Academy Dr. Buzzards Bay, Massachusetts 02532, United States


Chapter 1

Eye Tracking as a Research Tool: An Introduction

Steven Cullipher,*,1 Sarah J. R. Hansen,*,2 and Jessica R. VandenPlas*,3

1Science and Mathematics Department, Massachusetts Maritime Academy, Buzzards Bay, Massachusetts 02532, United States
2Department of Chemistry, Columbia University, New York, New York 10027, United States
3Department of Chemistry, Grand Valley State University, Allendale, Michigan 49401, United States

*E-mails: [email protected] (S.C.); [email protected] (S.J.R.H.); [email protected] (J.R.VP.).

Eye tracking can be a robust and rich source of data for chemistry education research but is not appropriate for all research questions. There are many variables to consider when deciding to conduct an eye-tracking study, some of which may not be obvious to the novice user. By the end of this chapter, the reader should be able to: (1) decide whether or not the research question can be answered with eye tracking; (2) design an appropriate participant task; and (3) determine which quantitative measures are appropriate to collect and analyze.


Introduction

In recent years, eye tracking has become increasingly popular in the chemistry education research (CER) community. Many researchers are looking to add this technique to their toolbox, but because of its complexity and distinct differences from other CER tools, they have hesitated to make the leap. Outside the field of CER, eye tracking has been employed in many areas, including usability studies, reading research, and visual search tasks (1–12). While these types of studies provide important information on the range and usefulness of eye tracking, the research questions that CER investigates are often unique and distinct. Before venturing deeper into the world of eye tracking, and the remainder of this book, it is important to understand the benefits, capabilities, and limitations of eye-tracking applications.

Eye-Tracking Technology: How It Works

Most modern eye-tracking systems use a technique called pupil center corneal reflection (PCCR) to track movements of the eyes while viewing a visual stimulus. This technology uses near-infrared illumination to create reflection patterns on the cornea and pupil of the eyes of the user. Image sensors in the eye-tracking unit are then used to capture images of the eyes and the reflection patterns of the near-infrared light. Using advanced algorithms for image processing, as well as a physiological 3D model of the eye, the software is able to estimate the position of the eye in space and the point of gaze on the visual stimulus. Sampling rates vary based on the manufacturer and model of eye tracker, and should be considered if purchasing a new eye tracker. A minimum sampling rate of 60 Hz is typically recommended, but a review of the literature will provide best practices for the type of study being conducted.
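[Editors' note: To see concretely why the sampling rate matters, consider how many raw gaze samples a single fixation yields. The short R sketch below (R being the analysis language featured in Chapter 6) makes the arithmetic explicit; the function name and the printed rates are illustrative choices, and the 150–600 ms fixation range is the one cited later in this chapter.]

```r
# How many raw gaze samples does one fixation yield? A tracker running
# at rate_hz records one sample every 1000 / rate_hz milliseconds.
samples_per_fixation <- function(fixation_ms, rate_hz) {
  floor(fixation_ms * rate_hz / 1000)
}

# Shortest (150 ms) and longest (600 ms) typical fixations at three rates
for (hz in c(30, 60, 120)) {
  cat(sprintf("%3d Hz: %2d to %2d samples per fixation\n",
              hz, samples_per_fixation(150, hz),
              samples_per_fixation(600, hz)))
}
```

At 30 Hz, a 150 ms fixation leaves only four samples for a detection algorithm to work with, while 60 Hz provides nine; this is one practical reason behind the 60 Hz recommendation.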

Eye Tracking as a Useful, Independent Research Tool

Eye tracking is a quantitative method for recording a participant's eye movements as they observe a visual stimulus. Thus, eye tracking can directly answer the questions:

• At what part of the stimulus is the participant looking?
• How much time does the participant spend looking at a particular part of the stimulus?
• Does the participant look at a particular part of the stimulus more than the others?
• In what order does the participant view the various components of the stimulus?

This information alone can be quite useful if the investigator is looking to design, for instance, curriculum materials, dynamic representations, or interactive simulations. Details on designing suitable stimuli for eye-tracking investigations are given in Chapter 3 of this book.

Beyond the quantitative data, eye tracking can provide insight into the participants' underlying cognitive processes as they interact with visual stimuli. Hoffman and Subramaniam have shown that if an individual's eyes are focused on an object, the attention of the individual is also on that object (13). The relationship between mental processing and eye-movement data has also been studied extensively (14–17). However, to accept the relationship between cognition and eye movements, the investigator must rely on two core working assumptions (14):

• The immediacy assumption states that the viewer begins processing the object upon which they fixate immediately and before moving on to the next object. As they begin fixating on the new object, the viewer immediately begins processing this new information.
• The eye-mind assumption states that a link exists between the eyes and the mind, such that whatever the eye fixates on, the mind processes. Based on this assumption, it can be inferred that commonly occurring eye gaze patterns might represent similar ways of processing a visual stimulus.

Thus, in addition to the questions listed above, eye tracking alone can be useful to answer questions such as:

• What part of the stimulus does the participant spend most of their time processing or trying to interpret?
• How much time does the participant spend processing different parts of the stimulus?
• In what order does the participant process information presented in the stimulus?

While all of these are worthwhile questions to investigate, they only begin to scratch the surface of the types of questions chemistry education researchers are interested in. Eye tracking alone cannot answer questions about what the participant is thinking about the stimulus they are processing, or why they have chosen to process this information in the first place. Such interpretations require the use of auxiliary data collection methods.

Mixed Methods Approaches to Eye Tracking: Methodological Triangulation for Enhanced Confidence

As with any research study, the use of additional data channels can serve to boost confidence in the interpretation of results from a single data collection method. Remember that eye tracking alone can only provide information about which parts of a stimulus an individual is processing. It provides no information on the types of processing that occur. Methodological triangulation is the use of more than one data collection method to improve confidence in results when investigating a research question (18). Because of the limitations of eye tracking, most research investigations should rely on methodological triangulation to cement understanding of the quantitative findings. Holmqvist et al. suggest several auxiliary methods to consider, each with their own benefits and shortcomings (19). Many of the alternate data channels that work well with eye tracking are already common tools used by CER investigators, including cognitive interviewing, questionnaires, problem-solving tasks, and thinking aloud. Other auxiliary data collection methods rely on biological responses to gauge neurological functions. Some examples are galvanic skin response, functional magnetic resonance imaging (fMRI), and electroencephalography (EEG). For more information on the use of multiple biometric methods to triangulate eye-tracking results, the reader is referred to Chapter 7. Additional information on other biological data metrics, including those listed, can be found elsewhere (19). Two considerations for the researcher when selecting such auxiliary data collection methods are:

• Will the auxiliary data collection method impact the eye-tracking data? Kirk and Ashcraft have shown that verbal interaction with a participant during eye tracking may alter their eye movements (20). Some studies have shown that the increased cognitive load of concurrent verbalizations slows down eye movements and learning processes (21, 22).
• How will the auxiliary data be linked to eye-tracking data? Most data collection software for eye tracking provides the capability for recording audio and video data while monitoring eye movements. While concurrent verbalization has some drawbacks, the ability to provide an in-the-moment perspective from the participant may serve as valuable data, which can then be directly linked to eye movements during the participant task (one way to make this link by timestamp is sketched after this list).
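[Editors' note: As a minimal sketch of the second consideration, linking auxiliary data to eye-tracking data by shared timestamps: the data frames, column names, and utterances below are invented for illustration, and real think-aloud coding involves far more care.]

```r
# Align time-stamped think-aloud utterances with the fixation record.
# Both tables are assumed to share one clock and to be sorted by time.
fixations <- data.frame(t_ms = c(0, 400, 900, 1600),
                        aoi  = c("A", "B", "B", "C"))
utterances <- data.frame(t_ms = c(450, 1700),
                         text = c("this peak is bigger",
                                  "so it must be copper"))

# For each utterance, find the fixation in progress when it began
# (assumes every utterance starts after tracking begins).
idx <- findInterval(utterances$t_ms, fixations$t_ms)
utterances$aoi_at_utterance <- fixations$aoi[idx]
utterances
```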

The reader can find more information on identifying appropriate research questions and designing an eye-tracking study to address these questions in Chapter 2.

Eye-Tracking Measures

When designing an eye-tracking study, it is important to consider the project as a whole, from formulation of the research question through analysis of findings, from the outset. Because of the nature of eye tracking, decision-making in study design must consider multiple factors. Chief among these decisions will be to determine which measures are best suited for the study. Before discussing these measures, there are some commonly used terms that appear frequently in eye-tracking literature and that you will see throughout this book:

• Fixation: a pause in eye movement in which the retina is stabilized over a stationary object. It is estimated that 90% of viewing time is comprised of fixations, which can last between 150–600 ms (23). (A sketch of how fixations can be detected from raw gaze samples follows this list.)
• Saccade: a rapid movement of the eye that occurs between fixations. During this transition, the viewer is essentially blind. Saccadic movements can be both voluntary and reflexive, so careful consideration is important in analyzing data related to saccades (24).
• Area of interest (AOI): a region of the visual stimulus in which measurements, primarily fixations, can be aggregated as part of analysis. AOIs can be defined in two ways. First, AOIs can be defined by the researcher before data collection, a technique that is commonly used when the researcher wants to know if a participant looks at a particular region of the visual stimulus. In the second method, AOIs are defined after data collection based on cluster analysis. Cluster analysis aggregates fixation data from all participants to indicate regions of the visual stimulus that have a high concentration of fixations.
• Scanpath (fixation sequence): a series of eye fixations and saccades, most commonly among AOIs, that occurs when a viewer is exposed to the visual stimulus.
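[Editors' note: To make the fixation concept concrete, below is a minimal R sketch of a dispersion-threshold (I-DT) fixation detector, one common family of algorithms for reducing raw gaze samples to fixations. The column names (t_ms, x, y) and the thresholds are illustrative assumptions; commercial software applies its own vendor-specific algorithms, so treat this only as a conceptual model.]

```r
# Minimal I-DT sketch: a fixation is a run of consecutive gaze samples
# whose spatial dispersion stays below disp_max (pixels) for at least
# dur_min (ms). gaze is a data frame with columns t_ms, x, y.
detect_fixations <- function(gaze, disp_max = 50, dur_min = 100) {
  fixations <- list()
  n <- nrow(gaze)
  i <- 1
  while (i < n) {
    # Grow the window while dispersion (x range + y range) stays small
    j <- i
    while (j < n &&
           diff(range(gaze$x[i:(j + 1)])) +
           diff(range(gaze$y[i:(j + 1)])) <= disp_max) {
      j <- j + 1
    }
    dur <- gaze$t_ms[j] - gaze$t_ms[i]
    if (dur >= dur_min) {
      fixations[[length(fixations) + 1]] <- data.frame(
        x = mean(gaze$x[i:j]), y = mean(gaze$y[i:j]),
        start_ms = gaze$t_ms[i], dur_ms = dur)
      i <- j + 1   # continue past this fixation
    } else {
      i <- i + 1   # too short; slide the window forward
    }
  }
  do.call(rbind, fixations)   # one row per detected fixation
}
```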

Based on these concepts, there are innumerable measures available to eye-tracking users, ranging from the relatively simple to the cumbersome and complex. Below is a discussion of the most frequently utilized eye-tracking measures and their typical applications. The adventurous researcher is referred to Holmqvist et al. and Duchowski for a more extensive compilation of eye-tracking measures (19, 24).

Fixation Count and Fixation Duration

The most common types of measures used in eye tracking involve fixations. In particular, fixation count and fixation duration are frequently used to interpret participant processing of information. Fixation count refers to the distinct number of fixations within a particular AOI. This is a common measure to indicate how frequently a participant processes information within the AOI. Fixation duration is the length of time that a participant's gaze remains within a particular AOI. Duration can indicate a participant's level of understanding, or the complexity, difficulty, or importance of the information, depending on the task and other auxiliary measures. Although research has shown a strong correlation between fixation count and fixation duration, it is important to consider these two measures separately to get a complete picture of how a participant processes the information in a stimulus (25, 26). The researcher should keep in mind that a correlation between fixation count and fixation duration may not exist for their particular study; so it is always important to examine both measures. Some research shows that fixation count is an appropriate indicator of the importance of the information contained in an AOI, with higher fixation counts corresponding to greater importance (27, 28). Fixation duration, on the other hand, is indicative of the complexity of information within an AOI, with longer fixation duration corresponding to greater complexity (7, 29–34). Both fixation count and fixation duration can be analyzed in various ways. For example, fixation duration could be taken as the average duration of individual fixations or the total duration of all fixations within the AOI. An AOI with a high fixation count but a low average duration per fixation can indicate search behavior, showing that the participant has little understanding of the presented information (35). Williamson et al. used fixation duration to examine how students used ball-and-stick representations and electrostatic potential maps to answer questions about electron density, positive charge, proton attack, and hydroxide attack for various molecules (36). Their results showed a correlation between the accuracy of the students' responses and the students' fixations within the provided representation. Hansen used fixation data to examine how students transition between different symbolic reaction representations when solving chemistry problems (33). Results showed that participants shifted their viewing patterns regardless of success, and that similar viewing patterns were employed by both successful and unsuccessful solvers of visual stoichiometry problems. For a more in-depth discussion of how fixations may be collected and analyzed, see Chapter 4.

Fixation Sequence or Scanpath

Eye fixation sequences (also known as scanpaths) refer to the order in which a participant's eye movements shift between AOIs. Whereas fixation count and duration can provide information on the importance and complexity of the various AOIs, fixation sequences can reveal perceptual strategies that people develop for interpreting the sum of a visual stimulus (25, 34, 35, 37–39). These strategies reflect the individual's cognitive processes and could be used to group participants or differentiate between demographically grouped individuals. Cullipher and Sevian used scanpath analysis to compare the problem-solving strategies of students at various education levels (35). Their results showed that students with greater understanding exhibited distinct fixation sequences that corresponded to examining the presented information in a direct and reasoned fashion. Additional applications of sequence analysis are discussed in Chapter 5.

Pupil Diameter

The use of pupil diameter measures, referred to as pupillometry, is another technique that can be used to indicate cognitive functions of viewers. A thorough history and explanation of pupillometry can be found in Chapter 8. In that chapter, Karch also provides evidence for the relationship between pupil dilation and cognitive function.

Analyzing Eye-Tracking Data

Once the aforementioned measures have been collected by the researcher, some data analysis, likely including statistical comparison, must take place. The type of statistics that can be applied to eye-tracking data shares many commonalities with the statistics the CER community applies to other quantitative data. However, some special considerations for running statistical analysis of eye-tracking data are given in Chapter 6, along with suggestions for using the statistical program R to complete these analyses.
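[Editors' note: As a minimal sketch of computing the measures above (fixation count, total and mean duration per AOI, and a scanpath string compared by edit distance) in R, the language taken up in Chapter 6. The fixation table, AOI rectangles, and participant data are all invented for illustration.]

```r
# Fixations, assumed already detected and in chronological order
fixations <- data.frame(
  participant = c("P1", "P1", "P1", "P2", "P2"),
  x = c(120, 430, 440, 125, 420),
  y = c(80, 310, 300, 90, 305),
  dur_ms = c(220, 340, 180, 400, 260))

# Rectangular AOIs defined before data collection
aois <- data.frame(name = c("equation", "diagram"),
                   x0 = c(0, 300), x1 = c(250, 600),
                   y0 = c(0, 200), y1 = c(150, 400))

# Label each fixation with the AOI containing it (NA if none)
hit <- function(x, y) {
  inside <- which(x >= aois$x0 & x <= aois$x1 &
                  y >= aois$y0 & y <= aois$y1)
  if (length(inside)) aois$name[inside[1]] else NA
}
fixations$aoi <- mapply(hit, fixations$x, fixations$y)

# Fixation count plus total and mean duration, per participant and AOI
aggregate(dur_ms ~ participant + aoi, data = fixations,
          FUN = function(d) c(count = length(d),
                              total = sum(d), mean = mean(d)))

# Scanpaths as strings of AOI initials, compared by edit distance
paths <- tapply(substr(fixations$aoi, 1, 1), fixations$participant,
                paste, collapse = "")
adist(paths["P1"], paths["P2"])   # Levenshtein distance (base R, utils)
```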

Putting It All Together

Building an eye-tracking study takes careful planning and consideration. From selecting appropriate research questions, to designing stimuli and concurrent data collection methods to ensure these questions are properly addressed, to selecting the right metrics to collect, to finding the right statistical analyses to apply to the data, the researcher has many decisions to make. The final two chapters of this book, Chapters 9 and 10, give detailed examples of how eye tracking can be applied to particular topics within CER, and show how other researchers have tackled some of these questions. We hope this book will help the researcher get started and feel more confident making some of these decisions.

References

1. Van Gog, T.; Paas, F.; Van Merriënboer, J. J. G. Uncovering Expertise-Related Differences in Troubleshooting Performance: Combining Eye Movement and Concurrent Verbal Protocol Data. Appl. Cogn. Psychol. 2005, 19, 205–221.
2. Van Gog, T.; Paas, F.; van Merriënboer, J. J. G.; Witte, P. Uncovering the Problem-Solving Process: Cued Retrospective Reporting versus Concurrent and Retrospective Reporting. J. Exp. Psychol. Appl. 2005, 11, 237.
3. Goldberg, J. H.; Wichansky, A. M. Eye Tracking in Usability Evaluation: A Practitioner's Guide. In The Mind's Eye; Hyona, J., Radach, R., Deubel, H., Eds.; Elsevier, 2003; pp 493–516.
4. Jacob, R. J. K.; Karn, K. S. Eye Tracking in Human-Computer Interaction and Usability Research: Ready to Deliver the Promises. Mind 2003, 2, 4.
5. Land, M. F. Eye Movements and the Control of Actions in Everyday Life. Prog. Retin. Eye Res. 2006, 25, 296–324.
6. Reder, S. M. On-Line Monitoring of Eye-Position Signals in Contingent and Noncontingent Paradigms. Behav. Res. Methods Instrum. 1973, 5, 218–228.
7. Rayner, K. Eye Movements in Reading and Information Processing: 20 Years of Research. Psychol. Bull. 1998, 124, 372–422.
8. Rayner, K.; Pollatsek, A. The Psychology of Reading; Prentice Hall: Englewood Cliffs, NJ, 1989.
9. Inhoff, A. W.; Radach, R. Definition and Computation of Oculomotor Measures in the Study of Cognitive Processes. In Eye Guidance in Reading and Scene Perception; Underwood, G. M., Ed.; Elsevier Science, Ltd.: Oxford, 1998; pp 29–53.
10. Engbert, R.; Longtin, A.; Kliegl, R. A Dynamical Model of Saccade Generation in Reading Based on Spatially Distributed Lexical Processing. Vision Res. 2002, 42, 621–636.
11. Wolfe, J. M. What Can 1 Million Trials Tell Us about Visual Search? Psychol. Sci. 1998, 9, 33–39.
12. Wolfe, J. M. Visual Search: A Review. In Attention; Pashler, H., Ed.; University College London Press: London, 1998.
13. Hoffman, J.; Subramaniam, B. The Role of Visual Attention in Saccadic Eye Movements. Percept. Psychophys. 1995, 57 (6), 787–795.
14. Just, M. A.; Carpenter, P. A. A Theory of Reading: From Eye Fixations to Comprehension. Psychol. Rev. 1980, 87, 329.
15. Rayner, K.; Raney, G. E.; Pollatsek, A. Eye Movements and Discourse Processing. In Sources of Coherence in Reading; Lorch, R. F., O'Brien, J. E. J., Eds.; Lawrence Erlbaum Associates, Inc.: Hillsdale, NJ, 1995; pp 9–35.
16. Rayner, K. Eye Movements and Attention in Reading, Scene Perception, and Visual Search. Q. J. Exp. Psychol. 2009, 62, 1457–1506.
17. Anderson, J. R.; Bothell, D.; Douglass, S. Eye Movements Do Not Reflect Retrieval Processes: Limits of the Eye-Mind Hypothesis. Psychol. Sci. 2004, 15, 225–231.
18. Denzin, N. K. The Research Act in Sociology: A Theoretical Introduction to Sociological Methods; Transaction Publishers: Piscataway, NJ, 1973.
19. Holmqvist, K.; Nyström, M.; Andersson, R.; Dewhurst, R.; Jarodzka, H.; Van de Weijer, J. Eye Tracking: A Comprehensive Guide to Methods and Measures; Oxford University Press, 2011.
20. Kirk, E. P.; Ashcraft, M. H. Telling Stories: The Perils and Promise of Using Verbal Reports to Study Math Strategies. J. Exp. Psychol. Learn. Mem. Cogn. 2001, 27, 157.
21. Nielsen, J.; Clemmensen, T.; Yssing, C. Getting Access to What Goes on in People's Heads?: Reflections on the Think-Aloud Technique. In Proceedings of the Second Nordic Conference on Human-Computer Interaction; ACM: New York, 2002; pp 101–110.
22. Van Someren, M. W.; Barnard, Y. F.; Sandberg, J. A. C. The Think Aloud Method: A Practical Guide to Modelling Cognitive Processes; Academic Press: London, 1994; Vol. 2.
23. Irwin, D. E. Visual Memory Within and Across Fixations. In Eye Movements and Visual Cognition: Scene Perception and Reading; Rayner, K., Ed.; Springer: New York, 1992; pp 146–165.
24. Duchowski, A. T. Eye Tracking Methodology, 3rd ed.; Springer: London, 2017.
25. Tang, H.; Day, E.; Kendhammer, L.; Moore, J.; Brown, S.; Pienta, N. J. Eye Movement Patterns in Solving Science Ordering Problems. J. Eye Mov. Res. 2016, 9, 1–13.
26. Stieff, M.; Hegarty, M.; Deslongchamps, G. Identifying Representational Competence With Multi-Representational Displays. Cogn. Instr. 2011, 29, 123–145.
27. Hegarty, M.; Mayer, R. E.; Green, C. E. Comprehension of Arithmetic Word Problems: Evidence from Students' Eye Fixations. J. Educ. Psychol. 1992, 84, 76.
28. Green, H. J.; Lemaire, P.; Dufau, S. Eye Movement Correlates of Younger and Older Adults' Strategies for Complex Addition. Acta Psychol. (Amst.) 2007, 125, 257–278.
29. de Corte, E.; Verschaffel, L.; Pauwels, A. Influence of the Semantic Structure of Word Problems on Second Graders' Eye Movements. J. Educ. Psychol. 1990, 82, 359–365.
30. Tang, H.; Pienta, N. Eye-Tracking Study of Complexity in Gas Law Problems. J. Chem. Educ. 2012, 89, 988–994.
31. Schuttlefield, J. D.; Kirk, J.; Pienta, N. J.; Tang, H. Investigating the Effect of Complexity Factors in Gas Law Problems. J. Chem. Educ. 2012, 89, 586–591.
32. Topczewski, J.; Topczewski, A. M.; Tang, H.; Kendhammer, L.; Pienta, N. J. NMR Spectra through the Eyes of a Student: Eye Tracking Applied to NMR Items. J. Chem. Educ. 2017, 94, 29–37.
33. Hansen, S. J. R. Multimodal Study of Visual Problem Solving in Chemistry with Multiple Representations. Ph.D. Thesis, Columbia University, 2014.
34. Havanki, K. L.; VandenPlas, J. R. Eye Tracking Methodology for Chemistry Education Research. In Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 1166; American Chemical Society, 2014; pp 11–191.
35. Cullipher, S.; Sevian, H. Atoms versus Bonds: How Students Look at Spectra. J. Chem. Educ. 2015, 92, 1996–2005.
36. Williamson, V. M.; Hegarty, M.; Deslongchamps, G.; Williamson, K. C.; Shultz, M. J. Identifying Student Use of Ball-and-Stick Images versus Electrostatic Potential Map Images via Eye Tracking. J. Chem. Educ. 2013, 90, 159–164.
37. Havanki, K. L. A Process Model for the Comprehension of Organic Chemistry Notation. Ph.D. Thesis, The Catholic University of America, 2012.
38. Augustyniak, P.; Tadeusiewicz, R. Assessment of Electrocardiogram Visual Interpretation Strategy Based on Scanpath Analysis. Physiol. Meas. 2006, 27, 597.
39. Slykhuis, D. A.; Wiebe, E. N.; Annetta, L. A. Eye-Tracking Students' Attention to PowerPoint Photographs in a Science Education Setting. J. Sci. Educ. Technol. 2005, 14, 509–520.

Chapter 2

Eye Tracking in Chemistry Education Research: Study Logistics

Sarah J. R. Hansen*,1 and Jessica R. VandenPlas2

1Department of Chemistry, Columbia University, New York, New York 10027, United States
2Department of Chemistry, Grand Valley State University, Allendale, Michigan 49401, United States

*E-mail: [email protected].

Eye-tracking studies for research purposes have unique characteristics that warrant special consideration when setting up, planning, and obtaining Institutional Review Board (IRB) approval. This chapter will discuss necessary considerations when seeking approval for an eye-tracking study in chemistry education research and navigating the institutional context for human subject research for discipline-based educational research (DBER). Additionally, we include practical considerations such as using a shared eye tracker, exporting data, combining data from different systems, creating stimuli midtrial, and/or the use of pauses or triggers.


Introduction

Eye tracking holds significant potential to complement chemistry education research studies involving visual stimuli (1–6). Logistical considerations arise when combining eye tracking with chemistry education studies, particularly when eye-tracking research is new for a particular department or institution. Designing a research space and planning a research study are both daunting tasks for the new researcher and will be discussed along with additional study considerations, including the use of shared eye trackers, creating stimuli midtrial, and the use of pauses or triggers during eye tracking. Finally, institutional review boards (IRBs) or equivalent bodies are now almost universally charged with reviewing research studies involving human subjects to ensure ethical treatment of the participants and the research data. Because the collection of physiological data involves different considerations than more traditional chemistry education research studies, this chapter outlines IRB considerations for chemistry education researchers working with an eye-tracking system.

Designing a Research Space

Selecting an Eye Tracker

The first question any researcher must answer when they begin eye-tracking research is what type of eye-tracking system they will use and how to gain access to this system (by borrowing, renting, building, or purchasing a system). While the technology itself is fairly consistent among brands and between models these days, the physical configurations of these systems vary greatly. Some examples of these systems are shown in Figure 1. In order to select an appropriate system, the researcher must identify the basic types of research they wish to conduct and the demands of this research. For example, researchers conducting studies on student behavior in the laboratory will derive no benefit from a screen-based tracker and will likely need to use a head-mounted tracker with an integrated scene camera (which operates independently of the eye-tracking camera(s) to capture video of the environment around the user) so that the user is able to interact with real-world objects. Researchers conducting studies on participant interaction with audio-visual material, such as textbooks, images, or animations, on the other hand, will likely benefit from the stability and ease of analysis that comes from using a screen-based tracker. Even among screen-based trackers, however, there are variations in the quality of data collected (based on sampling frequency and accuracy of the tracking hardware and software) that must be considered. Researchers conducting highly sensitive research, including research on pupillometry (see Chapter 8) or research where very small eye movements could be significant, including some reading research, may require systems with higher sampling rates or systems which offer some stability to the user (see Figure 1). Once a researcher has identified the physical configuration of the system that would best work for their research, they will likely find only one or two brands that offer this physical configuration, thus narrowing the search significantly.

Figure 1. Two example eye-tracking systems. The first (left) is a screen-based SMI REDn system, which uses a remote IR light source and sensor attached directly to the computer, such that the system is not in direct contact with the participant. The second two images are of a Grinbath system. This head-mounted system rests on the participant like a hat, with the IR light source and sensor bending around the head.

Setting up an Eye-Tracking Lab

Depending on the type of eye tracker you are using, there may be some flexibility in the location that can be used for research. A shared eye tracker, particularly one in another department, may already be installed, and you will need to consider how to guide your participants to the location and schedule trials in a potentially shared space. It is important to check that settings have not changed between sessions in order to guard against data compromise caused by a system crash or another researcher's activity on the system you share. If your eye tracker is portable, you may want to consider a fixed location where you can control the ambient light and allow your participant to be comfortable and uninterrupted. In terms of lighting concerns, researchers should consider a location with consistent lighting between sessions, and without bright light shining directly on the tracker or in the participant's face. Consult the set-up manual for your particular system for suggestions on optimal lighting, participant distance from the screen, chair height, etc. Additionally, make-up (such as heavy mascara), glasses, contact lenses, and eye color or shape may also impact eye-tracking data collection (7). Chapter 1 discusses study design and set-up considerations.

Designing an Eye-Tracking Study

Appropriate Research Questions

Eye-tracking data has the potential to provide insight into participant behavior beyond traditional interview or observational methods. For example, if a student is viewing an animation of a redox reaction involving copper with copper ions floating in solution but does not mention the copper ions as being relevant during a subsequent interview, we may wonder if they even saw these atoms, or if the atoms were seen but not considered important (8). Similarly, a participant using a simulation may direct their attention to the portions of the screen necessary to enter their answer; in this case, knowing where they look provides a more complete understanding of their engagement with the task (9). Eye tracking can tell you if visual attention is being allocated to a particular feature of the image (Figure 2). Similarly, eye tracking can give insight into the order in which an individual interacts with the simulation, including when they read the balanced equation relative to answering a particular question (Figure 2). Research questions that deal with attentional metrics such as these (duration or number of views in a particular region, the order of viewing particular regions, etc.) are ideally suited for eye-tracking research. When designing a study, it is important to identify the type of data that will best address your research questions—eye tracking alone may not be appropriate or sufficient to address all research questions. Questions about what behaviors participants demonstrate (including what they pay attention to when completing a task) can be addressed via eye tracking. However, questions about why participants demonstrate certain behaviors may best be answered through interviews or think-aloud protocol analysis instead of, or in addition to, eye-tracking data (Chapter 7). At the same time, eye-tracking data can also provide a different view of the participant's experience that can help supplement or complement these additional data sources. For example, participants who describe a reaction and successfully answer questions about that reaction may meet our standard for understanding the reaction, but their viewing pattern may reveal large, chemically relevant portions of the image that were not attended to. This may indicate gaps in understanding that would not be detected by traditional interview methods. Coupling eye tracking with additional data sources, then, can provide a rich and often complex view of the participant's experience (10–12).

Figure 2. Different participants playing the same simulation. Each circle is a fixation (the size of the circle indicates the length of the fixation). The participant on the left did not view the submicroscopic images while the participant on the right did not look at the balanced equation (9). (see color insert)


Working with Human Subjects

Before beginning any data collection, it is crucial to seek approval for working with human subjects, generally through an institutional review board (IRB). A good starting point for chemistry education researchers is Christopher Bauer's chapter in a previous ACS Symposium Series book, Tools of Chemistry Education Research (13). In addition to viewing data, some eye-tracking systems can collect a video of the participant, audio, or click data. Collecting only the data needed for the research study can streamline the IRB process and minimize the personally identifiable data collected. IRB processes are extremely institution-specific, but it is always a good idea to contact your office of Human Research Protection (or equivalent campus resource) early for guidance if you intend to publish or share your findings at a conference. It may also be helpful to discuss the timeline and process with colleagues who have already obtained IRB approval.

Data Collection and Consent Forms

The process of creating consent forms, designing a research protocol, and collecting, storing, and analyzing data requires careful planning. If the eye tracker records an image of the participant but this information is not needed for the analysis, then covering or disconnecting the front camera prevents extra (identifiable) data collection and may make the IRB approval process easier. If you are analyzing participant gestures or interactions with a molecular model, then the front camera may be needed and the merit of this additional data is clearly justified in your protocol. Reflecting on the utility of audio data when designing the study may be useful; if participants can draw or organize their responses using graphic organizers, it may decrease the need for transcription. Some institutions may require additional scrutiny and consent if transcriptions of the audio are required for the research project. Eye-tracking data collection may increase the participant's discomfort (14, 15). Being very specific about how the eye tracker collects the data may help. Does the eye tracker touch the participant? If it rests on the head or face, is it similar to how a hat or glasses would rest (see Figure 1)? Comparing the eye tracker to an item the participant is already familiar with can provide clarity and minimize confusion with both the IRB and potential research participants. For example, your participants may be familiar with video game systems that track movement using IR sensors. Is the participant able to look away during the session? Will they be given breaks if needed, or does the trial need to be completed during a single sitting? Additionally, some institutions request clarification about the IR beam used in the eye tracker, and it may be necessary to include specific safety warnings found in the manual for a particular eye tracker, such as warnings about photosensitive epilepsy (16) or safety guidelines the eye tracker meets. You may want to include the manual for your eye tracker with your IRB submission. For more discussion of stimuli design, look-away behavior, and eye fatigue, see Chapter 3.

Eye-tracking protocols will be classified differently by different institutions. It may be worth reaching out to your particular IRB in order to determine if they have approved eye-tracking studies before, and if so, which section of the Federal Policy for the Protection of Human Subjects these studies are normally approved under. Most commonly, eye tracking falls under exempt or expedited review categories, including 45 CFR 46.110a, category 4 (collection of data through noninvasive procedures) (17).

Study Populations

Pre-screening participants may be useful if you need to exclude some volunteers. For example, your institution may require you to screen for and exclude participants who are epileptic due to the IR light used to track viewing. Other populations you may want to exclude include: participants with specific medical conditions (color blindness, cataracts, etc.), those under 18 who would require parental consent, individuals requiring tinted or specialized glasses that might interfere with data collection, and volunteers who are currently your own students. Other factors such as stress, time of day, eye strain, and cognitive impairment can also impact the reliability of eye-tracking results; these conditions may be more difficult to manage but should be noted. Chapter 3 includes a discussion of participant considerations with regard to stimuli design. Consult the manual for your specific eye tracker regarding the limitations of the system you are using. It is important to note that pre-screening your volunteers alone may require a separate consent form.

Shared Eye Trackers

If you are using an eye tracker in another department or another institution, you may need additional IRB approval. This can also be true when recruiting participants from a different institution. Although institutions may be closely affiliated, each will have their own IRB and will require different levels of protocol approval. Some eye trackers can be moved easily between study locations (e.g., glasses or laptops) while others require you to bring the participant to a specific site. When working in another laboratory or institution, you may need a letter stating you have permission from that lab's PI to collect data there. Similarly, co-registration (collecting eye tracking and other biometric data together) has its own considerations, which are discussed in Chapter 7.

Collecting Data

Practical Considerations

During the data collection, you need to take into account the task you will ask participants to complete. If this task involves the participant producing some sort of output, such as problem-solving or diagram generation, you must consider how this data will be collected. It is important to think about where the participant's eye will be during each task and if shifting tasks will require you to recalibrate the eye tracker. If your system loses calibration when participants look away in order to record their output, you may want to have them speak their responses rather than writing them. Depending on the type of additional data you need, a writing or drawing device (e.g., a Wacom tablet or iPad with a screen and audio recording software such as Vittle) may also be appropriate. Figure 3 shows data collected during an eye-tracking session using the Vittle software (18). This software allows the researcher to upload images for the participant to edit. All edits are collected (anything written or drawn, even if later erased or edited) along with the audio at that point in the trial. The data is saved as an exportable movie file. Another option for allowing the collection of user outputs is a head-mounted eye tracker with a whiteboard.

Figure 3. A participant's image (drawn on a tablet) in response to a drawing prompt. Audio data were collected along with eye-tracking data when the participant viewed a still image (on a second screen) (18).

The analysis you use will need to take into account how the experiment and stimuli are structured. For example, participants who are able to look away from the screen will not have their eyes tracked during this time. Analyzing your data in terms of percentage fixation time or total fixation count can account for participants who look away from the eye tracker. This approach can also work when participants are allowed to view the stimuli for different lengths of time. Another consideration is who collects the data. Some researchers find that undergraduate research assistants interviewing other undergraduates make participants feel more comfortable. In this case, you will want to consider how you train the undergraduate research assistants and what the back-up plan is if the equipment experiences technical difficulties. Dry runs in which the research assistants collect data from the other researchers on the study are one way to practice and flush out sticking points. This is especially important if they need to process information during the trial and make edits to the experimental protocol while the participant is waiting. Making sure to check that the calibration is accurate before proceeding with the study is vital. If the trial has multiple separate experiment protocols using the eye tracker, you will need to consider if each requires a separate calibration and what level of calibration the eye tracker requires between participants. Older systems may benefit from a reboot between participants to clear working memory.
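[Editors' note: A minimal R sketch of the percentage-fixation-time normalization mentioned above, which keeps participants comparable when they look away or view for different lengths of time; the table and column names are invented for illustration.]

```r
# Fixation time per AOI, normalized by each participant's tracked time
fix <- data.frame(participant = c("P1", "P1", "P2"),
                  aoi = c("equation", "diagram", "equation"),
                  dur_ms = c(1200, 800, 600))

tracked <- tapply(fix$dur_ms, fix$participant, sum)  # total tracked time
fix$pct <- 100 * fix$dur_ms / tracked[fix$participant]

aggregate(pct ~ participant + aoi, data = fix, FUN = sum)
```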

Pauses and Triggers

It may be useful to include a pause frame between multiple stimuli that provides a moment for the researcher to check in with the participant before moving on. The researcher can then advance the experiment manually rather than allowing it to proceed in an automatic progression. If your system allows for trigger AOIs (Figure 4), adding them requires all participants to view a defined area of the screen for a continuous pre-set length of time before the experiment moves forward. This is one way to ensure the participants begin viewing the stimuli within that defined area on the screen. This may be particularly important for studies involving pupillometry, where establishing a baseline pupil diameter between tasks is important (see Chapter 8), or for tasks where response times are collected and all participants begin the task by viewing the same on-screen location. When using trigger AOIs, it is important to remember the viewing will need to be continuous. While one second seems short, it may be a long time for your participant to view continuously. A setting of 500 ms may be more attainable due to natural saccades. Similarly, the size of the AOI needs to be considered. A discussion of eye-tracking stimuli, including the use of triggers, can be found in Chapter 3.
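[Editors' note: The dwell logic behind a trigger AOI can be expressed in a few lines. The R sketch below is a conceptual model only, since vendor software implements this internally; the rectangle coordinates, column names, and 500 ms setting are illustrative assumptions.]

```r
# Return the time at which gaze has remained inside the trigger
# rectangle continuously for dwell_ms, or NA if it never does.
# gaze is a data frame of samples (t_ms, x, y) sorted by time.
# A real implementation would also reset on lost samples (blinks).
trigger_met <- function(gaze, x0, x1, y0, y1, dwell_ms = 500) {
  inside <- gaze$x >= x0 & gaze$x <= x1 & gaze$y >= y0 & gaze$y <= y1
  run_start <- NA
  for (i in seq_len(nrow(gaze))) {
    if (!inside[i]) { run_start <- NA; next }  # left the AOI: reset
    if (is.na(run_start)) run_start <- gaze$t_ms[i]
    if (gaze$t_ms[i] - run_start >= dwell_ms) return(gaze$t_ms[i])
  }
  NA
}
```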

Figure 4. A trigger AOI shown in yellow/gray. The eye-tracking experiment does not advance to the next image until fixations are detected in this area for a set amount of time.

Creating Stimuli Midtrial

Eye tracking offers researchers another unique opportunity: the ability to provide visual feedback to participants on their own viewing patterns. Visual feedback has been used to develop not only eye-controlled interfaces, but also gaze-dependent displays, which alter the stimulus presented based on a user's viewing pattern. These modified displays can be achieved through the use of an eye-tracking system that is designed to respond directly to the participant's eye movements (e.g., the Tobii 4C system or the use of programs like MATLAB (19)). For example, if a researcher wishes to study the impact of visual feedback on a participant's fixations on chemically relevant features in a simulation, the system can create a gaze-dependent display after each user interaction with the simulation. This requires a short break in the study while the researcher exports an inverse gaze opacity image and then inserts this image into a second eye-tracking experiment. A standard gaze opacity image shows only the areas the participant viewed, with the rest of the screen blacked out; an inverse gaze opacity image instead blacks out where the participant looked, leaving the rest of the stimuli uncovered. Figure 5 shows how an inverse gaze opacity image compares to other visualizations of viewing, specifically a scan path, heat map, and a standard gaze opacity image. Midtrial stimuli creation is best achieved by creating multiple experiments in the eye-tracking software, allowing the stimuli for each subsequent experiment to be derived from the results of the preceding one.
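[Editors' note: In the workflow above, the eye-tracking software exports the inverse gaze opacity image, but the underlying idea is simple enough to sketch: black out a disc around each fixation and leave everything else visible. The R sketch below assumes the CRAN 'png' package, an RGB stimulus file, and a fixation table in pixel coordinates, all of which are illustrative.]

```r
library(png)  # install.packages("png") if needed

stimulus  <- readPNG("stimulus.png")   # array: height x width x channels
fixations <- data.frame(x = c(320, 510), y = c(240, 260), r = 40)

h <- dim(stimulus)[1]; w <- dim(stimulus)[2]
cols <- matrix(rep(1:w, each  = h), nrow = h)  # column index of each pixel
rows <- matrix(rep(1:h, times = w), nrow = h)  # row index of each pixel

for (k in seq_len(nrow(fixations))) {
  # Pixels within radius r of the fixation point
  mask <- (cols - fixations$x[k])^2 + (rows - fixations$y[k])^2 <=
          fixations$r[k]^2
  for (ch in 1:min(3, dim(stimulus)[3])) {  # black out RGB, keep alpha
    plane <- stimulus[, , ch]
    plane[mask] <- 0
    stimulus[, , ch] <- plane
  }
}
writePNG(stimulus, "inverse_gaze_opacity.png")
```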

Figure 5. Four images of gaze data. Top left image: scan path, displaying fixations as circles (size correlates to duration) and saccades as lines. Top right: heat map, displaying fixation duration from short (green) to long (red). Bottom left: gaze opacity image, showing areas viewed by the participant with unviewed areas blacked out. Bottom right: inverse gaze opacity image, showing areas not viewed by participant, with areas viewed blacked out (18). Stimulus images (labeled 'start' and 'end') adapted from animations created by Resa Kelly (personal communication) (8). (see color insert)

Showing a participant where they looked offers an opportunity to study changes in viewing patterns due to visual feedback. Participants are able to reflect on the areas of the screen to which they did not attend and to identify any chemically relevant features they did not consider initially. This approach also holds the potential to create more purposeful viewers by making participants aware of their viewing strategy. Figure 6 shows an inverse gaze opacity image, along with visualizations of where the participant directed their visual attention after receiving this feedback on their initial viewing pattern. The heatmap and scanpath images show that the participant once again viewed the darkened areas that they had previously viewed, but also looked at additional chemically relevant features during their second viewing.

Figure 6. The top image is an inverse opacity visualization of the participant’s viewing patterns from the first half of the study. This image was exported midtrial and inserted into the second half of the eye-tracking study. The bottom two images show where the participant looked when they were shown their initial viewing pattern. The heatmap (bottom left) and the scanpath (bottom right) together show that the participant viewed new chemically relevant features when shown their initial viewing pattern (18). The initial stimulus image is from a video developed by Resa Kelly (personal communication) based on a previously developed concept (8).

Analyzing Data

Exporting Data

Eye-tracking data records where and when someone looks at visual stimuli. Equipment specifications vary between machines, but additional metrics often provide insight into the reliability of the data as well as the viewing process. These additional metrics may include pupil diameter, mouse position or clicks, gaze vector, number of blinks/saccades/fixations/samples, individual eye deviations, and length of viewing (Figure 7). Most systems also provide a metric assessing the quality of the data collected, such as a confidence ranking (Tobii assigns each data point a validity code on a 0–4 scale, indicating confidence that both eyes were located and tracked correctly) (16) or a tracking ratio (see Chapter 3 for more information). Often the analysis software provided with the equipment can process this information so that the data are summarized with images (such as scan paths, heat maps, or an AOI sequence chart) and preliminary conclusions can be drawn. Eye trackers collect a large amount of data during each trial, and careful consideration is needed to avoid reducing a large, complex quantitative data set to a qualitative snapshot that can be misleading. An opportunity may be lost if the researcher does not dig into the data behind those images or consider research questions beyond those answerable by the built-in analysis metrics. Exporting and quantitatively analyzing the eye-tracking data has the potential to reveal a more nuanced and possibly extremely valuable data set. One option is to export the data and analyze them using R; Chapter 6 explores how R can be used to analyze eye-tracking data.
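For instance, a first pass over an exported file takes only a few lines of base R. The sketch below is illustrative only: it assumes a tab-delimited export with columns named Timestamp, GazeX, GazeY, ValidityLeft, and ValidityRight (actual column names and validity conventions vary by manufacturer).

    # Minimal sketch: first pass over an exported eye-tracking file.
    # Column names and validity conventions are assumptions; check
    # your manufacturer's export documentation.
    samples <- read.delim("participant01_export.tsv")

    # Tobii-style validity codes: 0 = eye located with certainty.
    valid <- samples$ValidityLeft == 0 & samples$ValidityRight == 0

    # Tracking ratio: proportion of samples with both eyes tracked.
    tracking_ratio <- mean(valid)

    # Gaze summary restricted to valid samples.
    summary(samples[valid, c("GazeX", "GazeY")])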

Figure 7. The export data screen from an SMI system (left) versus a Tobii system (right). Exporting eye-tracking data increases analysis options.


Some systems have variable settings that allow researchers using eye trackers from different companies to integrate their data. For example, the REDn 60 Hz eye tracker allows a researcher to reduce the sampling frequency from 60 Hz to 30 Hz, providing the flexibility to share data with systems that have lower sampling frequencies and to accommodate collaborative research. Knowing how to export your data and analyze them externally with third-party software (such as R) can be extremely helpful if your software license expires, if you want to investigate questions outside the scope of your analysis software, or if intersystem collaboration is of interest. The data are yours, and there are often questions worth investigating, and valuable analyses possible, beyond the limitations of the software that came with the eye tracker.
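When data must be pooled after the fact, a crude harmonization can also be done at the analysis stage. The sketch below is an assumption-laden illustration, not the tracker’s built-in resampling: it simply decimates a 60 Hz sample table to a nominal 30 Hz by keeping every second row.

    # Illustrative decimation, not the tracker's internal resampling:
    # keep every second row of a 60 Hz sample table to approximate
    # 30 Hz so it can be pooled with lower-frequency recordings.
    downsample_to_30hz <- function(samples60) {
      samples60[seq(1, nrow(samples60), by = 2), ]
    }

    samples30 <- downsample_to_30hz(samples)  # 'samples' as above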

Conclusion

Practical considerations always need to be taken into account when planning and executing any research study, but particularly when eye tracking is involved. The researcher must be aware of the benefits eye tracking offers a study, including information about attention allocation and behavioral patterns, but also of the limitations of the data. Eye tracking cannot tell the researcher why participants look where they do, and it is frequently beneficial to collect data from additional sources, such as drawings or participant interviews, to help support the results of eye-tracking analyses. Based on these strengths and limitations, the researcher must carefully select an appropriate eye tracker and set up a research space that will allow for the collection of the necessary data. Similar foresight is necessary when planning the participant’s task, with a focus on the type of data that needs to be collected as well as how these data might be analyzed. Consider running a pilot study to see whether the data analysis addresses the research questions you hoped to investigate, as you may need to shift the settings or reconsider the stimuli used (for more discussion of stimuli design, see Chapter 3). Hiccups in obtaining IRB approval and in collecting and analyzing data can be minimized by considering these issues during the planning phase of an eye-tracking study.

References

1. Tasker, R. ConfChem conference on interactive visualizations for chemistry teaching and learning: Research into practice—visualizing the molecular world for a deep understanding of chemistry. J. Chem. Educ. 2016, 93, 1152–1153.
2. Bhattacharyya, G. Trials and tribulations: Student approaches and difficulties with proposing mechanisms using the electron-pushing formalism. Chem. Educ. Res. Pract. 2014, 15, 594–609.
3. Rau, M. A. Enhancing undergraduate chemistry learning by helping students make connections among multiple graphical representations. Chem. Educ. Res. Pract. 2015, 16, 654–669.
4. Arjoon, J. A.; Xu, X.; Lewis, J. E. Understanding the state of the art for measurement in chemistry education research: Examining the psychometric evidence. J. Chem. Educ. 2013, 90, 536–545.
5. Williamson, V. E.; Lane, S. M.; Gilbreath, T.; Tasker, R.; Ashkenazi, G.; Williamson, K. C.; Macfarlane, R. D. How does the order of viewing two computer animations of the same oxidation-reduction reaction affect students’ particulate level explanations? J. Chem. Educ. 2012, 89, 979–987.
6. Nyachwaya, J. M.; Gillaspie, M. Features of representations in general chemistry textbooks: A peek through the lens of the cognitive load theory. Chem. Educ. Res. Pract. 2016, 17, 58–71.
7. Holmqvist, K.; Nystrom, M. Eye Tracking: A Comprehensive Guide to Methods and Measures; Oxford University Press: Oxford, 2011.
8. Kelly, R. M.; Hansen, S. J. R. Exploring the design and use of molecular animations that conflict for understanding chemical reactions. Quim. Nova 2017, 40, 476–481.
9. Hansen, S. J. R. Multimodal Study of Visual Problem Solving in Chemistry with Multiple Representations. Ph.D. Thesis, Columbia University, 2014.
10. Shayan, S.; Bakker, A.; Abrahamson, D.; Duijzer, C. Eye-tracking the emergence of attentional anchors in a mathematics learning tablet activity. In Eye-Tracking Technology Applications in Educational Research; Was, C.; Sansosti, F.; Morris, B., Eds.; IGI Global: Hershey, PA, 2017; pp 166–194.
11. Havanki, K. L.; VandenPlas, J. R. Eye tracking methodology for chemistry education research. In Tools of Chemistry Education Research; Bunce, D. M.; Cole, R. S., Eds.; ACS Symposium Series 1166; American Chemical Society: Washington, DC, 2014; Chapter 11, pp 11–191.
12. Cullipher, S.; Sevian, H. Atoms versus bonds: How students look at spectra. J. Chem. Educ. 2015, 92, 1996–2005.
13. Bauer, C. F. Ethical treatment of human participants in chemistry education research. In Tools of Chemistry Education Research; Bunce, D. M.; Cole, R. S., Eds.; ACS Symposium Series 1166; American Chemical Society: Washington, DC, 2014; Chapter 15, pp 279–297.
14. Pande, P.; Chandrasekharan, S. Eye tracking in STEM education research: Limitations, experiences and possible extensions. IEEE Sixth International Conference on Technology for Education, 2014; pp 116−129.
15. Susac, A.; Bubic, A.; Martinjak, P.; Planinic, M.; Palmovic, M. Graphical representations of data improve student understanding of measurement and uncertainty: An eye-tracking study. Phys. Rev. Phys. Educ. Res. 2017, 13, 020125-1–020125-20.
16. Tobii Eye Trackers User Guide. https://help.tobii.com/hc/en-us/article_attachments/207548509/Alienware_17_R4_-_User_s_Guide.pdf (accessed January 24, 2018).
17. U.S. Department of Health and Human Services. https://www.hhs.gov/ohrp/regulations-and-policy/guidance/categories-of-research-expedited-review-procedure-1998/index.html (accessed February 13, 2018).
18. Hansen, S. J. R. Unpublished work from NSF Award #DUE-1525475.
19. Cornelissen, F. W.; Peters, E. M. Monocular and binocular contributions to ocular plasticity. Behav. Res. Methods Instrum. Comput. 2002, 34, 613–617.


Chapter 3

What They See Impacts the Data You Get: Selection and Design of Visual Stimuli

Katherine L. Havanki*,1 and Sarah J. R. Hansen*,2

1Department of Chemistry, The Catholic University of America, Washington, DC 20064, United States
2Department of Chemistry, Columbia University, New York, New York 10027, United States
*E-mails: [email protected] (K.L.H.); [email protected] (S.J.R.H.).

The stimuli used in eye-tracking research have the potential to greatly affect both the reliability of the experimental data collected and the analysis options available. There are many considerations when selecting visual stimuli to evoke the desired response. This chapter will discuss the fundamentals of selecting visual stimuli for eye-tracking research, including the types of stimuli used in chemistry education research (CER) and related disciplines; the advantages and disadvantages of self-design; and important considerations relating to cognition, human behavior, and eye-tracking equipment. The limitations inherent in stimuli selection are presented along with implications for research conclusions.

© 2018 American Chemical Society

Introduction

The choice of stimulus plays a significant role in the selection of data visualization (e.g., heat map, gaze plot, and swarming), analysis (e.g., descriptive statistics, inferential statistics, clustering analysis, and string-edit algorithms), or visual analytics techniques (e.g., space–time cube visualization, spatiotemporal aggregation, sequence or path similarity analysis, temporal histograms, flow mapping, and movement summarization analysis) (1). Careful consideration must be given to stimulus design so that features do not cue attention or unintentionally interfere with natural eye-movement behavior by changing the efficiency of visual search. Readability and legibility can affect reading speeds and comprehension of text and lead to participant fatigue. Finally, resource demands and limitations of equipment must be considered in the design.

What Are Stimuli?

A stimulus is an image or event that elicits a behavioral or emotional response from a participant. For eye tracking, the response elicited is either a pattern of saccades and fixations that gives rise to a scan path, or the smooth pursuit movements, the small eye movements that keep a moving object focused on the fovea (the central region of the retina). In reading research, patterns of fixation on letters, parts of words, and complete words provide researchers with insights into the mechanics and underlying cognitive processes used in reading (2).

To study visual search, arrays of items or scenes are used, and patterns of fixations on targets, distractors, and background elements can emerge. Targets are features or items with specific characteristics of interest specified by the research question. Distractors are features or items with characteristics similar to those of the targets. The background comprises all the remaining elements of the stimulus. By analyzing fixation patterns, researchers can gain insights into a participant’s visual-search behaviors.

Eye-tracking data is task dependent, and any conclusions drawn from eye tracking must be tied to the task performed by the participants (3, 4). The implication is that eye movements during a memorization task would differ from those during an identification task. Some tasks have characteristic eye movements that can be anticipated, while other tasks have movements that vary widely among participants. For example, eye movements during reading have an overall pattern dictated by the conventions of the language (for English: left to right, top to bottom), while eye movements in visual search vary widely with task (“free viewing,” target search, recognition, memorization, etc.). Refer to Chapter 10 for more information on visual search. Selecting stimuli for a study should be driven by the research question and the task being studied.


Types of Stimuli

Anything that can be viewed by a human subject can be a stimulus for eye tracking. Wearable trackers allow data collection as a participant navigates three-dimensional space in real time. More common in chemistry education research (CER) is the projection of stimuli on a screen-based tracker. These stimuli include

• Text—a symbolic representation of natural language or nonwords used to study reading behaviors such as lexical access, reading span, comprehension, and reading times;
• Symbolic language—an artificially constructed language that primarily uses symbols to convey meaning (e.g., symbolic logic, mathematical expressions, chemical symbols, and music);
• Arrays of shapes or alphanumeric characters—a collection of items that can be displayed randomly or in patterns depending on the task;
• Scenes—a collection of objects arranged in space;
  – Naturalistic scenes—real-world environments where the researcher does not control the objects’ positioning, lighting, color, or shape (i.e., actual static or dynamic images of the real world);
  – Artificial scenes—composite scenes built from computer-generated images or clipart, allowing the researcher to control all aspects of the positioning, lighting, spacing, and so forth;
• Websites—online or offline collections of webpages that are connected in various ways;
• Animations and videos—events that are recorded as a set of sequential images that can be played back; and
• Simulations—representations of a problem, situation, or process that can be used for study.

Characterization of Stimuli

When designing stimuli, there are three main features that must be taken into consideration and can be used for characterization. These features are selected based on the research question and, in some cases, the available equipment. The three main categories discussed in this section are the type of visual image, the level of interaction the participant has with the stimuli, and the display properties. Stimuli can be static (S) or dynamic (D) images, which are passively viewed (P) or interactive (I). Additionally, these stimuli can be two-dimensional (2D) or three-dimensional (3D). This leads to eight possible types of stimuli (Figure 1). Each feature of a stimulus in the combination has implications for the visualization techniques and choice of analysis. This section will discuss each feature separately to help researchers classify their stimuli.


Figure 1. Types of stimuli based on features. For example, a stimulus in which participants passively view an animation on a monitor-based eye tracker would be classified as a DP2 stimulus.
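Because the classification is simply three binary features, the eight codes can be enumerated mechanically; the short base R sketch below is only our illustration of the naming convention.

    # Enumerate the eight stimulus types from three binary features:
    # Static/Dynamic image, Passive/Interactive viewing, 2D/3D display.
    types <- expand.grid(image = c("S", "D"),
                         interaction = c("P", "I"),
                         dimension = c("2", "3"))
    paste0(types$image, types$interaction, types$dimension)
    # "SP2" "DP2" "SI2" "DI2" "SP3" "DP3" "SI3" "DI3"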

Static and Dynamic Stimuli

Static stimuli are still images projected on the screen; they do not change during viewing. These include text, symbolic languages (e.g., mathematical expressions, chemical structures, and chemical equations), random arrays of shapes or alphanumeric characters, complex arrays of objects (e.g., spectra and instrument panels), and scenes (e.g., photographs and screen captures of animations). Figure 2 shows a static photograph used for an eye-tracking study investigating student critique of animations representing precipitation reactions.

Dynamic stimuli (Figure 3) are moving images such as movies, dynamic visualizations (e.g., animations and simulations), immersive virtual reality simulations (e.g., gaming immersion, computer-generated virtual reality, augmented/mixed reality, and 360° video), and real-world environments viewed through wearable eye trackers. For these stimuli, the field of view can be controlled by the researcher during the creation of the stimuli or by the subjects, as in the case of virtual reality and real-world viewing.

Figure 2. This is a static image of the starting and ending macroscopic view of a precipitation experiment that combines images from the experiment with symbolic representations and names of the species involved. Image adapted from a video developed by Resa Kelly (personal communication) based on a previously developed concept (5). Participants were shown this image after watching the experimental video and before viewing animations of the reaction.

Figure 3. Example stimuli used in eye-tracking experiments [images from a video developed by Resa Kelly (personal communication) based on a previously developed concept] (5). The image on the left is a single still from an experimental video showing conductivity measurements of solutions before and after a redox reaction takes place. The image on the right is a still from the paired submicroscopic animation; the silver ions on the right side move toward the copper atoms on the left, and electrons are transferred. The image is marked “A” because it is one of three variations used for this eye-tracking experiment.

Passive and Interactive Stimuli

Whether or not the participant interacts with the stimuli will have a significant impact on the type of data visualization and analysis selected. As the name implies, passive stimuli are only viewed by the participant, and participants cannot manipulate the stimuli in any way. Both static and dynamic stimuli can be passively viewed. The advantage of this type of stimulus is that there is little to no additional data that needs to be synchronized with the viewing patterns, and visualization and analysis are less complicated because everyone views the same image or movie in the same way.

When the participant is allowed to interact with the stimuli, a level of complexity is added to the data visualization and analysis. The types of stimuli that are classified as interactive include interactive simulations, websites, molecular graphics systems, and video games. During the viewing of interactive stimuli, participants can change what they are viewing through their own actions (e.g., manipulate objects, change the viewing angle, zoom in/out, scroll, click on a link, load a new file, etc.). Interactions must be characterized and then synchronized with viewing behaviors, which is a nontrivial task. Synchronization issues are addressed later in this chapter. Additionally, comparing the viewing-interaction patterns of several participants may be difficult depending on the amount of freedom the participants have to interact with the stimuli. Careful consideration must be given to how and to what extent the participants will interact with the stimuli and how these interactions will be classified in order to allow for comparison of viewing-interaction behaviors.
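To make the synchronization problem concrete, the sketch below aligns a log of interaction events with gaze samples recorded on the same clock. It is only an illustration under assumed names: data frames samples (Timestamp, GazeX, GazeY) and events (Timestamp, Action), both sorted by millisecond time stamps.

    # For each logged interaction, find the most recent gaze sample
    # recorded at or before the event (both tables share one clock).
    idx <- findInterval(events$Timestamp, samples$Timestamp)
    idx <- pmax(idx, 1)  # guard events logged before the first sample
    events$GazeX <- samples$GazeX[idx]
    events$GazeY <- samples$GazeY[idx]

    # Each interaction now carries the gaze position at the moment
    # it occurred, a starting point for viewing-interaction analysis.
    head(events)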

2D and 3D Stimuli

Eye tracking commonly uses two-dimensional or “flat” images projected on a screen to study fixation patterns, even when the researcher is interested in how participants interact with three-dimensional objects. Two-dimensional images are used to capture fixation data with x and y coordinates. For these stimuli, there are established visualization and analysis techniques for characterizing attention, including heat maps, gaze plots, and scan path analyses. It is common to use monocular, or “single eye,” viewing data for these types of stimuli to increase data acquisition and simplify the analysis. It is believed that, with normal vision, both eyes look at approximately the same position at the same time; therefore, it is only necessary to analyze the data from one eye. While many high-end eye trackers collect binocular data, providing one data stream for each eye, they often default to monocular data sets for analysis and visualization to increase the accuracy and precision of the tracking. In this case, the software either selects the eye with the better quality data (determined by parameters set by the researcher) or averages the positions of the two eyes to produce a monocular data set.

Researchers have also used stereoscopic images (two separate images, one for each eye) to explore differences in viewing patterns of 2D and 3D images (6–8). While stereoscopic images provide additional depth-cueing information, they still do not provide all the depth cues of real-life environments. Viewers also complain of discomfort resulting from the “decoupling” of the way the eye naturally focuses. During focusing, the eye changes the shape of the lens to focus the image on the back of the retina; this process is called accommodation. Fatigue can occur when accommodation becomes out of sync with vergence, the rotation of the eyes in opposite directions to focus the image on the center of each retina (9, 10). Considerations regarding participant fatigue and discomfort should be part of the stimulus design when these types of images are used.

Using wearable eye trackers, it is now possible to track in actual 3D environments. Some wearable trackers use a video-based approach in which fixation patterns are overlaid on a 2D video from a scene camera, effectively

reducing the tracking event to a series of fixations with x and y coordinates on a video. Other wearable systems have been developed to track gaze localization or gaze depth in 3D space. In an analysis of this type, data are mapped onto 3D models of the environment, and 2D visualization and analysis approaches are not preferred because reducing the data to two coordinates would discard the depth information. Different visualization and analysis techniques must be used to account for the added complexity of depth captured by eye tracking in 3D environments (11). In the current CER literature, the most common type of stimuli used is 2D. The remainder of this chapter will focus on this type of stimuli.
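The binocular-to-monocular reduction described above is simple to reproduce on exported data. The sketch below assumes per-eye columns (LeftX, LeftY, RightX, RightY) and validity flags with 0 meaning tracked; it averages the eyes when both were tracked and falls back to the tracked eye otherwise.

    # Collapse binocular gaze data to a monocular stream. Column
    # names and the validity convention (0 = tracked) are assumptions.
    left_ok  <- samples$ValidityLeft  == 0
    right_ok <- samples$ValidityRight == 0

    samples$MonoX <- ifelse(left_ok & right_ok,
                            (samples$LeftX + samples$RightX) / 2,
                     ifelse(left_ok, samples$LeftX,
                     ifelse(right_ok, samples$RightX, NA)))
    samples$MonoY <- ifelse(left_ok & right_ok,
                            (samples$LeftY + samples$RightY) / 2,
                     ifelse(left_ok, samples$LeftY,
                     ifelse(right_ok, samples$RightY, NA)))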

Choosing Stimuli

Stimuli can take many forms, and the choice of form is driven by the research questions. Stimuli can come from a variety of sources or be designed by the researcher. The choice between working with premade or custom-designed stimuli depends on the research question and the tasks. This section will discuss the pros and cons of each type of stimulus.

The following are advantages of premade stimuli:

• There is a shorter time for implementation of the study. The development of the stimuli is already complete, and materials are used as is.
• Professionally developed materials tend to have a higher visual quality and graphic design.
• The use of triangulation data from other studies may be possible.
• The use of premade materials authentically mirrors the way that teaching and learning take place.

The following are disadvantages of premade stimuli:

• The viewing task is bounded by the stimulus design; if the stimulus has text and equations, it is impossible to study just the visual search of equations because the presence of the text may confound results.
• There is no direct control of the visual characteristics of the stimuli without a change in fidelity (e.g., size, typeface, coloration, etc.).
• There is no control over the cognitive demands of the stimuli on participants (e.g., complexity, linguistic characteristics).
• There is no control to provide support for the sample population, and accommodations cannot be made for participant characteristics such as eye conditions, color vision deficiency, or non-native speakers of the language used in the study.
• Accommodations cannot be made for hardware and/or connectivity limitations (e.g., speed of the computer’s processor or data connection, image loading or manipulation, lags).


The following are advantages of custom-designed stimuli:

• The researcher has complete control over all aspects of the stimulus design, including visual characteristics, complexity, and coloration.
• Stimuli can be designed for a specific task, reducing confounding variables that may affect fixation patterns.
• Stimuli can be designed to account for limitations of hardware and connectivity.
• There is greater control of variables such as cognitive load, attention, fatigue, and emotional response.

The following are disadvantages of custom-designed stimuli:

• Considerable time must be invested in the development and testing of stimuli before they can be used in a study. This leads to longer implementation times.
• There may be direct or indirect costs incurred in the development, graphic design, and testing of quality stimuli.
• Although stimuli are designed to study a particular task or behavior, they may not be authentic compared to real learning materials or situations. Additional work may be required to test the stimuli in authentic learning environments.

Designing Stimuli

The primary purpose of stimuli is to elicit a response that will answer the research question. The research question will drive the selection of a task or tasks for the participants to carry out during the eye-tracking session. Because eye-tracking data is task dependent, different tasks, such as reading, facial recognition, visual search, and scene processing, will elicit different types of eye-movement behavior (2, 12, 13). Stimuli need to be developed specifically for the task being studied. Limitations of the tracking equipment are a major factor in the design. Additionally, considerations must be made to ensure that the design of the stimuli allows participants to complete the task without being distracted or influenced by the format or presentation. Finally, the researcher needs to keep in mind how the stimuli influence the visualization and analysis of data.

Equipment Limitations and Size

When designing or selecting stimuli, one of the most important considerations is the limitations of the available equipment. These limitations include the resolution of the tracker and the computing resources available, as discussed later in this chapter. In eye tracking, resolution usually refers to the smallest movement of the eye that can be detected. These small movements are typically measured in visual degrees. In Figure 4, the visual angle θ is given by the equation θ = 2 arctan(S/2D),

where D is the distance of the eye to the object and S is the height of the object. The resulting angle measurement is reported in degrees (°).
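This formula is easy to evaluate when sizing stimuli; below is a small helper function in base R (our sketch, with S and D in the same units).

    # Visual angle in degrees: theta = 2 * arctan(S / (2 * D)),
    # where S is object height and D is viewing distance.
    visual_angle <- function(S, D) {
      2 * atan(S / (2 * D)) * 180 / pi  # radians to degrees
    }

    visual_angle(S = 1, D = 60)  # ~0.95 degrees for a 1 cm object at 60 cm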

Figure 4. Visual angle (θ) is dependent on the distance from the nodal point of the eye’s lens to the object (D) and the height of the object (S). As the distance from the object increases, the viewing angle decreases.

The approximate area of the fovea of the eye has a visual angle of 1°, corresponding to 45 pixels (S = 1 cm) for a participant sitting 60 cm (D) from the screen. This means that for images smaller than 45 pixels, the complete image can be focused on the fovea without movement, while images with a visual angle greater than 1° may require a sequence of fixations to view completely, giving rise to a scan path for analysis. The error of the tracker is also reported in terms of visual angle. According to Vadillo et al., the error for many trackers is 1° of visual angle (45 pixels) (14). This means that the location of the recorded fixation can be off by as much as 45 pixels even under ideal conditions. Researchers are encouraged to refer to their equipment specifications for tracker-specific gaze accuracy.

The resolution of the tracker should be a major consideration in selecting or designing stimuli. In practical terms, the size of the targets/distractors or words/letters in the text must be large enough that (1) participants must move their eyes to view the entire image and (2) movements within and between features can be detected accurately by the tracker. Orquin et al. also recommend maximizing the distance between objects of interest, which reduces overlap of fixation distributions across features and reduces false classification of fixations on areas of interest (AOIs) (15).

In addition to size, there are several other considerations regarding the display of the stimuli. During the design phase, it is important to think about how formatting and other display characteristics may influence the viewing behaviors of participants. The next section will discuss several of these considerations.

Format

There are many features of a stimulus that can influence attention or other measures such as reading speed. Initial attention can be guided by the saliency of features in the stimulus. Differences in color, motion, and luminosity cause some features to “pop out”, attracting the viewers’ attention. Placement of targets and distractors, style of text, and complexity of the stimulus can also influence the attention of the participant. It is important that factors known to influence attention do not dominate the stimulus and thus influence viewing patterns. It

should be noted that this is not an exhaustive list; it is meant as a springboard for researchers to begin to check their stimuli with a critical eye for features that may unintentionally impact the eye-movement behaviors of participants.

Any discussion of attention and visual search needs to include the visual-attention model of saliency, a bottom-up process for selecting relevant regions of a visual scene. The control of gaze is driven largely by the visual properties of the stimulus. Attention is guided by saliency maps, which are 2D maps that guide attention from the most salient feature to the next most salient feature, and so on (16). Figure 5 is an example of a saliency map generated from a space-filled model of (S)-ibuprofen using MATLAB (17) and the Saliency Toolbox (18). Saliency is assigned to locations in the stimuli that differ from the surroundings based on a variety of characteristics, such as color, motion, and luminosity. A second, slower top-down process directs the spotlight of attention using search criteria under cognitive control. According to this model, saliency drives the initial fixations after stimulus onset. Over time, a balance is struck between the top-down and bottom-up processes.

Figure 5. The saliency map for a space-filled model of butane created using MATLAB (17) with the Saliency Toolbox (18).

Color

If there were a single red word on this page, it would pop out to readers. This effect is due to saliency. Color has been shown to influence attention (19, 20). Color can reduce the visual search time needed to locate targets and can affect the frequency and duration of fixations (21–23). For example, when compared with identical grayscale images, colored images have more fixations on distinct regions of the image, leading to more areas of interest in the cluster analysis (24). Because color is universally believed to attract attention, color in stimuli should only be used when directed by the research question. Incidental use of color on the screen may cause unintended changes in fixation and saccade patterns.


Movement

Movement has a pronounced effect on attention. While sustained motion does not necessarily attract attention, it has been shown that the onset of motion does (25). Targets that have just started moving attract attention faster than stationary or continuously moving targets.

Movement also affects perception. Processing moving images can quickly overwhelm a study participant, particularly when the visual space involves many visual elements (e.g., water molecules surrounding solvated ions that are reacting). Additionally, features that quickly appear and then disappear in a small screen space may be missed by the viewer (e.g., electrons shown during a redox reaction). Slowing down dynamic stimuli gives participants a greater opportunity to attend to elements on the screen and process the visual information; this can be done using video-editing software such as iMovie or ScreenFlow. It may also be useful to present the dynamic stimulus multiple times (either a preset number of times for all participants or until participants decide they are ready to move on to the next step), giving plenty of opportunity to rewatch it. Another approach is to show still images of an animation. This can be accomplished using a composite still image, created by combining the start and end scenes from the animation, or using multiple storyboard-like images from the animation (Figure 6), so the viewer has an opportunity to focus on the chemically relevant features without the added cognitive load of moving animation images (5). Some studies add audio, add circles, or gray out portions of the screen to focus viewers’ attention on relevant features.

Figure 6. A storyboard-style still presentation of animation images. Images adapted from a video developed by Resa Kelly (personal communication) based on a previously developed concept (5). Presenting dynamic stimuli in a static form allows the participant to engage with the visual information for a longer period and compare different points in the animation (in this case, the starting and ending points of a precipitation reaction).


Brightness (Luminosity)

Brightness not only plays a role in pupillary response (see Chapter 8 for more information on pupillometry); it also influences attention. Careful consideration needs to be given to the amount of white space on the screen (see notes in “Participant Considerations”). Several studies have shown that luminance contrast, a difference in the brightness of an object compared with its surroundings, leads to preferential fixations on regions of high contrast compared to control regions (26). Care must also be taken, especially when creating composite arrays, to ensure that all the components are equiluminant. If distractor regions have a higher degree of contrast than target regions, an unintended consequence may be more fixations on the distractor regions due to these attention effects. For a simple test of this effect, consider Figure 7. Which formula in each array draws attention?

Figure 7. Example of luminosity effects on attention. In (A), all of the chemical formulas are equiluminant. In (B), the brightness of the first formula is 25% higher than that of the remaining members of the array.

In Figure 7(A), there are no attentional effects due to differences in brightness because all the elements of the array have the same contrast with the background; however, in Figure 7(B), the first formula seems to pop out of the array. If the task for viewing these images were to locate the formula with the alcohol functional group, Array B may give different scan paths than Array A due to the attentional effects of having a distractor (CH3COOH) with a different luminance contrast than the target (CH3OH) in the array.
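A rough equiluminance check can be scripted before a stimulus array is built. The base R sketch below uses the common Rec. 709 luminance weighting as an approximation (it ignores display gamma, so treat the numbers only as relative guides).

    # Approximate relative luminance of candidate colors using the
    # Rec. 709 weighting; ignores gamma correction, so values are
    # only a relative guide to potential pop-out effects.
    relative_luminance <- function(colors) {
      rgb <- col2rgb(colors) / 255
      0.2126 * rgb["red", ] + 0.7152 * rgb["green", ] + 0.0722 * rgb["blue", ]
    }

    relative_luminance(c("grey40", "grey50", "black"))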

Placement of Targets and Distractors

When using a monitor for viewing stimuli, participants will have a tendency to look to the center of the screen rather than the periphery, even when there is no central distribution of objects (21). According to Holmqvist et al. (27), participants are also more likely to make horizontal saccades than vertical or oblique ones. The implication for design is that placing target regions only on the horizontal central axis, with distractors in the periphery, will lead to inflated fixations on the target regions because of the bias resulting from this central tendency (28, 29). For those researchers designing scenes, animations, websites, and other stimuli for visual search tasks, it is important to select locations for targets that are distributed rather than centrally located, or to use software for building psychological experiments that can randomize preset locations between trials (see the “Use of External Software” section).

Style of Text

The way text is presented to participants can affect their viewing patterns. Many factors must be considered in deciding how text is presented, including size, shape, spacing, print quality, and line length. The size and shape of the font chosen for the stimuli should be a consideration, especially if the study investigates changes in fixation times and reading speeds. In the earliest study on the subject, Tinker (30) found that the optimal font size for printed work was 10 points; a smaller font size (6 points) produced longer fixation durations and slower reading times, while large fonts (14 points) produced more fixations. For a review of size effects, see Legge and Bigelow (31). Reading times and fixations increased with increasing font size when text was displayed on LCD screens (32); however, it has been shown that a font size of 14 points on a computer screen may be the optimal size for shorter reading times (33). Other typographical factors that affect eye movements include letter spacing, print quality, and line length (34).

It has been suggested that font shape or typeface may affect reading times. Slattery and Rayner found that fonts like Times New Roman are easier to encode because of their simple shapes, leading to “faster reading times, fewer fixations, and shorter durations (35)”. It has also been shown that sans serif fonts like Tahoma are read significantly faster than ornate fonts like Corsiva (36). There is inconclusive evidence as to the effects of font style (serif vs. sans serif) on on-screen viewing behavior (32, 37, 38); however, it has been shown that participants prefer to read sans serif fonts like Arial when displayed on a computer monitor (33, 39). Given the definite influence of font size and style, researchers should consider them when creating stimuli. A standard style should be chosen over an ornate or decorative style. Researchers should make efforts to keep font size and typeface consistent among stimuli unless otherwise guided by the research question. This is especially important when the stimuli are websites, different passages of text, or images containing text that may be scaled to fit the tracker screen.

Complexity

Complexity of eye-tracking stimuli can take two forms: language complexity and visual complexity. Both have been shown to influence eye-movement behavior. If the stimuli involve text, such as arrays, reading prompts, or annotated diagrams, considerations need to be made related to syntagmatic complexity (e.g., word and sentence lengths), paradigmatic complexity (e.g., tense distinctions), organizational complexity (e.g., word order), and hierarchic complexity (e.g.,

recursion and lexical knowledge). In general, as the complexity of the text increases, there is an overall increase in fixations, the saccade length shortens, and reading times increase (2, 40, 41). Rayner notes that the frequency effect, in which common words are processed faster and more accurately than less common words, also relates to eye-movement behavior. High-frequency words have shorter fixations than low-frequency words. Regressions (a look-back behavior) also increase as the complexity of the sentence increases because participants need to look at previously viewed words as they work out the meaning of the sentence. The overall complexity of the text should be taken into account when selecting or writing text as stimuli. The language usage, syntax, and grammar of the passages need to be scrutinized. Depending on the research question, if the text complexity is high and the reading level of participants does not match the text presented, eye movements may be confounded. For images, like scenes, animations, and arrays of shapes, similar considerations are important. During the viewing of images, the primary task is a visual search for key elements. The complexity of an image is related to the number of visual elements shown and the relationships between them (42). In Figure 8, there are two representations of hydrochloric acid added to water. In the first image, there are significantly more water molecules than in the second image. Vlaskamp and Hooge would argue that as the number of elements (water molecules) in the image increases, the diagram becomes more complicated and the fixations increase in frequency and duration (43). Because the number of elements in the image and their relationships influence fixation patterns, researchers need to be aware of the level of complexity in their images. For example, when developing parallel stimuli to compare eye-movement data, the number of visual elements should remain constant. If the number of visual elements is increased by including more visual information in one of the paired stimuli, fixation durations and frequency will also unintentionally increase for that stimulus, leading to problematic data (44).

Figure 8. Example stimuli used in eye-tracking experiments. The image on the left shows a submicroscopic representation of hydrochloric acid added to water; the chloride ion is surrounded by water molecules, and the hydronium ion is obscured by the water. Next, most of the water molecules fade away during the animation, resulting in the image on the right. Once the water molecules fade, the hydronium ion is more easily viewed. Images from a video developed by Resa Kelly (personal communication) based on a previously developed concept (5).


Once stimuli have been selected and choices have been made about the visual characteristics, the stimuli have to be programmed into the computer and presented to the participants.

Development and Presentation of Stimuli

In addition to features of the stimulus that influence attention, how the stimulus is presented to participants needs to be considered as part of the design. The equipment being used, the software controlling the stimuli, and how the presentation will affect the participants (eye/participant fatigue) should all be considered.

Hardware and Connectivity Requirements

When developing stimuli for a study, researchers should always keep in mind the demands the stimuli will make on the hardware, peripherals, and internet connection. All updates and manufacturer’s hardware recommendations should be followed to ensure better performance of the eye tracker and the control computer. Even when this advice is followed, there can still be problems with the timing of stimuli and the recording of eye-movement data. Researchers should analyze pilot or trial data for syncing issues and latency effects before data collection begins so that changes in the stimuli can be made to account for these effects.

Demands that exceed the available computer resources can impact the timing of stimuli and the synchronization of participant input. This shows up in the data as latency and temporal precision issues. Latency is a delay between the actual eye movement and the time when the computer records the signal. Low latency is critical for most studies, and heavy central processing unit (CPU) loads can lead to higher latencies. Temporal precision is a measure of the consistency between successive eye-tracking signals. High precision is required to synchronize eye movements with other collected data (e.g., key presses, mouse clicks, button pushes, audio recordings, heart rates, etc.). A decline in precision is usually due to increased demands on the computer’s processors. While latency and temporal precision are concerns, most high-end eye trackers have compensation mechanisms to prevent or correct for them.

More concerning are stimulus-synchronization latencies, which arise from differences between the presentation of stimuli and the recording of eye-movement data. When presentation and recording programs are running on separate machines, these latencies are reduced; however, poor synchronization can arise from differences between the computers’ set times or problems with the ports that transfer signals between the computers. These are often remedied with hardware upgrades. Of greater concern is a scenario in which the presentation and recording programs are running on the same computer, sharing the same processors and hard drive. High-demand presentations (e.g., web browsing or videos) can lead to poor synchronization and lag between the time that the stimuli are presented and the time the eye position is recorded.
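One quick diagnostic, sketched below in base R under an assumed millisecond Timestamp column, is to examine the inter-sample intervals of a pilot recording; at a nominal 60 Hz they should cluster tightly around 16.7 ms.

    # Inspect inter-sample intervals for dropped samples or jitter.
    # 'samples' and its Timestamp column (in ms) are assumptions.
    intervals <- diff(samples$Timestamp)
    summary(intervals)                   # should center near 16.7 ms
    sum(abs(intervals - 1000 / 60) > 2)  # count samples off by > 2 ms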

Checks like this on the raw time stamps can often reveal these effects. They can also be experienced during data collection when there is a perceived slowdown of the video playback or a lag between participant input (e.g., a keystroke or mouse click) and the response of the display. These latencies are common, especially when running presentation software to display stimuli. Any chosen display method should be tested for latencies before the study starts. Additionally, if a web browser is employed for presenting live webpages, the researcher should be aware of factors that can affect the session, including network speed, slowdown, and connectivity issues. Websites with high CPU demands may also increase stimulus-synchronization latencies. If connectivity is a concern, local hosting of the website is a viable option to reduce site slowdown and increase responsiveness.

Use of Manufacturer’s Eye-Tracking Control Software

Eye-tracking hardware is bundled with manufacturer software that can perform a variety of functions, including stimuli development, experiment programming, visualization, and analysis. The quality, functionality, and supported stimulus formats of these bundled software packages vary from manufacturer to manufacturer. Stimulus formats can include instructions, images, movies, websites, screen recordings (real-time use of software applications and operating systems), external video input (e.g., DVD, TV, or game console), questionnaires, and PDF documents. The advantage of using this software is that the interface with the hardware is already established and the learning curve tends to be small; however, issues of latency may still be a problem due to computer hardware demands. Researchers should analyze data for synchronization issues and latency effects.

Use of Experimental Programs

There are a variety of external programs that can be employed in conjunction with the manufacturer’s eye-tracking control software to display stimuli, including experimental software and nonexperimental programs. Experimental software can be used to design more complex experiments than the software bundled with the tracker. Acting as control software, these packages interface directly with the eye-tracking hardware, orchestrating the collection of eye-movement data. These systems often use a two-computer setup, with one computer controlling the eye tracker and the other running the control software. This type of setup reduces stimulus-synchronization latencies by splitting the CPU and hard drive demands of the study across two machines; however, technical issues can arise related to the local configuration of the computers. Experimental software has the added advantage of automating the synchronization of eye-tracking data with other sources. Data can be collected from other hardware dedicated to measuring neuroscience and physiological responses [e.g., lateralized readiness potential (LRP), electroencephalography (EEG), galvanic skin response (GSR), and heart rate monitoring] as well as participant responses (e.g., audio, mouse clicks, key presses, and button pushes).

These experimental programs allow the user to build a sequence of trials using a programmed protocol or develop custom scripts. Users can program the experiment by importing images or building the stimuli directly in the software, controlling variables like timing, duration, luminosity, color, spacing, randomization of stimuli, shading, and distortions. While these programs are very powerful and provide a way to synchronize multiple channels of response, the learning curve for the interface and the programming process is very steep and may slow implementation. Learning communities can be found on the internet to provide support for those who decide to implement experimental software in their labs. There are a variety of open-source and commercial experimental software packages. Each has its own interface requirements; however, many of the major eye-tracker manufacturers provide information, extensions, and support for setting up some of these systems. Researchers should check with their tracker’s manufacturer for details. Some options for experimental software include PsychoPy (45), MATLAB with Psychophysics Toolbox (46), E-Prime (47), SuperLab 5 (48), and PsyScopeX (49).

Use of Nonexperimental Programs

Another option for the presentation of stimuli is the use of nonexperimental programs like presentation software (e.g., PowerPoint, Keynote, Google Slides, OpenOffice Impress, etc.). The advantage of this approach is that the learning curve is small, so the time to implementation is short; however, the interface between the manufacturer’s eye-tracking control software and these programs is not always reliable. Stimulus-synchronization latencies are an issue because these packages demand a lot of computer resources, and they usually share the same CPU and hard drive as the eye-tracking control software. Consistent timing of stimuli can be problematic. Great care should be taken when selecting this option, and preliminary data should be analyzed for stimulus-synchronization latency effects.

Eye/Participant Fatigue

In addition to technical requirements, a researcher should also be mindful of the human requirements of participants in the study. Although the tracker can display an infinite number of stimuli, participants will eventually experience fatigue. In human–computer-interaction (HCI) research, it has been shown that there are several physiological eye responses when a participant experiences fatigue, including a decrease in pupil diameter, an increase in blink rate that interferes with the ability of the tracker to locate the eyes, an increase in look-back behavior, a decrease in fixation time and frequency, and an inability to accurately identify targets (50, 51). To prevent fatigue, stimuli need to be designed as short, discrete tasks, and the session during which the stimuli are presented should not take more than 60 min (including breaks for participants) (52, 53).

There is one final consideration that a researcher should keep in mind when designing stimuli: the analysis. Design decisions may have unintended

consequences for the type of visualization and analysis that can be performed. Being mindful of how the design can influence certain aspects of the analysis can save the researcher time and, in some cases, prevent the loss of data.

Considerations for Analysis and Evaluation

A researcher should always keep an eye toward the analysis scheme and the evaluation of data when designing stimuli. Mindfulness of these requirements can save a researcher a lot of time and aggravation when it finally comes time to publish eye-tracking results. These requirements include the timing and duration of tracking events, issues related to the placement and scale of targets and distractors, and tracking accuracy.

Timing

Important decisions about the timing of the stimulus display will affect the way in which fixation data are analyzed. The first consideration is how to begin the experiment. The most common way is to have all of the participants look at the same region of the screen at the start of each stimulus. In psychological research, this is accomplished by the use of a fixation cross in the center of the screen. This cross is not a stimulus but is displayed for a short period of time (usually 500–1000 ms) to center the participant’s gaze on the screen. Then either a blank screen or the stimulus is shown to the participant. This ensures that participants start viewing the stimulus from the same initial fixation point, allowing researchers to better compare scan paths between and within participants. Alternatively, some systems allow for a trigger AOI that advances the study once fixations are detected for a preset amount of time (usually also 500–1000 ms). Refer to Chapter 2 for practical considerations related to the use of trigger AOIs and an example.

The next consideration is the duration of the stimulus. The choice of how long a stimulus is shown to participants depends on the type of stimulus and the research question. For video, the duration of the stimulus is often the duration of the clip; however, for scenes, text, arrays, websites, and simulations, the stimulus duration is not as clear. Selection of the type of viewing is key. Participants can be allowed to exhibit natural viewing behavior, for which the duration is untimed and the stimulus ends with participant input (i.e., mouse click, keystroke, or button press), or timed viewing, in which a preset time is chosen by the researcher and the stimulus ends automatically based on programming. Each choice has implications for the analysis that are discussed subsequently.

For timed durations, how long the stimulus is displayed depends on the research question. The authors suggest that the literature can serve as a guide. For example, in an experiment that integrated text with geometric shapes, Lee and Wu displayed the description of a problem for up to 60 s and followed the viewing with a true/false comprehension question shown for a maximum of 30 s (54). If there is no literature to guide the design of the stimuli, there are some general observations that can act as a guide when selecting timing. For the lower bound

of duration, Potter found that targets can be identified in as little as 175 ms, but the ability to immediately recall targets required more than 300 ms of exposure (55). Additional time is necessary if the task involves the delayed recall of a target in a scene; Intraub showed that the minimum viewing time needed for a picture was 5–6 s (56). Additionally, the researcher must decide whether to show the next stimulus immediately, show a blank screen, or collect additional data (e.g., question and answer, think aloud, recall activity, etc.). The choice again depends on the research question. Intraub showed that memory was enhanced with a blank interstimulus interval (ISI) of at least 5 s before the onset of the next stimulus (56); however, performance on immediate recall activities shown following the stimulus was improved without an ISI.

Analysis of the eye-movement data will be guided by the duration of the viewing. For natural viewing, fixation duration and frequency should be reported as standardized values to allow for comparison between subjects. For example, for a given participant, the number of fixations on a specific target is divided by the total number of fixations to give the percentage of fixations on that target. These standardized measures can then be used to compare fixation frequency between subjects for the same stimulus. For a more detailed discussion of analyses, see Chapters 4 and 5 of this book. Scan path analysis is complicated by differences in duration; one option is to first analyze fixation patterns or scan paths for a specific duration (e.g., the first 5 s). Timed durations allow for a direct comparison of fixation durations and frequencies because the participants view the stimulus for the same amount of time, and scan paths can be directly compared as a function of this limited time.
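The standardized measures described above reduce to a few lines once fixations have been exported. The sketch below assumes a fixation-level data frame fixations with columns AOI and Duration for a single participant and stimulus.

    # Percentage of fixations on each AOI: comparable across
    # participants even when total (natural) viewing time differs.
    fix_per_aoi   <- table(fixations$AOI)
    pct_fixations <- 100 * fix_per_aoi / sum(fix_per_aoi)

    # Dwell percentage: share of total fixation time in each AOI.
    pct_dwell <- 100 * tapply(fixations$Duration, fixations$AOI, sum) /
                 sum(fixations$Duration)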

AOI Overlap, Scale, and Position

Careful consideration must be given during the stimulus design phase to the placement of the stimulus on the screen, as well as to the placement of AOIs and the precision of data collection. The locations of AOIs are often defined by the research question. For example, for a study looking at the pattern of fixations on elements of a chemical equation when participants read, the AOIs are defined by the elements of the chemical equation: reactants, arrow, products, and plus signs (Figure 9).

Figure 9. Definition of rectangular AOIs for a chemical equation. Each element is a separate rectangular AOI, including the plus sign and the arrow. There is no overlap of the AOIs in this design.

When designing stimuli for a study, the spacing of the targets should be wide enough to allow for space between the AOIs. Overlap of AOIs should be avoided because it complicates the calculations relating to fixations, making it impossible to attribute fixations to just one of the AOIs in the overlap region. This

can confound results. If overlapping AOIs are unavoidable, researchers need to address how to handle fixations in the overlap and be explicit in publication.

As discussed earlier in this chapter, the resolution of the tracker is important when determining the size of targets/distractors and words/text. This decision will also affect the size of the AOIs. With a participant seated 60 cm from the screen of the tracker, a 1° error translates into approximately 1 cm (45 pixels). The minimum size of the AOI should be bounded by the limitations of the eye tracker. Holmqvist et al. suggest the AOI should have a border, or margin, of 1° to 1.5° around the feature of interest (27); however, there are no standard current practices. Orquin et al. found that a majority of researchers do not use the resolution of the tracker as a guide for the size of AOIs; instead, they report using “AOIs smaller, the same size, and larger than the object of interest (15)”. Because all trackers have an error associated with the data collection that is directly proportional to the number of pixels on the screen, smaller targets may be harder to track accurately, leading to misclassification of fixations within or outside of the AOIs. Under these conditions, small targets with correspondingly small AOIs will lead to fewer fixations within the AOI and disappointing tracking results. Additionally, if AOIs are too close together, the error of the tracker may lead to the classification of fixations in the wrong AOI. During the design of stimuli, it is recommended that the size of the stimulus be maximized to increase the size of the AOIs and the distance among features of interest to prevent the misclassification of fixations.

It is also important to note that the placement of targets can influence the quality of the tracking data. As Holmqvist et al. point out, the precision of tracking data is “lower at the extremities of the screen (27)”. This will lead to an artificial shortening of fixation durations by the detection algorithms. If fixation durations are an important aspect of the analysis, it is advised that stimuli be located in the center of the monitor rather than the periphery.
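Classifying fixations against rectangular AOIs with a margin sized to the tracker’s error is straightforward to script. The sketch below is a simplified illustration: it assumes screen coordinates with the origin at the top left and an aois data frame with columns Name, Left, Top, Right, and Bottom in pixels.

    # Assign one fixation (x, y) to an AOI, padding each rectangle by
    # a margin that absorbs tracker error (45 px ~ 1 degree at 60 cm).
    # Returns NA when no AOI matches or when padded AOIs overlap.
    classify_fixation <- function(x, y, aois, margin = 45) {
      hit <- x >= aois$Left - margin & x <= aois$Right  + margin &
             y >= aois$Top  - margin & y <= aois$Bottom + margin
      if (sum(hit) == 1) aois$Name[hit] else NA
    }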

Tracking Percentage Measures

Tracking percentage (also known as sample percentage) is an important tool for assessing the quality of data; however, it is often overlooked and under-reported. Not only can it provide a means of validating the data collection and determining the suitability of a participant for inclusion in a study, but it can also be used to evaluate stimuli. In many ways, it is a measurement of how the participants interact with the stimuli. Tracking percentage is usually reported as the percentage of the stimulus display time during which eye position data were collected. When the tracker cannot detect a participant's pupil, it interprets this as invalid or missing data, decreasing the tracking percentage. Many things can affect this measure, including suitability of the participant for eye tracking (about 10–20% of the population), nonviewing tasks such as using the keyboard or listening to verbal instructions (57), external distractions, excessive head movements, glasses or contact lenses, and other look-away behaviors.

Tracking percentage can be used to evaluate an individual stimulus or an item in a set of stimuli. During piloting, if a stimulus has consistently low tracking percentages across participants, the suitability of this stimulus should be reevaluated. Likewise, if a stimulus in a set has significantly lower tracking percentages than the other stimuli, this should also be cause for reevaluation.

A word of caution is warranted when using tracking percentages as a measure of data quality or to evaluate stimuli: for studies with frequent look-away behaviors, the percentages may be lower than expected, complicating how the researcher validates the quality of the eye-tracking data collected. Look-away events can be planned as part of an experimental design (e.g., a paper-and-pencil recall task between stimuli, a planned break during the tracking session, or an offline survey) or a spontaneous act by the participant (e.g., rubbing eyes, boredom, fatigue). For planned look-away events, researchers need to account for the time that the participants look away from the screen in order to better assess the quality of the tracking data. There are a variety of ways this can be done (a small sketch of the resulting calculation appears after the list), including:

• programming the experiment to account for look-away events by allowing for discrete, planned breaks in the stimuli display when tracking is not taking place;
• monitoring the participants' viewing behaviors, either by using a secondary camera or by making direct observations and coding the playback of the participant camera (if available) on the eye tracker for look-away behavior.
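As a minimal sketch of the correction for planned look-aways, the function below drops flagged samples before computing the percentage of valid samples. The sample-level flags are assumed to come from the experiment program or from coded video, as described above.

# valid: one logical per tracker sample (TRUE = eye position recorded).
# planned_away: logicals flagging samples inside planned look-away
# intervals (breaks, paper tasks, offline surveys).
tracking_pct <- function(valid, planned_away) {
  usable <- !planned_away                   # exclude planned look-aways
  100 * sum(valid[usable]) / sum(usable)    # percent valid of the rest
}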

In these ways, the researcher can correct for the timing of look-aways when calculating tracking percentages. Spontaneous look-away behaviors are trickier to account for; however, they should be considered during the development stage of experimentation. This is where piloting the stimuli is very important. If a stimulus consistently prompts spontaneous look-away behavior, the researchers may need to adjust the design (e.g., removing problematic stimuli, reducing the duration of stimulus viewing to limit fatigue and boredom, or rewriting participant directions) to limit this behavior. Without efforts to account for or reduce look-away events, the researcher runs the risk of having no way to determine whether the quality of the tracking data is robust enough for analysis.

In addition to the design of the stimuli, the hardware and software requirements, and the relationship of the stimuli to the analysis, there is one more thing to keep in mind when designing or choosing stimuli: participant characteristics.

Participant Considerations

In a good research design, the researcher does not have an opportunity to select subjects with specific attributes and must rely on volunteers; therefore, researchers need to be aware of the limitations of stimuli when working with specific populations. This section discusses three considerations that may have a significant impact on how individuals interact with stimuli: pupil size, reduced color perception, and cultural differences in viewing behaviors.

Pupil Size

Pupil size is an individual characteristic. Because most trackers follow eye movements via the pupil, small pupils make it difficult for the tracker to locate the eyes. For stimuli with a large degree of white space, the increased screen brightness causes the pupils to contract, and an already small pupil becomes smaller still. This is especially important when working with children. One way to control for this is through design (e.g., dark backgrounds with high-contrast text reduce the brightness and restrict contraction of the pupil) (53). If that is not possible, the effects can also be mitigated by designing stimuli with uniform background brightness so that pupil size does not change drastically from one stimulus to the next.

Reduced Color Perception

Another participant consideration for stimulus design is the experience of those who have reduced color perception (colorblindness). According to the National Eye Institute, approximately 8% of human males and 0.5% of females have a reduced ability to perceive color (58). Stimuli should be designed to be accessible when possible, especially if the study does not screen for reduced color perception. Many programs that can be used to design stimuli, including presentation software (e.g., PowerPoint, Keynote), photo editors (e.g., GIMP, Photoshop), and web browsers (e.g., Chrome), have extensions or tools that can help designers edit stimuli to meet accessibility standards. Visolve is software that automatically transforms colors on a display to make the screen more accessible (59). There are also several resources for checking and correcting stimuli for reduced color perception:

• Vischeck (60) and Daltonize (61) check a standard image file against the three most common forms of reduced color perception (Vischeck) and correct the image accordingly (Daltonize).
• Color Oracle (62) is a standalone color blindness simulator that overlays the screen with a filter, allowing the researcher to see what a participant with reduced color perception would see.
• Chromatic Vision Simulator (63) is a real-time color deficiency simulator showing any combination of what someone with normal vision, protanopia, deuteranopia, and tritanopia sees.

Cultural Differences in Viewing Behaviors

Researchers have shown significant differences in the viewing patterns of Western and non-Western participants that stem from cultural or ethnic differences. Chua et al. found cultural differences between the viewing patterns of American and Chinese participants (64): American participants focused on foreground objects sooner and longer, while Chinese participants focused more on the background and split their fixation times evenly between background and foreground. Similar effects comparing Western and non-Western perception have been reported, including room size perception (65), recall of underwater scenes (66), and facial emotion (67). For a review, see Nisbett and Miyamoto (68).

Depending on the ethnic diversity of the sample population studied, attention should be paid to the foreground and background placement of details in the stimuli. For example, when designing a scene for a visual search task, if all the targets are in the foreground and all the distractors are in the background, Western students may show significantly different viewing patterns than non-Western students because of cultural differences rather than differences in the processing of the visual information. One way to control for this is to randomize the placement of targets and distractors.

Lessons Learned in the Lab: Implications for Design

Developing or selecting stimuli for a study can be a time-consuming and frustrating task. From the authors' experience designing and using stimuli, here is some practical advice that may save time and increase the reliability of results.

• Pilot all stimuli under authentic eye-tracking conditions with real participants, and analyze the data using the selected protocols. These data can be used to uncover problems such as latency issues, participant fatigue, tracking accuracy, look-away behavior, and analysis problems related to stimulus features (e.g., AOI overlap or loss of data at the periphery).
• Check colored stimuli to ensure that participants with reduced color perception are able to see differences. Ensure that directions and interview scripts do not refer to specific colors, such as blue or red, because colors may not appear blue or red to all participants.
• Be consistent in font sizes and line shapes.
• Check stimuli after importation into the experiment for relative sizes and shapes. Eye-tracking control programs can distort or resize images without warning, and this may lead to problems in the data analysis.
• Use websites that are stored locally whenever possible. Poor data connections may lead to unintentional lags in the display that are difficult to control for in the analysis. Also, be aware that updates to live websites can happen during data collection. These updates may change the appearance of the pages (e.g., organization, coloration, sizing, movement, luminosity, and fonts), essentially creating different stimuli during the data collection and making comparisons between participants difficult, if not impossible.
• Stimulus-synchronization latencies can propagate through an entire session. Be mindful during data collection of any lag in response; it is better to address this during data collection, because adequate synchronization may be impossible to achieve afterward.
• Be aware of how the design of stimuli affects the conclusions that can be drawn from an eye-tracking study. Consider stimulus and detection issues part of the process, and accept that some participant data may not be usable for analysis.
• Know that stimulus design can lead to limitations of a study's findings. The key is to acknowledge the limitations and be mindful of avoiding the same issues in future designs, when possible.
• Consider the accuracy and resolution of the eye-tracking equipment being used when designing or selecting stimuli.

Careful consideration of the design elements of stimuli used in eye-tracking studies is an important part of the overall experimental design. This chapter presented a variety of features to consider when developing or selecting stimuli for a study and should help researchers begin to look critically at their stimuli for features that may unintentionally influence attention and eye-movement behavior.

References

1. Andrienko, G.; Andrienko, N.; Burch, M.; Weiskopf, D. Visual Analytics Methodology for Eye Movement Studies. IEEE Trans. Vis. Comput. Graph. 2012, 18, 2889–2898.
2. Rayner, K. Eye Movements in Reading and Information Processing: 20 Years of Research. Psychol. Bull. 1998, 124, 372–422.
3. Yarbus, A. L. Eye Movements and Vision; Plenum Press: New York, 1967; pp 171–211.
4. Buswell, G. T. How People Look at Pictures; University of Chicago Press: Chicago, IL, 1935.
5. Kelly, R. M.; Hansen, S. J. R. Exploring the Design and Use of Molecular Animations that Conflict for Understanding Chemical Reactions. Quím. Nova 2017, 40, 476–481; Hansen, S. J. R. Unpublished work from NSF Award #DUE-1525475.
6. Lang, C.; Nguyen, T. V.; Katti, H.; Yadati, K.; Kankanhalli, M.; Yan, S. Depth Matters: Influence of Depth Cues on Visual Saliency. In Computer Vision – ECCV 2012, Lecture Notes in Computer Science; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin, Germany, 2012; Vol. 7573, pp 101–115.
7. Huynh-Thu, Q.; Schiatti, L. Examination of 3D Visual Attention in Stereoscopic Video Content. In Human Vision and Electronic Imaging XVI; International Society for Optics and Photonics: Bellingham, WA, 2011; Vol. 7865, p 78650J.
8. Jansen, L.; Onat, S.; König, P. Influence of Disparity on Fixation and Saccades in Free Viewing of Natural Scenes. J. Vision 2009, 9, 29.
9. Banks, M. S.; Read, J. C. A.; Allison, R. S.; Watt, S. J. Stereoscopy and the Human Visual System. SMPTE Motion Imaging J. 2012, 121, 24–43.
10. Yang, S.; Sheedy, J. E. Effects of Vergence and Accommodative Responses on Viewer's Comfort in Viewing 3D Stimuli. In Stereoscopic Displays and Applications XXII; International Society for Optics and Photonics: Bellingham, WA, 2011; Vol. 7863, pp 1–13.
11. Blascheck, T.; Kurzhals, K.; Raschke, M.; Burch, M.; Weiskopf, D.; Ertl, T. State-of-the-Art of Visualization for Eye Tracking Data. In Proceedings of EuroVis – STARs; Borgo, R., Maciejewski, R., Viola, I., Eds.; The Eurographics Association: Geneva, Switzerland, 2014; pp 63–82.
12. Rayner, K. The 35th Sir Frederick Bartlett Lecture: Eye Movements and Attention in Reading, Scene Perception, and Visual Search. Q. J. Exp. Psychol. 2009, 62, 1457–1506.
13. Liversedge, S. P.; Findlay, J. M. Saccadic Eye Movements and Cognition. Trends Cognit. Sci. 2000, 4, 6–14.
14. Vadillo, M. A.; Street, C. N. H.; Beesley, T.; Shanks, D. R. A Simple Algorithm for the Offline Recalibration of Eye-Tracking Data Through Best-Fitting Linear Transformation. Behav. Res. Methods 2015, 47, 1365–1376.
15. Orquin, J. L.; Ashby, N. J. S.; Clarke, A. D. F. Areas of Interest as a Signal Detection Problem in Behavioral Eye-Tracking Research. J. Behav. Dec. Making 2016, 29, 103–115.
16. Koch, C.; Ullman, S. Shifts in Selective Visual Attention: Towards the Underlying Neural Circuitry. Hum. Neurobiol. 1985, 4, 219–227.
17. Matlab; MathWorks, 2017. https://www.mathworks.com/products/matlab.html (accessed May 24, 2018).
18. Bernhardt-Walther, D. Saliency Toolbox; BWLAB: Ontario, Canada, 2016.
19. Jost, T.; Ouerhani, N.; von Wartburg, R.; Müri, R.; Hügli, H. Assessing the Contribution of Color in Visual Attention. Comput. Vis. Image Underst. 2005, 100, 107–123.
20. Itti, L.; Koch, C. A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention. Vision Res. 2000, 40, 1489–1506.
21. Ozcelik, E.; Karakus, T.; Kursun, E.; Cagiltay, K. An Eye-Tracking Study of How Color Coding Affects Multimedia Learning. Comput. Educ. 2009, 53, 445–453.
22. Lohse, G. L. Consumer Eye Movement Patterns on Yellow Pages Advertising. J. Advert. 1997, 26, 61–73.
23. Findlay, J. M. Saccade Target Selection During Visual Search. Vision Res. 1997, 37, 617–631.
24. Hamel, S.; Guyader, N.; Pellerin, D.; Houzet, D. Contribution of Color Information in Visual Saliency Model for Videos. In 22nd European Signal Processing Conference (EUSIPCO-2014); Elmoataz, A., Lezoray, O., Nouboud, F., Mammass, D., Eds.; Springer: Cham, Switzerland, 2014; Vol. 8509, pp 213–221.
25. Abrams, R. A.; Christ, S. E. Motion Onset Captures Attention. Psychol. Sci. 2003, 14, 427–432.
26. Parkhurst, D.; Law, K.; Niebur, E. Modeling the Role of Salience in the Allocation of Overt Visual Attention. Vision Res. 2002, 42, 107–123.
27. Holmqvist, K.; Nyström, M.; Andersson, R.; Dewhurst, R.; Jarodzka, H.; Van de Weijer, J. Eye Tracking: A Comprehensive Guide to Methods and Measures; Oxford University Press: Oxford, U.K., 2011.
28. Bindemann, M. Scene and Screen Center Bias Early Eye Movements in Scene Viewing. Vision Res. 2010, 50, 2577–2587.
29. Tatler, B. W. The Central Fixation Bias in Scene Viewing: Selecting an Optimal Viewing Position Independently of Motor Biases and Image Feature Distributions. J. Vision 2007, 7, 1–17.
30. Tinker, M. Legibility of Print; Iowa State University Press: Ames, IA, 1963.
31. Legge, G. E.; Bigelow, C. A. Does Print Size Matter for Reading? A Review of Findings from Vision Science and Typography. J. Vis. 2011, 11, 1–22.
32. Franken, G.; Podlesek, A.; Možina, K. Eye-Tracking Study of Reading Speed from LCD Displays: Influence of Type Style and Type Size. J. Eye Mov. Res. 2015, 8, 15–18.
33. Banerjee, J.; Majumdar, D.; Pal, M. S.; Majumdar, D. Readability, Subjective Preference and Mental Workload Studies on Young Indian Adults for Selection of Optimum Font Type and Size During Onscreen Reading. Al Ameen J. Med. Sci. 2011, 4, 131–143.
34. Morrison, R. E.; Inhoff, A. W. Visual Factors and Eye Movements in Reading. Visible Language 1981, 15, 129–146.
35. Slattery, T. J.; Rayner, K. The Influence of Text Legibility on Eye Movements during Reading. Appl. Cogn. Psychol. 2010, 24, 1129–1148.
36. Bernard, M.; Mills, M.; Peterson, M.; Storrer, K. A Comparison of Popular Online Fonts: Which Is Best and When. Usability News 2001, 3, 50–60.
37. Dogusoy, B.; Cicek, F.; Cagiltay, K. How Serif and Sans Serif Typefaces Influence Reading on Screen: An Eye Tracking Study. In Design, User Experience, and Usability: Novel User Experiences; Marcus, A., Ed.; Springer International Publishing: Basel, Switzerland, 2016; Vol. 9747, pp 578–586.
38. Beymer, D.; Russell, D.; Orton, P. An Eye Tracking Study of How Font Size and Type Influence Online Reading. In Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction – Volume 2; BCS Learning & Development Ltd: Swindon, U.K., 2007; Vol. 2, pp 15–18.
39. Ling, J.; Van Schaik, P. The Influence of Font Type and Line Length on Visual Search and Information Retrieval in Web Pages. Int. J. Hum. Comput. Stud. 2006, 64, 395–404.
40. Clifton, C., Jr.; Ferreira, F.; Henderson, J. M.; Inhoff, A. W.; Liversedge, S. P.; Reichle, E. D.; Schotter, E. R. Eye Movements in Reading and Information Processing: Keith Rayner's 40 Year Legacy. J. Mem. Lang. 2016, 86, 1–19.
41. Just, M. A.; Carpenter, P. A. A Theory of Reading: From Eye Fixations to Comprehension. Psychol. Rev. 1980, 87, 329–354.
42. Halford, G. S.; Wilson, W. H.; Phillips, S. Processing Capacity Defined by Relational Complexity: Implications for Comparative, Developmental, and Cognitive Psychology. Behav. Brain Sci. 1998, 21, 803–831.
43. Vlaskamp, B. N. S.; Hooge, I. Th. C. Crowding Degrades Saccadic Search Performance. Vision Res. 2006, 46, 417–425.
44. Havanki, K. A Process Model for the Comprehension of Organic Chemistry Notation. Ph.D. Dissertation, The Catholic University of America, Washington, DC, 2012.
45. PsychoPy: Psychology Software in Python; The University of Nottingham. http://www.psychopy.org/ (accessed May 13, 2018).
46. MathWorks: MATLAB and Simulink for Neuroscience. https://www.mathworks.com/solutions/neuroscience/behavior-psychophysics.html (accessed May 13, 2018).
47. E-Prime; Psychology Software Tools. http://www2.pstnet.com/eprime.cfm (accessed May 13, 2018).
48. SuperLab 5 & X5; Cedrus. https://cedrus.com/superlab/ (accessed May 13, 2018).
49. Bonatti, L. PsyScope X. http://psy.ck.sissa.it/ (accessed May 13, 2018).
50. Bruneau, D.; Sasse, M. A.; McCarthy, J. D. The Eyes Never Lie: The Use of Eye Tracking Data in HCI Research. Proceedings of the CHI 2002, 2, 25–30.
51. Lavine, R. A.; Sibert, J. L.; Mehmet, G.; Dickens, B. Eye-Tracking Measures and Human Performance in a Vigilance Task. Aviat. Space Environ. Med. 2002, 73, 367–372.
52. Goldberg, J. H.; Helfman, J. I. Comparing Information Graphics: A Critical Look at Eye Tracking. In Proceedings of the 3rd BELIV'10 Workshop; ACM: New York, 2010; Vol. 15, pp 71–78.
53. Goldberg, J. H.; Wichansky, A. M. Eye Tracking in Usability Evaluation: A Practitioner's Guide. In The Mind's Eyes: Cognitive and Applied Aspects of Eye Movements; Hyönä, J., Radach, R., Deubel, H., Eds.; Elsevier Science: Oxford, U.K., 2003; pp 493–516.
54. Lee, W. K.; Wu, C. J. Eye Movements in Integrating Geometric Text and Figure: Scanpaths and Given-New Effects. Int. J. Sci. Math. Educ. 2018, 16, 699–714.
55. Potter, M. C. Meaning in Visual Search. Science 1975, 187, 965–966.
56. Intraub, H. Presentation Rate and the Representation of Briefly Glimpsed Pictures in Memory. J. Exp. Psychol. Hum. Learn. 1980, 6, 1–12.
57. Crowe, E. C.; Narayanan, N. H. Comparing Interfaces Based on What Users Watch and Do. In Proceedings of the 2000 Symposium on Eye Tracking Research & Applications; ACM: New York, 2000; pp 29–36.
58. Facts about Color Blindness. National Eye Institute, National Institutes of Health. https://nei.nih.gov/health/color_blindness/facts_about (accessed May 11, 2018).
59. Visolve: The Assistive Software for People with Color Blindness; Ryobi Systems Solutions. http://www.ryobi-sol.co.jp/visolve/en/ (accessed May 12, 2018).
60. Vischeck. http://www.vischeck.com/vischeck/ (accessed May 12, 2018).
61. Daltonize. http://www.vischeck.com/daltonize/ (accessed May 12, 2018).
62. Jenny, B. Color Oracle: Design for the Color Impaired. http://colororacle.org/ (accessed May 12, 2018).
63. Chromatic Vision Simulator. http://asada.tukusi.ne.jp/cvsimulator/e/ (accessed May 12, 2018).
64. Chua, H. F.; Boland, J. E.; Nisbett, R. E. Cultural Variation in Eye Movements During Scene Perception. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 12629–12633.
65. Saulton, A.; Bülthoff, H. H.; de la Rosa, S.; Dodds, T. J. Cultural Differences in Room Size Perception. PLoS One 2017, 12. https://doi.org/10.1371/journal.pone.0176115 (accessed May 24, 2018).
66. Masuda, T.; Nisbett, R. E. Attending Holistically Versus Analytically: Comparing the Context Sensitivity of Japanese and Americans. J. Pers. Soc. Psychol. 2001, 81, 922–934.
67. Masuda, T.; Ellsworth, P. C.; Mesquita, B.; Leu, J.; Shigehito, T.; Van de Veerdonk, E. Placing the Face in Context: Cultural Differences in the Perception of Facial Emotion. J. Pers. Soc. Psychol. 2008, 94, 365–381.
68. Nisbett, R. E.; Miyamoto, Y. The Influence of Culture: Holistic Versus Analytic Perception. Trends Cogn. Sci. 2005, 9, 467–473.

Chapter 4

Using Fixations To Measure Attention

Steven Cullipher*,1 and Jessica R. VandenPlas2

1Science and Mathematics Department, Massachusetts Maritime Academy, Buzzards Bay, Massachusetts 02532, United States
2Department of Chemistry, Grand Valley State University, Allendale, Michigan 49401, United States
*E-mail: [email protected].

Identifying fixations, points at which eye movement pauses and attention is focused, is an important first step in analyzing and interpreting the results of an eye-tracking study. This chapter discusses the importance of fixations in eye tracking, as well as methods for identifying fixations and common metrics used in fixational data analysis. The use of fixations in a chemical education research study on the interpretation of infrared spectra is discussed to demonstrate the utility of this valuable data source.

Introduction

Eye tracking as a technique is broadly interested in collecting data on an individual's eye movements. At the most basic level, human eye movements are composed of two components: fixations and saccades. During a fixation, the eye pauses its movement, focusing visual attention on a specific location. Saccades are rapid eye movements that occur between fixations and allow the eye to travel from one location to another. During a saccade, vision is essentially suppressed. While fixations generally last approximately 200–300 ms, saccades are much shorter, generally lasting only 30–80 ms (1). Many eye-tracking studies focus specifically on analyzing fixations: how often the eye fixates in a particular region, the length of time each fixation lasts, and so on. The reason for this is that eye movements have been linked to cognitive processing (2), and we can therefore infer something about an individual's thought process by following their eye movements. This inference rests on two basic assumptions: the eye-mind assumption and the immediacy assumption (2). These assumptions propose two things: first, that the eye and the mind are linked, such that the mind processes the object upon which the eye is focused; second, that this processing happens immediately, that is, the brain begins processing an object as soon as the eye focuses on that object. Fixations are therefore the eye movements of interest to researchers, as these are the moments where the eyes pause to take in new information. If we can identify what the eye is focused on during a fixation, we can determine what object has drawn the individual's attention in that moment.

There are, however, some limitations to relying on fixational data alone when attempting to interpret an individual's thought process. While fixational data tell us exactly where the individual looks and allow us to infer what information the individual may be processing at that moment in time, these data do not tell us why the individual has chosen to look at this area or what they plan to do with this information. It may be wise to incorporate other research methods into an eye-tracking study, including think-aloud protocols, interviews, or even biometric data (refer to Chapters 7 and 8), to triangulate the results of fixational data analysis.

This chapter will discuss methods of identifying fixations within eye-tracking data, selecting appropriate fixation metrics and statistical analyses to address research questions, and displaying fixational data appropriately. Finally, this chapter will examine how fixational data analysis was applied to a particular eye-tracking study in chemistry education in order to demonstrate how researchers make experimental decisions about the inclusion and analysis of fixational data in their work.

Identifying Fixations

Eye trackers map eye position in real time to x,y coordinates on a 2D scene (generally a computer screen or a recording of a 3D environment). These data are recorded simply as x,y coordinates versus a timestamp, and it is up to the researcher to identify particular eye movements, whether fixations or saccades, within these individual gaze points. There are many ways to identify fixations and saccades, as described later. Because fixations are defined as a pause in eye movements to focus attention on a specific area, fixations are most frequently defined based on duration and radius: gaze points that occur together both spatially and temporally are generally aggregated into a single fixation. The spatial and temporal bounds used to define fixations differ by field and are ultimately set at the researcher's discretion, based on the type of stimuli being used and the research questions being asked. Fixations tend to be shorter in silent reading (225–250 ms) than in scene perception (260–330 ms), for example, and are slightly shorter still in visual search tasks (180–275 ms) (3). Because of this, researchers may use a lower duration threshold for visual search tasks or reading studies than for scene perception studies. Because studies in chemistry education frequently use complex stimuli incorporating multiple representations, an intermediate threshold value of 100 ms is frequently used (4–8).
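To make the duration-plus-radius idea concrete, here is a minimal dispersion-based detector in the spirit of the I-DT algorithm. It is a sketch, not any vendor's implementation: the 35-pixel dispersion and 100 ms duration thresholds echo values mentioned in this chapter, and the input format is an assumption.

# gaze: data frame with columns t (ms), x, y (pixels), one row per sample.
detect_fixations <- function(gaze, disp_max = 35, dur_min = 100) {
  fixations <- list(); i <- 1; n <- nrow(gaze)
  while (i < n) {
    j <- i
    # Grow the window while its dispersion stays under the threshold.
    while (j < n) {
      win <- gaze[i:(j + 1), ]
      disp <- (max(win$x) - min(win$x)) + (max(win$y) - min(win$y))
      if (disp > disp_max) break
      j <- j + 1
    }
    dur <- gaze$t[j] - gaze$t[i]
    if (dur >= dur_min) {   # keep only windows long enough to be fixations
      fixations[[length(fixations) + 1]] <-
        data.frame(x = mean(gaze$x[i:j]), y = mean(gaze$y[i:j]),
                   start = gaze$t[i], duration = dur)
    }
    i <- j + 1
  }
  do.call(rbind, fixations)
}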

Some researchers instead choose to identify fixations as essentially the space between saccades. Because saccades are such rapid movements, they are easy to identify by measuring the acceleration and velocity of the eye, and are therefore easy to filter out of the raw gaze data, leaving only fixations behind. In chemistry education research, for example, several studies have defined a saccade as an eye movement with an acceleration of 9500°/sec² and a velocity of over 30°/sec (2, 9, 10). Fixations are then defined as the time between these saccadic events.

Most commercial eye-tracking software will include proprietary fixation-detection algorithms that are some combination of fixation and saccade identification. For example, Tobii Studio's standard fixation filter is based on the spatial distance between successive gaze points and is set to detect differences in eye position greater than 35 pixels by averaging eye position over five data points (approximately 85 ms of gaze data). Consecutive gaze points within the 35-pixel cutoff region are reduced into a single fixation (11). For more finely detailed stimuli, including smaller text or more complex images, it may be more appropriate to identify fixations based on saccadic detection (velocity/amplitude of eye movements), a smaller pixel radius, or a more specific fixation duration. It is important that researchers disclose the method used to identify fixations in their own studies when sharing results, in order to allow comparisons to be made between researchers.
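For comparison with the dispersion approach sketched earlier, a velocity-threshold classifier in the style of I-VT can be written in a few lines. The 30°/sec cutoff follows the studies cited above; the function assumes gaze positions have already been converted to degrees of visual angle, which real implementations derive from the screen geometry.

# t: timestamps in ms; x_deg, y_deg: gaze position in degrees.
classify_ivt <- function(t, x_deg, y_deg, v_thresh = 30) {
  dt <- diff(t) / 1000                            # sample spacing in s
  v <- sqrt(diff(x_deg)^2 + diff(y_deg)^2) / dt   # velocity in deg/s
  # One label per sample; the first sample has no velocity estimate
  # and is arbitrarily treated as part of a fixation.
  c("fixation", ifelse(v > v_thresh, "saccade", "fixation"))
}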

Areas of Interest

Once the raw data have been processed to identify fixations, the researcher must then make sense of these fixations. Of primary interest is the location of each fixation, and how these locations relate to the research questions being asked. On what is the individual focusing in this moment? Where do they look next? Are there objects that are never fixated during the duration of the task? To answer these questions, it is generally useful to identify areas of interest (AOIs) within the scene being analyzed.

AOIs are, as the name implies, simply regions within the scene that are determined to be important for analysis. There are several ways to identify AOIs, and a researcher may choose to break a scene into any number of AOIs based on the complexity of the scene and their particular research question. The simplest approach is for the researcher to identify AOIs before data collection even begins, based on content relevance or position within the scene. Using scene position to identify AOIs is particularly useful if your research questions focus on how the individual moves through the scene regardless of content, or if a scene is particularly crowded. Some researchers, for example, simply break the scene up into a simple 5x5 grid (Figure 1, left), analyzing data based on how the individual moves through this grid. More often, researchers identify AOIs based on their research questions. For example, individual AOIs may be assigned to paragraphs of text, individual sentences, or even individual words, depending on the specificity of one's research. Figure 1 (right) shows how AOIs may be assigned to text and images separately in order to allow a researcher to compare how users view these two types of content. The researcher may choose to further combine AOIs A, B, D, and E into an AOI group to analyze all textual areas together, or may choose to keep these AOIs independent in order to compare viewing behavior among them.

Figure 1. Areas of interest based on scene location (left) and content (right).

A second way to identify AOIs is to use user-derived AOIs. This method is appropriate for exploratory studies, when the researcher may not wish to identify areas a priori without data from the population under study. It may also be useful when comparing viewing behavior among different populations, for example between experts and novices. In this method, eye-tracking data are collected and aggregated into fixations. The fixations for a particular population can then be overlaid on top of the scene being studied to identify "hotspots", areas where a large number of fixations are naturally being made by the population. This method is closely related to the making of heat maps, discussed later in this chapter. The benefit of user-derived AOIs is that they may identify areas not originally believed by the researcher to be important. Further, user-derived AOIs may rule out large portions of the scene that do not need to be analyzed: any piece of the scene not included in a user-derived AOI is by definition unimportant to the population under study and may be ignored. Finally, comparing user-derived AOIs between or among populations allows the researcher to identify which portions of the scene are most salient to each group, without imposing their own ideas of importance by identifying AOIs ahead of time and potentially missing interesting patterns.

Most commercially available eye-tracking software allows the user to view a particular scene from their study and then add AOIs on top of the scene. These are not viewed by the participants during data collection but simply allow the software to assign data points to a particular AOI during data analysis. These AOIs are generally drawn in by the researcher. It is most common to use rectangular AOIs (as shown in Figure 1), but most software will also allow circles, polygons, or even irregular hand-drawn shapes, as determined by the researcher. If overlapping AOIs are drawn over a particular scene, the researcher may have difficulty assigning a particular fixation to the appropriate AOI, so care should be taken not to overlap these areas. Many researchers also choose to draw their AOIs slightly larger than the actual feature of interest, in order to allow a buffer region around the border of the area and avoid this same issue. While the maximum size of an AOI is limited only by the size of the scene, the researcher should take care not to make their AOIs too small. The AOIs should be drawn at an appropriate grain size to address the research questions being investigated, but the accuracy and precision of the eye-tracking system place a lower limit on the practicable size of an AOI. Although a researcher may be interested in how long a subject focuses on each letter in a word, for example, the eye tracker may not be able to reliably distinguish between AOIs sized for an individual letter in 10-point font. In this case, the AOIs should either be defined at a larger grain size or the stimulus itself should be enlarged to put more space between the AOIs.

Finally, the previous advice is most readily applied to static stimuli. For dynamic stimuli, care must be taken to draw AOIs that move as the stimulus moves. For example, in a study of how individuals view an animated chemical process, an AOI drawn around a particular molecule must follow the molecule's path through the entire animation. Many modern software packages make this easy, allowing a researcher to draw an AOI around the initial and final screen positions of an object; the software can then interpolate a pathway for the object and move the AOI as the stimulus moves. If this is not easily accomplished in a given software package or for a particular stimulus, the researcher may instead choose to break a dynamic stimulus down into individual scenes, perhaps based on the framerate of a movie (one scene per frame, with AOIs drawn over each frame, for example). Researchers may instead choose to define AOIs based on the layout of the screen, regardless of how content changes within the stimulus (one scene per panel of a website, for example, regardless of how the user scrolls within the panel).

Once AOIs have been identified, software can then be used to assign fixations to particular AOIs by matching the x,y coordinates of the fixation to the onscreen location of each AOI. Once accomplished, the researcher can then analyze the fixational data to identify trends in eye movements.
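The x,y matching in the last paragraph amounts to a point-in-rectangle test. The sketch below assumes a hypothetical table of rectangular AOIs in screen pixels; note that it simply returns the first matching AOI, which is exactly why overlapping AOIs are troublesome.

# aois: one row per rectangular AOI, coordinates in screen pixels.
aois <- data.frame(name = c("A", "B", "C"),
                   xmin = c(100, 100, 700), xmax = c(600, 600, 1200),
                   ymin = c(100, 400, 100), ymax = c(350, 650, 650))
assign_aoi <- function(fx, fy, aois) {
  hit <- which(fx >= aois$xmin & fx <= aois$xmax &
               fy >= aois$ymin & fy <= aois$ymax)
  if (length(hit) == 0) NA_character_ else aois$name[hit[1]]
}
assign_aoi(150, 200, aois)   # "A"
assign_aoi(900, 500, aois)   # "C"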

Fixation Metrics

There are many methods of analyzing fixation data, depending on the research questions being addressed, the population being studied, and the stimulus under investigation. Several of the most common methods used in chemistry education research are discussed here.

Fixation Duration

Fixation duration is a straightforward measure of the total length of time (usually on the order of milliseconds) an individual spends fixating within an AOI. Within a single AOI, fixation duration may be reported as the average length of one fixation within the AOI or as a total of all fixations within the AOI. Average fixation length may be useful for addressing research questions about the type of visual behavior being exhibited by an individual. For example, a series of very short fixations may indicate search behavior or reading behavior, while a series of longer fixations may indicate more in-depth processing or possibly confusion. Total fixation length, on the other hand, is frequently used as a measure of the relative importance of a particular object. A long total fixation duration (TFD) inside an AOI may indicate that the AOI was highly salient to the individual, or that the AOI was confusing or difficult to process (1, 12).

Fixation duration can be reported as raw duration values (in seconds or milliseconds) or may be converted to a percentage. To do this, the total amount of time an individual spends fixating on any AOI is calculated, and the total fixation time within a specific AOI is then divided by this total to give a percentage. This method is particularly useful when comparing fixation durations between individuals on tasks whose completion times vary widely. Although some participants simply take longer to complete a task than others, and therefore have higher fixation times overall, the way they split their time among various AOIs may not differ as a percentage of this total time. The researcher should decide whether raw data or percentage data is the more appropriate metric based upon their research question. If total task completion time does not vary widely, or if this total time is of interest to the research question at hand, raw values may be appropriate. If the researcher is looking for patterns in how visual attention is split among various AOIs, on the other hand, using percentages to standardize the data may be more appropriate.
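A minimal base-R sketch of this percentage conversion, assuming a hypothetical fixation table with one row per detected fixation:

# fix: participant, AOI, and duration (ms) for each fixation.
fix <- data.frame(
  participant = c("P1", "P1", "P1", "P2", "P2"),
  aoi = c("A", "B", "A", "A", "B"),
  duration = c(220, 310, 250, 400, 180)
)
# Total fixation duration per participant-AOI pair.
tfd <- aggregate(duration ~ participant + aoi, data = fix, FUN = sum)
# Divide each pair's total by that participant's grand total.
tfd$pct_tfd <- 100 * tfd$duration / ave(tfd$duration, tfd$participant, FUN = sum)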

Fixation Count

Closely related to fixation duration is fixation count, the number of fixations an individual makes within an AOI. This metric ignores how long an individual spends in a particular AOI and instead counts the number of fixations that land within it. While several studies have shown that fixation count and fixation duration are strongly correlated, including several from chemistry education research (10, 13), it is always worthwhile to check this correlation for a given data set. This can be accomplished by running a simple correlation test (such as a Pearson product-moment correlation) in an appropriate software package to test the relationship between fixation count and fixation duration. A statistically significant correlation (p < 0.05) may prompt a researcher to focus on only one of these two metrics. Although it is reasonable that an individual who makes a large number of fixations inside an AOI also spends a large amount of time on that AOI, this is not always the case. Some tasks may instead show an inverse relationship, with individuals making a large number of fixations but spending only a small amount of total fixation time in the area, as is most commonly seen with search behavior. For this reason, many researchers calculate both fixation duration and fixation count but may report only one of these metrics, depending on the correlation.
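Checking the count-duration correlation takes one line in R; the vectors below are invented per-participant values for a single AOI.

count <- c(12, 8, 15, 20, 9)                # fixation counts, one per participant
tfd_s <- c(3.1, 2.0, 4.2, 5.5, 2.3)         # total fixation durations (s)
cor.test(count, tfd_s, method = "pearson")  # Pearson r with p-value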

Visits (Dwells)

Fixation duration and fixation count, as previously described, are aggregate measures: they total up all fixations in a particular AOI, regardless of when those fixations happen. While these metrics are useful for tracking overall behavior, many research questions are better addressed by tracking individual visits to an AOI. A visit, sometimes referred to as a "dwell" in psychology, begins when the eye first fixates inside an AOI and ends when it moves away. Thus, a visit may consist of any number of individual fixations. Visit count, then, is a measure of how many times the individual returns to an AOI. This may give the researcher insight into how attractive a particular AOI is, how useful the individual finds the AOI, or how confusing a particular AOI might be to the individual. In a problem-solving task in chemistry, for example, it may be useful to track the number of visits to a periodic table as a way of measuring how many times the individual needs to use this resource for the given problem. Using TFD or fixation count instead would mask this important finding, making visit count the more useful metric.

Closely related to visit count is visit duration, which measures the TFD during each visit to a particular AOI. Combining visit count and visit duration may give the researcher an idea of how a particular AOI is being used. For example, an AOI that is visited only once for a brief duration is likely not useful to the individual. An AOI that is visited repeatedly for short durations may be a useful resource (such as the periodic table example) or may be a confusing resource the individual struggles to process. Analyzing data by visits is a finer-grained analysis than cumulative fixation duration or count and may give the researcher greater insight into the specific behavior driving an individual's eye movements.

Time to First Fixation

Another useful metric to consider when analyzing data is time to first fixation. This data point measures the amount of time that passes from the onset of the experiment until an individual fixates on a specific AOI. It is a valuable measure of visual saliency, which can give the researcher insight into how attractive or important a particular AOI is to an individual. Measuring time to first fixation is particularly useful in design tasks, when the researcher may be interested in determining how best to lay out a website, textbook, or simulation and wishes to identify which components draw the eye first.

Scanpaths

Although time to first fixation may give the researcher insight into the order in which an individual first approaches AOIs within a stimulus, the pattern of behavior beyond this point is lost. Similarly, visit count/duration and total fixation count (TFC)/duration obscure the sequence of events taking place during the individual's interaction with the stimulus. While such aggregate data can be incredibly useful for answering questions about the relative importance of various AOIs, or exactly how much visual attention each AOI garners, these measures are unable to answer research questions about the exact course the individual plots through the AOIs. To answer questions about the order in which AOIs are visited or revisited, fixation data are frequently converted into scanpaths. Scanpaths are chronological maps of eye movements, showing the exact order in which the individual fixated on each AOI. Here, duration is removed from consideration, and only the sequence of fixations is considered. Figure 2 shows a scanpath on top of a sample stimulus. If each numbered dot represents a single fixation, we can see that the individual made three fixations on the paragraph enclosed by AOI A, then transitioned to look at the periodic table in AOI C, then moved back to AOI A, then AOI B, the periodic table, and, finally, AOI E. A scanpath would simply show the order of these fixations as AAACABCE. Further discussion of creating and analyzing scanpaths, as well as suggestions for their use in chemistry education research, can be found in Chapter 5 of this book.

Figure 2. A scanpath overlaid on a stimulus.

Transitions

Transitions are essentially scanpaths with a length of two. They are eye movements from one AOI to another and are generally presented as a simple count: for example, how many times does an individual move directly from AOI A to AOI C, or from AOI B to AOI C? Figure 2 shows an individual making two such transitions between AOIs A and C. A count of the number of transitions made during a task can be used to make judgments about the interrelatedness of two objects, the design of objects or materials, or a user's expertise. For example, one might expect to see a large number of transitions between two objects when an individual is making comparisons between them (for example, deciding whether two molecules are enantiomers), or when an individual has less expertise (a novice may make a greater number of transitions between a question and a given diagram as they try to integrate this information).
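Transition counts fall out of a simple cross-tabulation of each fixation's AOI against the next one. The sketch below uses the AAACABCE sequence from Figure 2.

seq_aoi <- strsplit("AAACABCE", "")[[1]]
# Pair each fixation's AOI with the AOI of the following fixation.
trans <- table(from = head(seq_aoi, -1), to = tail(seq_aoi, -1))
trans                              # full transition matrix
trans["A", "C"] + trans["C", "A"]  # transitions between A and C: 2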

Statistics

Once fixational metrics have been identified, the researcher will no doubt wish to compare user behavior using statistical methods. While Chapter 6 of this book gives more in-depth suggestions on how to use the program R (14) to analyze your data, some suggestions for comparing fixational metrics are given here. Based on the research questions being addressed, fixational data are most often compared between AOIs (within-subjects comparison) or among individuals (between-subjects comparison). For example, it may be worthwhile to compare fixation duration on AOI A versus fixation duration on AOI B to determine where individuals spent the most time viewing. Similarly, one may wish to compare how many visits an expert makes to AOI A with the number of visits novices make to the same AOI. Standard between-subjects comparisons, such as t-tests or analysis of variance (ANOVA) tests, are appropriate here. More information on analyzing data using these methods in chemistry education research can be found elsewhere (15, 16).
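A between-subjects comparison of this kind takes only a few lines in R; the percent-TFD values below are invented for illustration.

experts <- c(34.1, 29.8, 40.2, 31.5)        # percent TFD on AOI A, experts
novices <- c(18.9, 22.4, 15.2, 25.0, 20.3)  # percent TFD on AOI A, novices
t.test(experts, novices)                    # Welch two-sample t-test
# With more than two groups (e.g., several educational levels), an ANOVA:
# summary(aov(pct_tfd ~ level, data = metrics))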

Displaying Fixational Data

Once fixational data have been analyzed, the researcher must determine how best to share this information with others. The most common method of sharing fixational data is the heat map, which provides a visualization of how fixations are spread over a given stimulus. Heat maps can be generated for a single individual over the course of an experiment or can aggregate the data of multiple individuals to highlight the most fixated regions for a particular population. They can be generated using a number of different metrics, but fixation duration and fixation count are the most common. Similar to electron density plots, heat maps use a gradient of colors to indicate fixation "density," showing areas where more or fewer fixations were made. In Figure 3, the red color indicates areas with a large number of fixations (up to 40 fixations), while the green areas received fewer fixations. Most commercially available eye-tracking software packages include the ability to generate heat maps for a given stimulus, and these packages generally give the researcher discretion to choose not only which metrics to display but also how these metrics are calculated.

Figure 3. Heat map showing location and frequency of fixations for a stimulus. Areas colored in red indicate regions with a greater number of fixations. (see color insert)

Because they convey information visually and are easy to interpret, heat maps are frequently used to share information with a general audience and are frequently included in poster presentations and talks, where data visualization is of the utmost importance. However, it is important to remember that heat maps, although generated from quantitative data, are at best qualitative summaries of those data. On their own, heat maps do not give statistical insight into the significance of the data. While they are useful for demonstrating a general trend, they should not be used without more quantitative accompaniment, including statistical results, tables, and/or graphs. Displaying descriptive data, such as minimum/maximum values, means, and standard deviations, for metrics such as TFD/count, visit duration/count, or time to first fixation is much more useful to readers than heat maps alone. Along with statistical results, these allow the audience to draw more nuanced conclusions about the data than a single summary image.
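For readers who want to build a comparable figure outside of vendor software, a rough heat map can be produced by smoothing fixation coordinates with a 2D kernel density estimate. The sketch below uses the MASS package (bundled with standard R installations) and invented fixation coordinates on a 1680 x 1050 screen; vendors' heat maps are computed with their own proprietary methods.

library(MASS)   # for kde2d()
x <- c(412, 430, 418, 900, 880, 895, 420)  # fixation x positions (pixels)
y <- c(300, 310, 295, 510, 500, 515, 305)  # fixation y positions (pixels)
dens <- kde2d(x, y, n = 200, lims = c(0, 1680, 0, 1050))
# Plot the density as a color gradient; overlay on the stimulus as needed.
image(dens, col = hcl.colors(50, "YlOrRd", rev = TRUE))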

Application of Fixational Data to Chemistry Education Research

In order to best understand how the suggestions in this chapter can be applied to chemistry education research, it may help to consider a specific example. The description of the study that follows focuses specifically on the methodological decisions made by the researchers as they relate to the recommendations given in this chapter for identifying fixations, selecting appropriate AOIs, and employing suitable fixational metrics and analyses. Where appropriate, suggestions are also given for alternate approaches that could be considered by researchers designing their own studies.

The following study was conducted to investigate students' understanding of the relationship between the molecular structures of several volatile hydrocarbons and the infrared (IR) spectra of these compounds. While a great deal of data has been collected on student understanding of structure-property relationships (SPR), there has been a lack of quantitative methodologies to further probe this understanding. The study addressed the following research questions:

1. What implicit assumptions appear to constrain the reasoning of students at different educational levels as they relate molecular structures to IR spectra?
2. What do eye-gaze patterns reveal about chemistry students' assumptions about SPR when relating molecular structures to IR spectra?

A technique of methodological triangulation was used, combining a think-aloud protocol with eye tracking. For more on the use of methodological triangulation, refer to Chapter 1. The discussion that follows focuses primarily on the eye tracking featured in the study, naturally shifting the focus to research question 2. Full details of the study have been published elsewhere (4, 17).

Methods

Participants

A total of 26 study participants were recruited from a medium-sized nontraditional university in the northeastern United States. Participants were undergraduates (freshmen through seniors) enrolled in a chemistry course and chemistry graduate students. Table 1 shows a breakdown of participant demographic data.

Table 1. Demographic Information about Study Participantsa

Educational Level (Abbr)      Male   Female   N    Course Enrolled In (End-of-Course Grade Distribution)
First year (F)                 4      5       9    General Chemistry II (4 A, 3 B, 1 C+, 1 C)
Second year (S)                4      3       7    Organic Chemistry Ib (3 A, 1 A-, 1 B, 1 B-, 1 W)
Final year of studies (SR)     3      1       4    Quantum Mechanicsc (1 A, 2 A-, 1 B-)
Graduate student (GS)          3      3       6
Total                         13     12      26

a Reproduced with permission from ref (4). Copyright 2015 American Chemical Society. b Participation in this study occurred after the unit on IR spectroscopy. c One student in the final year of studies was not enrolled in any chemistry courses; the end-of-course grade reported for this student is from the course completed the semester prior (Inorganic Chemistry).

Eye-Tracking Apparatus

A Tobii X2-60 remote eye-tracking system was used to monitor participants' eye movements. The eye tracker was mounted to a 22-in. computer monitor with a resolution of 1680 x 1050 pixels. The system has a sampling rate of 60 Hz. Before beginning the eye-tracking session, participants were calibrated using a nine-point manual calibration. Tobii Studio 3.2.3 was used to create the visual stimuli, operate the eye-tracking hardware, and collect audio and eye-tracking data.

Procedure

Individuals each participated in one 30- to 60-minute data collection session. Each session consisted of eye tracking with a think-aloud interview. Before beginning the eye-tracking session, participants were provided with an explanation of the think-aloud protocol. Then, while their eye movements were monitored, participants were shown a question, two molecular structures, and the respective IR spectra for the molecules shown. Figure 4 shows the complete visual stimulus discussed here. The question was intentionally open-ended, and participants were not time-restricted. The interviewer advanced to the next stimulus when the participants indicated they had provided as complete an answer as they thought possible. The interviewer did not speak during the eye-tracking session.


Figure 4. Visual stimulus shown to participants. Reproduced with permission from ref (4). Copyright 2015 American Chemical Society.

Identifying Fixations and AOIs

A fixation threshold of 100 ms was used for this study based on previously published literature (5, 7). As described earlier in this chapter, once fixations are identified, they must be mapped to specific onscreen features. AOIs were assigned to complement the research questions and are shown in Figure 5. There were five groups of AOIs: the question (F), the molecular structures (A, G, and H), the spectra axes (B, C, D, and E), the spectral peaks (U, W, O, S, K, I, and J), and the spectral baselines (R, P, N, T, L, and M).

While AOIs should be a consideration for researchers during the stimulus selection and design phase (refer to Chapter 3 for a more detailed discussion of stimulus design), sometimes it may be necessary to revise AOIs during the data analysis process. After collecting data, the researcher may find they need a finer-grained analysis to answer their research question. For example, a researcher working on this example study may have originally wanted each spectrum to be one AOI. Then, after reviewing the data, they might have decided that it was important to consider how the participants viewed the spectral peaks. In the end, the most important consideration is whether the AOIs are sufficient to answer the posed research question(s).


Figure 5. Researcher-defined AOIs for the visual stimulus shown in Figure 4. Reproduced with permission from ref (4). Copyright 2015 American Chemical Society.

Qualitative Data

Audio recordings of the participants' think-aloud sessions were first transcribed to text. A qualitative coding scheme was used to highlight primary thinking patterns and explanations of features of the IR spectra for each participant. Similar codes were grouped until only two groupings remained. Coders came to 100% agreement on the coding of the transcripts. A more thorough explanation of the coding process can be found elsewhere (17).

Fixational Metrics

Research question 2 was intentionally broad, as the researchers were looking to identify patterns of viewing behavior between and among participants at various educational levels in chemistry. As such, all of the most common fixational metrics discussed in this chapter were analyzed. To begin with, TFD and TFC inside AOIs were examined. As previously discussed, it is important to look at these measures independently prior to beginning statistical analysis, in order to determine whether one or both measures should be used. Because participants were not time-restricted, fixation count and fixation duration were converted to percentages of the total for analysis. This is frequently done as a way of "normalizing" the data, as it allows the researcher to account for individuals' natural variation in task completion time. Percentage of TFD was determined by dividing the TFD for each AOI group (in seconds) by the TFD for all AOIs combined. Percentage of TFC was calculated in the same manner.


Scanpath analysis was also performed in order to uncover further patterns in gaze behavior. Analysis was carried out using eyePatterns, an open-source software tool (18). Sequences were collapsed such that a participant’s multiple successive fixations within a single AOI were condensed into a single gaze or dwell for the purposes of analysis. Using AOIs C, A, and T as an example, a sequence such as CCCAAAATT would be condensed to CAT. This is a common technique among eye-tracking researchers because it eliminates the variable associated with fixation duration and also allows the researcher to focus on transitions between AOIs. For this study, patterns in fixation sequences three AOIs long were identified for analysis based on literature indicating that this is the maximum number of gazes that an individual can hold in working memory (19, 20).
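The study used the eyePatterns tool (18) for this sequence analysis; purely as an illustration of the collapsing idea, and of extracting the three-AOI subsequences analyzed in the Results section, here is a base-R sketch.

# Collapse runs of repeated AOI letters: "CCCAAAATT" -> "CAT".
collapse_seq <- function(s) {
  paste(rle(strsplit(s, "")[[1]])$values, collapse = "")
}
# All length-3 subsequences of a collapsed sequence.
three_grams <- function(s) {
  s <- collapse_seq(s)
  if (nchar(s) < 3) return(character(0))
  sapply(seq_len(nchar(s) - 2), function(i) substr(s, i, i + 2))
}
collapse_seq("CCCAAAATT")  # "CAT"
three_grams("AAACABCE")    # "ACA" "CAB" "ABC" "BCE"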

Results and Analysis

Qualitative Data

From the coding of think-aloud transcripts, three implicit assumptions made by students during problem-solving emerged: Atoms-as-Components, Bonds-as-Components, and Bonding. Figure 6 shows the distribution of these implicit assumptions by education level.

Figure 6. Distribution of implicit assumptions by education level.

Atoms-as-Components

Participants exhibiting Atoms-as-Components thinking approached the molecules as agglomerations of atomic components, each giving rise to spectral properties in an additive fashion. These participants gave explanations for the peaks in the spectra based on the presence or absence of atoms, or atom types, in the molecules. Participants focused their explanations on atoms, specifically their quantity, type, atomic mass, and electronegativity. This type of thinking was present in 32% (n = 8) of participants. Most of these (n = 6) were first-year students.

Bonds-as-Components

Participants exhibiting Bonds-as-Components thinking approached the molecules as collections of components without consideration of how those components interacted. These students discussed the molecular components as two or more atoms connected together, which they referred to as bonds (e.g., "the C—C bond") or functional groups (e.g., "the chlorine group, —Cl"), each with independent properties. This assumption was present among 28% (n = 7) of participants. Some students who relied on this assumption made direct associations between bonds within the molecule and specific peaks in the spectrum.

Bonding

Participants exhibiting the Bonding assumption cued on types of vibrational motion, rotation, symmetry and dipole induction, and energy absorption when explaining the IR spectra in relationship to the molecules. These students most often spoke about interactions between energy and matter and made specific reference to relationships between intramolecular forces and the respective IR spectra. This type of thinking was present in 40% (n = 10) of participants. Unsurprisingly, this advanced level of conceptual sophistication was most prevalent among seniors and graduate students.

Fixation Data

Both percent TFD, shown in Figure 7, and percent TFC, shown in Figure 8, tell a similar story. First, AOIs in the Molecules group were the most viewed. According to the eye-mind assumption (refer to Chapter 1), the increased fixation duration and fixation count within the Molecules group indicate that participants spent more time processing information in those AOIs. Without further information, though, it would be difficult to determine whether the increased processing occurred because the information contained in the molecular structures took more time for participants to understand, or for some other reason (e.g., greater complexity of the AOIs, closer examination of the AOI content, etc.). These data also show that freshman-level participants had the longest duration and highest counts on the question. This could be due to students attempting to clarify the question, seeking out further information to help them answer it, or other reasons. Tang and Pienta saw similar results, noting that students who were unsuccessful at solving a gas law problem had a higher occurrence of fixations on the question (5).


Figure 7. Percent TFD for each AOI group by educational level.

Figure 8. Percent TFC for each AOI group by educational level.

Second, participants at higher educational levels spent more time viewing AOIs highlighting parts of the spectra (i.e., Peaks and Baseline) than the freshman- and sophomore-level participants did. This could indicate that participants at lower educational levels saw less relevant information in the spectra, or that they were unable to interpret the information the spectra contained and thus ignored it almost entirely.

Analysis of the collapsed fixation sequence data revealed a total of 1032 unique groupings three AOIs in length. Sequences that appeared only once among the 26 participants were eliminated, reasoning that such sequences were likely indicative of searching behavior or transition states from one type of thinking to another (e.g., from “what is similar about these structures” to “what is similar about these spectra”). After eliminating these sequences from the data set, 299 three-AOI sequences remained. These sequences were then grouped based on the types of AOIs they included. Table 2 shows the identified patterns along with descriptions and examples of each.

Table 2. Patterns Resulting from Analysis of 3-AOI Sequences^a

Pattern   Description                                 Examples^b
1         Only look at molecular features             AGH, GHG
2         Return to the question                      AFA, FGH
3         Look at molecular features and spectra      HGI, JHG
4         Look at molecule, spectrum, and axis        AUB, JEG
5         Look at a spectrum and an axis              QCQ, IDJ
6         Look only within spectra                    SKI, UIJ
7         Look only at spectra axes                   BCB, DEC
8         Patterns which indicate random searching    BCA, ASC

^a Reproduced with permission from ref (4). Copyright 2015 American Chemical Society. ^b Refer to Figure 5 for AOI labels.

Correlation between Eye Tracking and Qualitative Findings

Figure 9 shows the percent occurrence of each sequence pattern type sorted by assumption. Percent occurrence is indicative of the average frequency of the pattern type for each participant, not the number of participants exhibiting the pattern.

Figure 9. Percent occurrence (frequency for each participant) of fixation sequence patterns by assumption.

Participants using the Atoms-as-Components assumption had the highest occurrences of sequence patterns 1, 2, and 4. In pattern 1, the participants viewed only AOIs of the molecules within the three-AOI sequence. This is indicative of individuals making comparisons among the structural and compositional features of the two molecules, presumably in order to identify differences that would allow them to formulate a reasonable explanation for the observed spectral features. The prevalence of pattern 2 (return to the question) among this most novice level of thinking further supports the percent TFD and percent TFC data, in which the students at the lowest educational level spent the most time looking at the question.

Participants using the Bonds-as-Components assumption had the highest occurrence of pattern 3, in which the viewer’s gaze shifted between the molecular structures and the spectra. This seems to indicate that the students relying on this assumption (1) understood that IR spectra are related to molecular structures and (2) tried to relate specific peaks to particular structural features. For second-year students in an organic chemistry course, the most prevalent AOI sequence of pattern 3 occurred for the two large peaks of the spectrum of Compound 2 (AOIs I and J shown in Figure 5), indicating these students may think molecular differences show up as the largest peaks in the spectrum.

Participants using the Bonding assumption showed the highest occurrence of patterns 5 (spectrum and axis) and 6 (within a spectrum). The high frequency of pattern 5 could indicate participants were identifying whether peaks occurred at typical wavenumber regions, which they had likely committed to memory. However, the prevalence of this pattern among participants with greater conceptual sophistication corroborates the think-aloud data in which some participants reasoned about why certain vibrational excitations required greater or lesser energy. The high frequency of pattern 6 also reflects a more advanced tactic in interpreting IR spectra in which the participants were trying to interpret spectral peaks relative to each other.

Conclusions and Limitations

The research study presented demonstrates that an eye-tracking methodology can provide a robust set of data that goes beyond what can be learned from think-aloud interviews alone. In particular, the study shows that eye-movement behaviors differ depending on the implicit assumptions participants made about how spectroscopic responses are related to molecular structures. While the data are suggestive of trends, the sample size of the study was too small to obtain statistical significance when making comparisons between educational levels or between implicit assumptions. Additionally, some of the AOIs from the eye-tracking analysis were not ideally spaced to avoid overlap. For some adjacent AOIs (e.g., B and U in Figure 5), there was the possibility that a participant was looking at one of the AOIs but was recorded as looking at the other. It is important when presenting or reading the results of any eye-tracking study to consider these types of limitations before making claims about the generalizability of the study. At the same time, these results demonstrate how attention is allocated by these participants when completing this specific task, and they illustrate the value of using multiple methods to both analyze and present the results of any eye-tracking study.


Conclusions

It is clear that fixational data are a robust source of data for eye-tracking studies. Researchers must be careful to define fixations within the context of a particular study and to identify AOIs that will allow the research questions to be addressed. Most importantly, there are a variety of fixational metrics that can be employed to address these research questions, and care must be taken to select appropriate ones. Fixation duration and fixation count can give us an idea of how individuals split their time, while visit duration/count, time to first fixation, and scanpath analysis can give a more chronological view of the same data. Finally, care must be taken when choosing how to share the results of eye-tracking data analysis. Employing both qualitative and quantitative visual representations gives the audience a more accurate view of the results. Each researcher should consider the best fixational analysis methods for their own study; the specific choices of how fixations are identified or assigned to AOIs, as well as which metrics are reported and how they are analyzed and displayed, must be dictated by the research questions being asked. The chapters that follow in this book give excellent examples of how essential fixations are to the analysis of eye-tracking data and of the many methods chemistry education researchers may use to incorporate them into their own studies.

References

1. Holmqvist, K.; Nyström, M.; Andersson, R.; Dewhurst, R.; Jarodzka, H.; Van de Weijer, J. Eye Tracking: A Comprehensive Guide to Methods and Measures; Oxford University Press: Oxford, 2011.
2. Just, M. A.; Carpenter, P. A. A Theory of Reading: From Eye Fixations to Comprehension. Psychol. Rev. 1980, 87, 329–354.
3. Rayner, K. Eye Movements and Attention in Reading, Scene Perception, and Visual Search. Q. J. Exp. Psychol. 2009, 62, 1457–1506.
4. Cullipher, S.; Sevian, H. Atoms versus Bonds: How Students Look at Spectra. J. Chem. Educ. 2015, 92, 1996–2005.
5. Tang, H.; Pienta, N. Eye-Tracking Study of Complexity in Gas Law Problems. J. Chem. Educ. 2012, 89, 988–994.
6. Tang, H.; Abraham, M. R. Effect of Computer Simulations at the Particulate and Macroscopic Levels on Students’ Understanding of the Particulate Nature of Matter. J. Chem. Educ. 2016, 93, 31–38.
7. Tang, H.; Topczewski, J. J.; Topczewski, A. M.; Pienta, N. J. Permutation Test for Groups of Scanpaths Using Normalized Levenshtein Distances and Application in NMR Questions. In Proceedings of the Symposium on Eye Tracking Research and Applications; ACM: Santa Barbara, CA, 2012; pp 169–172.
8. Topczewski, J. J.; Topczewski, A. M.; Tang, H.; Kendhammer, L. K.; Pienta, N. J. NMR Spectra through the Eyes of a Student: Eye Tracking Applied to NMR Items. J. Chem. Educ. 2016, 94, 29–37.
9. Williamson, V. M.; Hegarty, M.; Deslongchamps, G.; Williamson, K. C.; Shultz, M. J. Identifying Student Use of Ball-and-Stick Images versus Electrostatic Potential Map Images via Eye Tracking. J. Chem. Educ. 2013, 90, 159–164.
10. Stieff, M.; Hegarty, M.; Deslongchamps, G. Identifying Representational Competence With Multi-Representational Displays. Cogn. Instr. 2011, 29, 123–145.
11. Tobii AB. Tobii Studio User’s Manual, Version 3.4.5; 2016; pp 1–170.
12. Goldberg, J. H.; Kotval, X. P. Computer Interface Evaluation Using Eye Movements: Methods and Constructs. Int. J. Ind. Ergon. 1999, 24, 631–645.
13. Tang, H.; Day, E.; Kendhammer, L.; Moore, J.; Brown, S.; Pienta, N. J. Eye Movement Patterns in Solving Science Ordering Problems. J. Eye Mov. Res. 2016, 9, 1–13.
14. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria.
15. Pentecost, T. C. Introduction to the Use of Analysis of Variance in Chemistry Education Research. In Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 1166; American Chemical Society: Washington, DC, 2014; pp 99–114.
16. Lewis, S. E. An Introduction to Nonparametric Statistics in Chemistry Education Research. In Tools of Chemistry Education Research; Bunce, D. M., Cole, R. S., Eds.; ACS Symposium Series 1166; American Chemical Society: Washington, DC, 2014; pp 115–133.
17. Cullipher, S. G. Research for the Advancement of Green Chemistry Practice: Studies in Atmospheric and Educational Chemistry. Ph.D. Thesis, University of Massachusetts, Boston, MA, 2015.
18. West, J. M.; Haake, A. R.; Rozanski, E. P.; Karn, K. S. eyePatterns: Software for Identifying Patterns and Similarities across Fixation Sequences. In ETRA; ACM: New York, 2006; pp 149–154.
19. Havanki, K. L. A Process Model for the Comprehension of Organic Chemistry Notation. Ph.D. Thesis, The Catholic University of America, Washington, DC, 2012.
20. Underwood, G.; Chapman, P.; Brocklehurst, N.; Underwood, J.; Crundall, D. Visual Attention While Driving: Sequences of Eye Fixations Made by Experienced and Novice Drivers. Ergonomics 2003, 46, 629–646.


Chapter 5

Sequence Analysis: Use of Scanpath Patterns for Analysis of Students’ Problem-Solving Strategies

Elizabeth L. Day,*,1 Hui Tang,1 Lisa K. Kendhammer,2 and Norbert J. Pienta1

1Department of Chemistry, University of Georgia, Athens, Georgia 30602, United States
2Department of Chemistry and Biochemistry, California State University, Chico, California 95929, United States
*E-mail: [email protected].

This chapter aims to introduce sequence analysis of scanpaths, which are spatial and temporal sequences of fixations and saccades, often defined as fixations on a series of areas of interest (AOIs). This chapter will provide an overview of sequence analysis methods and visualization techniques from the eye-tracking literature, describe how sub-sequences can be quantified as a measure of cognitive load, and illustrate how scanpath analysis can be used in support of other eye-tracking metrics. The intended audience of this chapter is eye-tracking researchers in discipline-based education research (DBER) fields who are interested in elucidating patterns from participants’ eye movements.

© 2018 American Chemical Society

Introduction

As educators and administrators implement active, multimodal instructional methods, there is a growing need for discipline-based education research (DBER) to evaluate those educational interventions and improve upon them. This research produces sophisticated experimental protocols that generate a wealth of data that has previously been under-utilized. Traditionally, the analysis of quantitative measures has been limited to tests of significance; DBER under-utilizes methods to analyze sequences of events, such as those generated by online activities and emerging patterns of interest from eye-tracking studies. Alternatively, qualitative data with a temporal dimension—such as observation protocols like the Classroom Observation Protocol for Undergraduate STEM (COPUS) (1)—lack a facile method to visualize and compare patterns. While the theory and examples in this chapter are specific to eye-tracking studies, some of these avenues of pattern analysis could be applied to other contexts, such as searching for patterns in log files of online activities or in qualitative data sets.

The analysis of sequences has been applied in eye-tracking studies in literacy, psychology, and usability research as a means of supporting fixational or qualitative measures, such as think-aloud interview protocols (2, 3). The example study at the end of this chapter investigates the effect of multimedia instructional materials on the cognitive load of participants, and sequence analysis is used to explain the significance of the statistical analysis. In the field of chemistry education research (CER), the cognitive load of educational tools has been studied through eye tracking, heart rate monitors, and pattern analysis of online tools (4–6). Likewise, classification of novice and expert-like cognition has also been a popular subject for eye-tracking research (7–9). This chapter aims to add to these tools through a discussion of sequence analysis. Although the provided example study uses eye-tracking data, this treatment of data could be applied to other areas of CER.

Theoretical Framework

Cognitive Load Theory

Cognitive load theory (CLT) builds on the foundation of information processing theory (10–12). CLT assumes that humans possess a working memory capacity that is limited to holding 5−9 elements (or “bits”) of novel information and can only actively process 2−4 novel, interacting elements simultaneously (13, 14). The working memory architecture houses the conscious processing of novel sensory information as well as information retrieved from long-term memory. While novel information is limited in retention as well as in the number of elements that can be actively processed, there is no known limitation on information from the schemata of long-term memory (15). Schemata are cognitive constructs that treat multiple bits of information as a single chunk to reduce the strain (or cognitive load) on the working memory (14). These retrieval structures vary in degree of complexity and automation with experience and are specific to certain subjects. Experts are assumed to have expanded processing capacity because they have more developed schemata, which allow for fast encoding of information and efficient retrieval from long-term memory (3). With this support from the long-term memory store, the working memory can expand its processing capability and minimize overload (14). The constructs of CLT have been implemented in curriculum and multimedia design to ensure that the learner is able to efficiently access previously held information for application to new situations (commonly referred to as “transfer”) (16). Intrinsic, extraneous, and germane cognitive loads are additive, and a high total load interferes with learning. Intrinsic load is inherent to any novel task; therefore, any working memory capacity available after processing intrinsic load is dedicated to extraneous or germane load processing.

Cognitive Theory of Multimedia Learning

One implication of CLT is that the limited working memory structure must be a major consideration when designing instructional activities. Mayer’s cognitive theory of multimedia learning also relies on information processing theory to inform the design of educational materials (17). While the visuospatial capacity and phonological processing of the working memory are separate, subsidiary systems, their processing capabilities are interdependent in an additive fashion. Processing information through both centers (via multiple modalities of input) can reduce extraneous load; this is considered more effective than relying on one processing center alone (18).

Measurements of Cognitive Load

Cognitive load is determined through a measurement of a participant’s mental load, mental effort, and performance outcome (14). Mental load is specific to the participant and the task, and it can be estimated based on the current knowledge of the participant’s experience. Mental effort refers to the cognitive resources available for the load imposed by the task. Performance—the most commonly reported measure—is the participant’s outcome on an exam, the number of errors committed, or the time spent on the task (14). The most common measurements gauge a participant’s mental effort and performance. These techniques fall under three broad categories: subjective, task-based, and physiological (14, 19).

Subjective, or rating-scale, techniques require the participants to reflect on and report the mental effort necessary for a task. This measurement is typically accomplished through a questionnaire with one or more semantic differential scales for groups of highly correlated variables (such as mental effort, fatigue, or frustration) (14). Although subjective, this self-reported data has been demonstrated to be valid and reliable and is less burdensome than other measurement techniques (14). However, this data is collected post hoc and lacks the ability to measure cognitive load during the task (19).

Task- or performance-based evaluations of cognitive load fall into primary or secondary task measurements. While primary task measurements evaluate the performance of a given task (5, 20), evaluating the performance of a secondary task (such as sustained attention to or detection of a sensory signal) can indicate the cognitive load of a primary task (14). These measurements are reported as reaction time (to the sensory signal), accuracy, or error rate in performance. Although secondary task performance is a sensitive and reliable method, it is reported less often, likely due to the considerable limitation it places on the complexity of the primary task (14).

Physiological techniques have been prevalent in education and chemistry education research (19). These measurements assume that cognitive load produces a physiological response that can be measured in real time (14). Heart rate variability can be monitored with heart monitors (19). Brain activity can be monitored with functional magnetic resonance imaging (fMRI) or electroencephalography (EEG) (21–23). Commonly, eye-movement metrics have been used as measures of cognitive load (3–6, 24–27).

Fixation duration and fixation count are commonly reported metrics that can indicate cognitive load. Both are correlated with the difficulty and complexity of the visual stimuli and can reflect cognitive load (23, 28); longer duration or higher count implies more complex material under visual perusal (29). Similarly, experts’ shorter fixation durations have confirmed the assumption that experts achieve faster and more efficient encoding and retrieval of information than novices (3). A meta-analysis of eye-tracking studies also confirmed the theory that expert-like schemata optimize information processing; experts had shorter fixations on task-irrelevant information than novices, and experts spent longer fixating on information that was relevant to the task (as compared to novices) (3).

A scanpath (or gaze sequence) is a pattern of fixations and saccades constructed from the path of eye movements over a certain timespan (2, 4). Repeated visits to an area of interest (AOI) in these sequences may indicate features that the viewer deems important or interesting (30). In terms of length, experts tend to have more efficient, focused scanpaths than novices (31). Shorter subsequences called transitions can be defined and used to measure cognitive load. In the design of educational materials, the number of transitions between text and picture is related to the complexity of the stimulus (2).

Sequence Analysis

Introduction

Instead of comparisons of raw fixation data, a wealth of information can be gained from the investigation of scanpath patterns. Most of the methods of sequence analysis described below require mapping AOIs onto the stimulus, whether in a grid overlay, via semantic designation, or based on the post-experiment density of user fixations. For best results, it is recommended that AOI assignments be built into the experimental hypotheses (2), ideally before data collection begins. Following a brief introduction to the seminal theory of scanpaths, this section will briefly describe common visualization models for qualitative scanpath comparison. The final portion will group the quantitative methods of sequence analysis into (1) similarity/dissimilarity measures, (2) transition probability measures, and (3) pattern detection and identification methods.

Scanpath Theory

Scanpaths are sequences of eye movements (32), typically defined through fixations and saccades. First proposed in the early 1970s, scanpaths have been theorized to be patterns of eye movements unique to an individual upon repeated exposure to a stimulus (33, 34). Upon initial exposure to a stimulus, an individual develops a sequence of eye movements as they process the stimulus, and theoretically this pattern of movements is reproduced upon repeated exposure to the stimulus (34). Compared to other eye-movement metrics, scanpaths reveal the spatial and sequential allocation of attention to a stimulus. The complexity of this data has presented a challenge in how to quantitatively compare patterns amongst participants and between groups. Ideally, an effective method for scanpath comparison would retain (1) the “ordinal sequence of fixations and saccades” (32), (2) the spatial coordinates, to avoid the uncertainty associated with defining AOIs, (3) the proportions and shape of the scanpath, and (4) the fixation durations. Currently, no method is able to retain all these features and utilize multiple scanpaths in its analysis (35).

Visualizations

From the fixation data, eye-movement sequences can be modeled as (1) dwell maps, (2) sequence charts, (3) scarf plots, or (4) gaze plots. Dwell time—sometimes called gaze duration or fixation duration (4)—can be mapped onto the stimulus as a heat map, or the values for dwell time on a stimulus with a superimposed grid can demonstrate the participant’s visual interest (2). Sequence charts are graphs of proportion-over-time that demonstrate the dwells of a single participant over each AOI as a function of time (2), as seen in Figure 1. Similarly, scarf plots represent dwell time as a function of time for multiple participants (2); these plots condense all of the fixation durations on AOIs into a single bar, and each bar in the plot represents a participant in the study. Traditionally, scanpaths are visualized as gaze plots on top of the stimulus, as seen in Figure 2, in which the lines represent the saccades between the dots that represent fixations. The fixation points are numbered to reflect their position in the scanpath, and the size of each fixation point corresponds to fixation duration. These plots are reported as illustrations of participants’ visual perusal, or they are used within an experiment as data quality checks, for preliminary impressions of the data, or to cue retrospective think-aloud protocols (2). This type of plot is a common feature of eye-tracking software. Blascheck and colleagues provide a thorough review of visualizations for scanpaths, with illustrative examples (36).


Figure 1. An example of a sequence chart of a single participant in which the duration of each fixation (in milliseconds) is represented by the width of each bar. This plot was generated in R using a modified code for a Gantt plot. (see color insert)
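A minimal sketch of a sequence chart of this kind, assuming hypothetical dwell onsets and offsets and using the ggplot2 package rather than the exact Gantt code behind Figure 1:

    library(ggplot2)
    # Hypothetical dwells (ms) of one participant over AOIs Q, A, F, D
    d <- data.frame(aoi   = c("Q", "A", "F", "D", "Q"),
                    start = c(0, 450, 1200, 2050, 2600),
                    end   = c(450, 1200, 2050, 2600, 3100))
    ggplot(d, aes(y = aoi)) +
      geom_segment(aes(x = start, xend = end, yend = aoi), linewidth = 4) +
      labs(x = "Time (ms)", y = "AOI")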

Figure 2. An example of the gaze plots of several participants overlaid on a stimulus with AOIs defined. This type of AOI assignment is termed “semantic designation” in eye-tracking literature. (see color insert)

Although these visual representations have the advantage of retaining the spatial and duration information of the fixations, the data can be difficult to interpret meaningfully. As the number of participants increases, projecting multiple participants’ scanpaths onto the stimulus as a gaze plot yields a cluttered qualitative depiction of participant behavior. Nonvisual representations of scanpaths, such as AOI strings, can be analyzed to elucidate patterns not readily apparent in the visualization. The AOI strings generated in the string-edit methods described below are commonly reported representations of a small number of participant scanpaths.

Analysis of scanpaths via treatment as AOI strings utilizes fixations or dwells to generate temporal sequences. Dwells are defined as all the fixations and saccades within an AOI from initial entry to exit. Treating scanpaths as character strings of AOIs facilitates analysis of the scanpath as a vector of zero or more characters (2, 37). For instance, a sequence of fixations to AOIs can be fully represented as QAFFDDQA, in which each letter—typically written in a typewriter font (2) but shown in bold and italics in this chapter—represents a fixation on that AOI. This fictitious example is demonstrated in Figure 3.

Figure 3. Illustration of the example scanpath “QAFFDDQA”.

Some methods remove the duplicate characters from the string, collapsing the fixations into a sequence of dwells (i.e., QAFDQA). These sequences can be further reduced to include only the initial visits (i.e., QAFD); a brief sketch of these reductions follows the next paragraph. Collapsed (i.e., QAFDQA) and full character (i.e., QAFFDDQA) strings are the result of implementing an AOI-based scanpath definition; this definition of scanpath is dependent on the grain size of the AOI as well as on whether the AOIs are gridded or semantic. The assumption of semantic AOI assignment is that everything within an AOI is equivalent in terms of attentional importance; therefore, retaining the dwells within the AOI (using full strings) does not add any additional information, but it could impact similarity measures in a way that may not be meaningful. Collapsed strings remove the dwells within the AOI, but this further reduces information about the fixation duration in the AOI. The choice to use collapsed or full strings depends on how important fixation duration is to the hypothesis (although using full strings does not fully express the temporal dimension of a scanpath) or on the homogeneity within each AOI. These issues are further discussed in the similarity measures section below.

There are several R packages (package names are typically set in bold font) capable of basic operations on character strings; for example, stringr, stringb, and stringi can accomplish character counts, conversions, detection of patterns, extractions/replacements, and splitting or concatenation of strings. The R packages utils, stringdist, scanpath, and GrpString, the analysis tool eyePatterns, and the MATLAB tool ScanMatch have functions that compare strings using various calculations of distance (30, 37, 38), although the eye-tracking literature contains numerous other methodologies. Furthermore, some scanpath analysis methods note that treatment as character strings omits information about fixation duration, saccade amplitude, and the general shape of a scanpath; these non-AOI-based methods will be discussed further in the similarity measures section.
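A brief base-R sketch of the string reductions described above (AOI labels are from the fictitious example):

    full  <- "QAFFDDQA"
    chars <- strsplit(full, "")[[1]]
    collapsed    <- paste(rle(chars)$values, collapse = "")           # "QAFDQA"
    first_visits <- paste(unique(rle(chars)$values), collapse = "")   # "QAFD"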

Similarity (Dissimilarity) Measures

Of the similarity (or dissimilarity) measures, the majority work in a pairwise fashion, iteratively comparing two scanpaths at a time. Most of the methods discussed here are variations of string-based methods that depend on AOI assignment, but geometric methods of comparing scanpaths (which do not use AOIs) will also be briefly discussed. The decision to use one algorithm over another is typically pragmatic. While more sensitive methods are available, the cost of the tool or software, as well as the level of expertise needed, may preclude methods more sophisticated than the Levenshtein distance.

The most common measures are dissimilarity calculations using a string-edit method (also known as the Levenshtein distance); this method computes the minimum number of editing operations (insertions, deletions, or substitutions) necessary to transform one string into another (2, 35). Fewer transformations indicate strings with more similarity and yield a smaller Levenshtein distance (35). Because the strings in an analysis are frequently of different lengths, the normalized string edit distance (d̂) is calculated from the distance (d) and the maximum string length, as seen in equation 1 (2). This normalized distance is often expressed as a percent. When this string-edit algorithm is applied iteratively to a group of scanpaths, it yields a matrix of distances between the scanpaths of the group.
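Consistent with this description, equation 1 presumably takes the form

    \hat{d} = \frac{d(S_1, S_2)}{\max(|S_1|, |S_2|)}

where d(S_1, S_2) is the Levenshtein distance between strings S_1 and S_2 and |S| denotes string length. A minimal sketch in base R (stringdist::stringdist() with method = "lv" offers the same raw distance):

    s1 <- "QAFDQA"
    s2 <- "QAFFDA"
    d     <- adist(s1, s2)[1, 1]             # Levenshtein distance (utils)
    d_hat <- d / max(nchar(s1), nchar(s2))   # normalized string edit distance
    100 * d_hat                              # often reported as a percent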


There are three disadvantages of a Levenshtein-distance-type dissimilarity calculation. The first is that the strings account for neither the fixation duration on an AOI nor the spatial positions of the AOIs on the stimulus (35). The duration aspect could be addressed by retaining repetitive AOIs in the string (rather than collapsing to a sequence of dwells) and assigning temporal bins of an appropriate duration (say, 50 milliseconds per character); however, this expression of fixation duration introduces another level of subjectivity to the analysis that could make results difficult to generalize (35).

Figure 4. A stimulus with an overlaid grid of AOIs, labeled A−X. The black, numbered dots represent fixations.

The second disadvantage is the spatial issue, illustrated in Figure 4, that arises from the grouping of fixations by AOI (32). As seen for AOI K, fixations 5−7 land on different regions of the AOI but would be assigned the same character in the Levenshtein method. Similarly, on AOI I, fixations 3 and 4 are both labeled with the same character, whereas fixation 2 on AOI C falls only a few pixels outside of the boundary of AOI I and is therefore assigned a different character (or omitted from the analysis if it is not on an AOI). In Levenshtein methods, the distance between fixations 2 and 3 appears greater than the distance between fixations 3 and 4, despite the relative proximity of fixations 2 and 3 in terms of pixels. The distance calculations are thus influenced by AOI assignment, and the local movements of the eye can introduce small, insignificant variations. The third disadvantage is that the Levenshtein distance assigns equal value to each editing operator: it tabulates the cost of fixating on proximal visual elements as the same as the cost of fixating on AOIs that are far apart, despite the difference in saccade lengths. In Figure 4, the cost of substituting fixation 2 on AOI C is equal for either fixation 3 or fixation 4 on AOI I, despite the difference in saccade lengths.

To account for these issues, a researcher could instead employ a Euclidean or City Block distance calculation, as seen in equations 2 and 3, respectively (32, 35). In these equations, vectors U and V are AOIs, U1 and U2 are the x- and y-coordinates of the center of AOI U, and α is a normalization parameter (35). These parameters retain more spatial information about the position of a fixation within an AOI, although there is still a lack of fixation duration information. Despite these issues, various string-edit algorithms have been widely used in eye-tracking research (35, 39).
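A form of equations 2 and 3 consistent with the definitions above (the published notation may differ) is

    d_{Euclidean}(U, V) = \frac{\sqrt{(U_1 - V_1)^2 + (U_2 - V_2)^2}}{\alpha}

    d_{CityBlock}(U, V) = \frac{|U_1 - V_1| + |U_2 - V_2|}{\alpha}

where (U_1, U_2) and (V_1, V_2) are the center coordinates of AOIs U and V and \alpha is the normalization parameter.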

An advanced application of string-edit methods is the ScanMatch MATLAB tool, which applies the Needleman−Wunsch algorithm to align sequences (35, 38, 40). This algorithm utilizes a gridded assignment of AOIs as depicted in Figure 4 (as opposed to assigning only semantically meaningful AOIs as in Figure 3). From this AOI assignment, two strings of fixations are aligned via a substitution matrix in which the cells contain all the possible pairs of characters between strings (32). This substitution matrix is used to determine the substitution cost (which is inversely related to the Euclidean distance). The ScanMatch tool iteratively assigns a cost for each cell in a comparison matrix—generated with one scanpath as the columns and the other as the rows; in this calculation, the lowest cost (or worst score) is assigned to AOIs that are far from each other on the stimulus (32, 35). The Needleman−Wunsch algorithm computes the path from the top left corner of the matrix to the bottom right corner that has the highest score (i.e., the most similar pairing of AOIs). This similarity score is normalized to the length of the scanpaths and reported as a value from 0 (dissimilar) to 1 (perfectly similar). Unlike the Levenshtein distance, it provides relational information between AOIs, but the drawback of this method is the level of calibration needed to set the threshold for what constitutes a positive or negative cost (35).

In contrast to the string-edit methods that depend on AOI definitions, there are non-AOI-based methods of comparing scanpaths, such as attention/heat maps, calculation of the Mannan linear distance, or geometric methods such as the MATLAB tool MultiMatch (32, 35, 41). The attention maps used for scanpath comparison differ from those generated by commercial eye-tracking software; these Gaussian landscape functions represent fixations (as well as their visit count) as a function of height, but they lack the ordinal sequence and shape information of scanpaths (32). The Mannan linear distance is a “nearest neighbor” non-AOI-based comparison of fixation positions, as described in equation 4 (32). Given the x- and y-coordinates of two sets of points A and B (containing M and N points, respectively), the Euclidean distance (d) from each point (i) of the M possible points in set A is mapped to its nearest neighbor (j) among the N possible points in set B. As each fixation in one set is mapped onto the nearest neighbor of the other set, the similarity (DMannan) is tabulated from the mean of the Euclidean distances (32). Despite the benefit of not relying on AOIs, this similarity index does not account for the ordinal sequence of the scanpath.
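A simplified nearest-neighbor form consistent with this description (the published Mannan index additionally symmetrizes over both sets and normalizes by the image dimensions) is

    D_{Mannan} \approx \frac{1}{M} \sum_{i=1}^{M} \min_{1 \le j \le N} d(a_i, b_j)

where a_i and b_j are the fixation locations in sets A and B, respectively.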

The geometric analysis of MultiMatch relies on the spatial coordinates as well as saccade direction and amplitude to represent the scanpath as a series of vectors. Similarity between two scanpaths is determined by the extent to which two series of vectors align well (42). Unlike the similarity measures calculated by string-edit methods, the locations of fixations and the angular information of saccades can be retained without the need to bin the vectors based on angles (32). The MultiMatch MATLAB tool calculates the ideal, shortest saccades that would connect the fixations in a scanpath. The program uses a comparison matrix containing all the possible pairings to match the resultant series of vectors with the lowest possible cost (32). For all of the methods proposed in this section, there is an important caveat for similarity measures based on distance calculations: the values of similarity cannot be interpreted absent the context of the data set, and without a hypothesis there is a high likelihood of a false positive similarity value (32, 42).

Transitions

From the AOI strings, transitions between any combination of AOIs can also be modeled through a transition matrix (2, 35). A transition is an AOI-based event, and an example is demonstrated in Figure 5 (2). Although transitions and saccades share similarities (both have amplitudes and durations), a transition is most often defined as a gaze shift from one AOI to another. As seen in Figure 5, saccades in which one or more fixations fall outside of an AOI might not be considered transitions (2). Saccades within an AOI—such as from fixation 4 to fixation 5—are not considered transitions. Fixations 12 and 13 represent a clear transition between AOI_F and AOI_C, but it is less clear whether fixations 6 and 7, fixations 9, 10, and 11, or fixations 13 and 14 count as transitions. For fixations 6 and 7, fixation 6 falls outside of AOI_Q. For fixations 10 and 11, the saccade from fixation 9 to 10 brings the gaze outside of AOI_F, but fixation 10 to 11 reenters the AOI. Also, fixations 13 and 14 may be considered a transition, depending on experimental conditions; fixation 14 sits on the boundary of AOI_D.

From the fixation data, a gaze transition can be defined as a two-character substring (e.g., 2 1 or 3 G). These transitions can be catalogued in a matrix (of two or more dimensions), as illustrated by randomly generated data in Table 1. The row and column labels reflect AOIs, and the values in each cell are the frequencies with which those transitions (either one-direction or bidirectional) have occurred, normalized to the source AOI (2, 43). In a bidirectional definition, 3 G and G 3 would be considered one transition, whereas a one-direction definition would count 3 G and G 3 as separate transitions; the decision to employ one definition or the other depends on the nature of the AOI assignment (semantic or gridded). In the scanpath analysis tool eyePatterns, these frequencies are accompanied by percentages that describe the Markovian probability of a transition from one AOI to another (30, 34, 35).

Figure 5. Example of scanpath with labeled fixations to illustrate potential transitions between AOIs.

Table 1. A Transition Matrix for Transitions of Length 2

From/To    Q     A     F     M     P
Q                3     7     20    4
A          17          13    15    6
F          11    6           23    31
M          8     9     13          9
P          5     7     19    4

By collapsing AOI strings to ignore dwells within an AOI, the transition matrix does not return a value for Q to Q, for instance. Although transitions longer than 2 can be defined, analyzing them is more difficult because of the increased size of the data set and the increased dimensionality of the matrix. Based on the context of the stimulus and the groups of participants, the frequency of a particular transition has been used to link cognition, interest, and decision making (2). In problem-solving contexts, a higher frequency of transitions between two AOIs is associated with higher dwell time (2). This matrix can be used to generate probabilities that a particular transition will occur (35). The number of unidirectional transitions between two AOIs has been linked to the complexity of the stimulus material and the expertise of the viewer, while the number of bidirectional transitions may suggest a need to refresh working memory (2).

From the matrix of transitions between AOIs, a list of transitions ordered by their frequencies can be used to calculate the entropy of transitions, as shown in equation 5, in which freqs_i is the ith frequency of a particular transition in a string (2). This calculation can be performed on the individual participant level (using an ordered list of transitions within a single string) or on a group level (using an ordered list of transitions within a group of strings).
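Equation 5 presumably takes the standard Shannon form, with each transition frequency converted to a probability:

    H = -\sum_{i} p_i \log_2 p_i, \quad \text{where } p_i = \frac{freqs_i}{\sum_j freqs_j}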

The entropy value is an indication of the dispersion of fixation distributions on a stimulus; a larger entropy value indicates more nearly equal probabilities of transitions between any two AOIs (43). Entropy is appropriate to calculate when a research question calls for quantifying a participant’s eye movements in a meaningful way. This value is commonly interpreted using the eye−mind hypothesis, such that smaller values of entropy indicate more targeted, intentional scanpaths. Large values of entropy reveal search behaviors across AOIs on the stimulus that are physically far apart. In the example study below, we interpret a greater entropy value as suggesting a greater dispersity of transitions and, therefore, a higher cognitive load.
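A minimal R sketch of tabulating length-2 transitions from a collapsed AOI string and computing their entropy (the string and labels are illustrative):

    s     <- "QAFMQAFPQA"                  # hypothetical collapsed scanpath
    chars <- strsplit(s, "")[[1]]
    trans <- paste0(head(chars, -1), tail(chars, -1))  # "QA" "AF" "FM" ...
    freqs <- table(trans)                  # one-direction transition counts
    p     <- freqs / sum(freqs)
    H     <- -sum(p * log2(p))             # entropy of transitions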

Patterns

Scanpath theory suggests that individuals have unique viewing patterns for a stimulus, and by grouping participants we can identify which viewing pattern(s) are common to a group. This can serve as a training set for the prediction of a future participant’s likely group membership (through cluster analysis). Furthermore, studies on expert/novice behavior could benefit from identifying patterns that have a high likelihood of occurring only in a particular group, helping to develop materials that attract novice attention to the featured patterns of experts. Eraslan et al. provide an exhaustive list of methods for detecting patterns and identifying the patterns common to a group or sub-group of scanpaths (35).

Analytical tools for pattern detection offer a variety of options, from simple pattern-of-interest detection to searching for all the patterns within a group of scanpaths (30, 35, 44). Patterns in scanpaths may indicate features that the viewer deems important or interesting (30); experts tend to have more efficient, focused scanpaths than novices (31). At its most basic, detecting whether a certain sequence exists within a group of scanpaths is relatively easy, but this approach lacks temporal or spatial information about the AOIs within the sequence (35). In programs such as eyePatterns or R packages like GrpString, it is possible to elucidate patterns within groups of scanpaths, or featured patterns that have a high probability of appearing in one group of scanpaths compared to other groups (30, 35, 44).

Example Study

The goal of this example is to demonstrate—using unpublished results from a previously published study (39)—how the significance tests of fixation-based data were supplemented by sequence analysis. By applying a pattern analysis, we were able to elucidate problem-solving strategies of students who answered correctly and compare them—both qualitatively and statistically—to the patterns of students who answered incorrectly. Furthermore, the entropy of transitions for each group showed a significant difference, which further supports the difference in problem-solving approaches of each group.

Context of the Study

While eye-tracking studies on problem solving have been well-documented (45–48), multiple literature searches for “problem-solving ordering questions” or “temporal sequence eye tracking” yielded few results. One of the few relevant studies assessed difficulties in causal reasoning in biology; the researchers tested eighth and tenth grade students on their ability to organize steps in their temporal sequence after a short, written introduction to a topic and found that most students could not place the steps in their proper temporal sequence, likely due to the complexity of the material (49).

Despite the availability of, and psychological research into, animations for education (13, 17, 50–52), the research on animations versus static graphics yields conflicting results as to whether the number of user-controlled views is important to animation effectiveness, as well as whether “decorative” animations are useful in an online course (52). Instances where animations succeeded over static images in teaching a concept could be attributed to a difference in presentation of the information, more detail provided in the animation, or a higher element of interactivity with animations (51). Animations may also be more difficult for novices to understand and may be distracting or harmful to the construction of knowledge. One study compared the effectiveness of animations and static images as a tutorial for medical students and found that the static tutorial was just as effective as the animation (as determined by a test score) and produced the same amount of cognitive load burden as the animation (31).

Experimental Conditions

The hypotheses in this study were:

1. The eye-tracking measures (such as fixation duration and fixation count) are correlated.
2. Students in the animation group would perform better than the students in the static-image group on the “ordering” items.
3. Between students who correctly answered the ordering items and students who incorrectly answered, there would be a significant difference in:
   a. Amount of time spent on answering the items
   b. Fixation durations and counts
   c. Time spent in each phase of problem solving
   d. Scanpath patterns

For this experiment, gaze data was collected on a Tobii T120 eye tracker with a 60 Hertz sampling rate using Tobii Studio 2.0.4 software. The threshold for a fixation was set at 100 milliseconds and a radius of 30 pixels on the 17-inch LCD screen with a resolution of 1280 by 1024 pixels. Eighty-one anatomy/physiology students were randomly assigned to an experimental group: animated or static presentation. An independent t-test on the hourly exam scores showed that the groups were not significantly different in terms of course achievement. The groups watched the randomly assigned multimedia presentation twice, with a short break in the middle. After the two viewings, all participants answered the same post-test under eye tracking (39).

The post-test contained 17 items, three of which were “ordering” items. These ordering items were included to test student recall of the steps of the physiological pathway presented in the multimedia materials. The dependent variable was the post-test score on the ordering items (0−3 points), while the independent variables were movie type, sex, time spent in each phase, total fixation duration, fixation count, and visit count (39).

Results

Based on a correlation matrix (Pearson’s r), total fixation duration, fixation count, visit count, and mouse-click events (from moving answer choices) were highly correlated for each ordering item. As a result, any significant difference found with one eye-tracking metric is likely to be significant among the other highly correlated eye-tracking metrics as well. This allows for dimension reduction to a single eye-tracking metric for subsequent analysis. Subsequently, the total fixation duration was used in logistic regressions. The use of logistic regressions determines the contribution of each independent variable (in this study, the visualization type, sex, and total fixation duration on each ordering item) to the participant’s overall score on the ordering items; a significant contribution from any of these independent variables reveals which stimuli merit scanpath pattern analysis.

To determine the relationship between post-test scores and eye-movement metrics within each item, sex, and visualization type, an ordered logistic regression was performed. The use of ordered logistic regression rather than binomial regression was appropriate because the dependent variable (score) had more than two possible values (i.e., 0, 1, 2, and 3). The results are summarized in Table 2. While we initially expected the media type to influence performance and serve as the distinguishing factor between groups (correct vs incorrect), the lack of a significant contribution of media type to the score ruled out that group classification. The interpretation of the results of this regression is that the negative coefficient indicates that participants who had lower scores overall fixated longer on the problems. Of the independent variables, this difference was significant only for the third ordering item, which covered material from the end of the multimedia materials on the muscle contraction pathway. Therefore, because of its significant difference between correct and incorrect groups on ordering item #3, the large dataset of participants’ fixations can be narrowed even further for scanpath analysis. This allows the final research question, regarding differences in scanpath patterns during problem solving (which are indicative of differences in problem-solving strategy), to be answered.
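A minimal sketch of such an ordered logistic regression in R, using MASS::polr() on simulated stand-in data (the variable names and values are illustrative, not the study’s data set):

    library(MASS)
    set.seed(1)
    d <- data.frame(
      score     = factor(sample(0:3, 81, TRUE), levels = 0:3, ordered = TRUE),
      vis_type  = factor(sample(c("static", "dynamic"), 81, TRUE)),
      sex       = factor(sample(c("F", "M"), 81, TRUE)),
      tfd_item1 = rnorm(81, 60, 15),  # total fixation duration per item (s)
      tfd_item2 = rnorm(81, 60, 15),
      tfd_item3 = rnorm(81, 60, 15)
    )
    fit <- polr(score ~ vis_type + sex + tfd_item1 + tfd_item2 + tfd_item3,
                data = d, Hess = TRUE)
    summary(fit)  # coefficients and t-values, analogous to Table 2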

Table 2. Ordered Logistic Regression of Participant Scores and Total Fixation Duration on Each Ordering Item, Visualization Type, and Sex

Independent Variable                      Coefficient   t(81)    p-value^a
Visualization Type (Static or Dynamic)    0.186         0.438    0.661
Sex                                       0.564         1.317    0.188
Ordering item #1                          −0.015        −1.881   0.060
Ordering item #2                          0.014         1.274    0.203
Ordering item #3                          −0.023        −2.181   0.029*

^a Significant p-values are bolded and starred.

Previous studies have described four problem-solving phases: (1) translating each component of the prompt into a mental representation of the problem, (2) integrating the mental representations into a cohesive picture, (3) planning a problem-solving approach, and (4) executing the approach (29). A more simplistic model of problem solving would have two phases (planning and execution) or three (reading, planning, calculation) (31). Given the recording of click events in the eye-tracking data, we were able to define three problem-solving phases for this analysis: reading-and-planning, problem-solving, and answer-checking. Reading-and-planning (or planning) is defined from the time when a participant began to read a problem to the time before the participant moved a choice AND completely dropped it into a box. Problem-solving is defined from the end of the first complete answer-choice drop to the time when the participant dropped the last choice into a box without subsequent changes. The third phase, answer-checking (or checking), is from the last answer-choice drop to the time when the participant clicked the “submit” button (39). (A brief sketch of this phase assignment appears after this discussion.)

To determine the relationship between post-test scores and eye-movement metrics within each problem-solving phase on ordering item #3, an ordered logistic regression was performed. The results are summarized in Table 3. When the third ordering item was split into problem-solving phases, there were significant differences for phases two and three. In phase two, the problem-solving phase, participants with lower scores fixated longer; in phase three, participants with higher scores fixated longer. This statistical analysis could only indicate that there was a statistically significant difference in participants’ fixations in phases two and three (which is indicative of a difference in strategy), but it was not enough to elucidate what those differences in strategy were. Thus, the implementation of scanpath pattern analysis for both groups (correct vs incorrect) in each problem-solving phase is described below.
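A minimal sketch of the phase assignment from logged click events (the timestamps and column names are hypothetical):

    # First completed answer-choice drop and last drop, from mouse events (ms)
    first_drop <- 12000
    last_drop  <- 41000
    fix <- data.frame(timestamp = c(3000, 15000, 27000, 45000))
    fix$phase <- cut(fix$timestamp,
                     breaks = c(-Inf, first_drop, last_drop, Inf),
                     labels = c("planning", "problem-solving", "checking"))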

Table 3. Ordered Logistic Regression between Participant Scores and Total Fixation Duration within Each Phase on Only the Third Ordering Item

Independent Variable                      Coefficient   t(81)    p-value^a
Visualization type (static or dynamic)    0.046         0.0105   0.916
Sex                                       0.750         1.692    0.091
Planning phase                            0.000         0.011    0.991
Problem solving phase                     −0.020        −3.795