Human Factors in Simulation and Training (2 volumes), 2nd edition
ISBN 9781032512525, 9781032512532, 9781003401360

Volume 1 of 2: Theory and Methods, pp. 1-352
Volume 2 of 2: Application and Practice, pp. 353-716


English, 716 pages, 2024


Table of contents:
Human Factors in Simulation and Training: Theory and Methods; Second Edition
Cover
Half Title
Human Factors in Simulation and Training: Theory and Methods; Second Edition
Title Page
Copyright Page
Table of Contents
Preface
Editors
Contributors
Chapter 1 Human Factors in Simulation and Training: An Overview
Introduction
Simulation: The Perfect Storm
Human Factors in Simulation and Training: A Brief History
Why Simulate?
Simulation versus Modeling
The Modeling and Simulation Process: Verification, Validation, and Accreditation
Advantages and Disadvantages of Simulation
Advantages
Cost-Effectiveness
Availability
Safety
Surrogate Value
Reduced Environmental Impact
Improved Training Environment
Standardized Training Environments
Provide Data
Lack of Realism
Disadvantages of Simulation
Does Not Necessarily Reflect Real-World Performance
Surrogate Value
User Acceptance
A Sampling of Progress in Simulation
War-Gaming
Online Gaming
Aviation
Extended Reality: Augmented Reality, Virtual Reality, and Mixed Reality
Introduction to XR
Historical VR Devices
XR Applications in Training
VR in Medical Training
AR in Medical Training
AR in DoD Tactical Combat Casualty Care
VR Flight Training Devices
MR Flight Training Device
AR Trainer for H-60R Preflight Procedures
XR Applications in Operational Support
AR in Army Tactical Operations
VR in Operational Medicine
AR in Operational Medicine
AR for Aircraft Maintainers
Future Directions
XR HMD Enhancements
AR for Red Air
VR for Spatial Disorientation Training
Pilot Training Next (PTN) and Naval Aviation Training Next (NATN)
Challenges to the Adoption of XR Technology
XR Device Visual Fidelity
User Fatigue
Integration
User Resistance
Safety
Regulations
XR Is Not Always the Optimal Training Solution
Augmented Reality in Decision-Making
The Perfect Storm Revisited: The Future of Human Factors in Simulation and Training
Technological Trends
Computation Power
Innovations in Education and Training
The Changing Nature of Education
Acceptance of Simulation/Gaming
Acknowledgments
Disclaimer
References
Chapter 2 Justification for Use of Simulation
Introduction
Purposes
Training
Systems Engineering Evaluation
Research
Recreation
Domains of Application
Aviation
Military
Medical
Driving
Emergency Response
Education and STEM
Entertainment
Maintenance
Achievable Outcomes
Cost Benefit
Safety
Data
Intervention
Flexibility and Availability
Realism
Conclusion
References
Chapter 3 Simulation Fidelity
Introduction
Definition of Fidelity
Physical Fidelity
Visual–Audio Fidelity
Equipment Fidelity
Motion Fidelity
Psychological–Cognitive Fidelity
Other Fidelity
Measuring Fidelity
The Mathematical Model
Subjective Methods
Fidelity Evaluation Frameworks
Fidelity and Transfer of Training
Summary
References
Chapter 4 Transfer of Training
Introduction
Transfer of Training: Terms and Concepts
Positive Transfer
Negative Transfer
Near Transfer
Far Transfer
A Model of Factors Affecting the Transfer of Training
Training Input Factors
Training Outputs
Conditions of Transfer
Dynamic Models of Training Transfer
Research Methods
Transfer of Training Performance Measurement
Objective Measures
Subjective Measures
Selecting Performance Measures
Using Performance Measures to Indicate Transfer
Experimental Design
Forward Transfer Study
Backward Transfer Study
Quasi-Experimental Study
Curve-Fitting Method
Summary
References
Chapter 5 Simulation-Based Training for Decision-Making: Providing a Guide to Develop Training Based on Decision-Making Theories
Theoretical Background of Decision-Making
Normative Decision Models
Decision Models: New Perspectives
Naturalistic Decision-Making (NDM) Framework and Recognition-Primed Decision (RPD) Model
Biases in Decision-Making
Confirmation
Over- and Under-Confidence
Framing
Probability Perception (Gambler’s Fallacy)
Sunk Costs
Decision Theory Applied to Training
Steps in Developing Simulations to Train Decision-Making
Conduct a Needs Assessment
Identify Learning Objectives
Set the Simulation Context
Establish KSAs
Create Events to Elicit KSAs
Establish an Assessment Plan
Conclusion
References
Chapter 6 Almost Like the Real Thing – The Hidden Limits in Flight Simulation and Training
Introduction
Introduction to Simulator Motion
How Do Humans Perceive Motion Drive?
Let Them Eat Humble Pie—Or Not?
Same, Same But Different
Fiddling with Fidelity While Missing the Story
A Perfect Tool or a Tool That Can Be Perfected?
Summary
Acknowledgment
References
Chapter 7 Cybersickness in Immersive Training Environments
Introduction
Cybersickness Background
Individual Susceptibility and Stimulus Intensity
Quantifying Immersive Stimulus Intensity
Usage Protocol
Conclusions
Acknowledgments
References
Chapter 8 Distributed Debriefing for Simulation-Based Training
Introduction
Issues to Consider in Providing Distributed Debriefing for Simulation-Based Training
The Rest of This Chapter
Debriefing Functions and Methods
Functions of Debriefs
Methods of Debriefs
Challenges of Distributed Debriefs
Performance Diagnosis
Performance Recall, Comparison, and Extrapolation
Assessment and Display of Competence
Requirements for Distributed Debriefs
Communication
Collaboration
Automated Data Capture
Data Presentation
Data Selection
Replay Perspective
Expert Models of Performance
Flexible Delivery Style
Post-Exercise Review
Store Lessons Learned
Scalable
Ease of Use
Current Techniques for Debriefing Distributed Teams
State of the Art in Distributed Debriefing
Large-Scale Distributed Simulation Training Exercises
Small-Scale Distributed Simulation Training Exercises
Summary
Acknowledgments
References
Chapter 9 Performance Assessment in Simulation
Subjective Methods of Performance Measurement
Purpose of Performance Measures
Special Properties of Performance Measures in Simulators
Defining and Assessing Reliability
Data Requirements
Qualitative versus Quantitative
A Qualitative Index
Quantitative Indices
Special Problems with Simulators
The Gouge
Instructor Attitudes
Objective Methods of Performance Measurement
Automated Data Collection Systems
Flight Technical Error
Deviation-Based Metrics
Root Mean Square Error (RMSE)
Number of Deviations and Time-Outside Standard
Time within FAA Practical Test Standard
Non-FTE Measures
Rates of Change
Control Input
Summary
Note
References
Chapter 10 Performance Measurement Issues and Guidelines for Adaptive, Simulation-Based Training
Introduction
Research Advances
Adaptive SBT Implementation
A Confirmatory Performance Measurement Framework for Adaptive SBT
Dimensions and Essential Characteristics of Performance Measures
Validity
Criterion Relevance
Reliability
Measure Invariance
Objectivity and Intrusiveness
Diagnosticity
Measurement Principles for Adaptive Training
Principle 1: Ensure that Performance Measure Development Is Guided by Sound Theory
Principle 2: Consider and Exploit Measurement Affordances
Principle 3: Ensure Usefulness of Measures for Evaluating Training Effectiveness
Summary and Conclusions
References
Chapter 11 Scoring Simulations with Artificial Intelligence
Artificial Intelligence and Reproducing Expert Ratings
Traditional Approach to Scoring Open-Ended Content: Rater Training
The Architecture
The Data
Output
Other Considerations
Scoring Actions in Simulated Environments
Traditional Approaches to Scoring Simulations
Data Representations for Modeling Simulations
Machine Learning Methods for Scoring Simulations
Static Methods Using Summarized Representations
Time Series Methods
Applications
Trainee Feedback
Early Prediction
Real-Time Feedback
Adaptive Simulations
Conclusion
References
Chapter 12 Dissecting the Neurodynamics of the Pauses and Uncertainties of Healthcare Teams
Introduction
The Significance of Structure in EEG Amplitudes
Neurodynamic Correlates of Uncertainty
Estimating the Frequency Magnitude and Duration of Uncertainty
Augmenting Debriefings with Neurodynamics
Using Neurodynamic Analyses to Train the Trainers
Early Novices
Later Novices
Evolving the Technology
Summary
References
Chapter 13 The Future of Simulation
Proem
The Fundamental and Practical Reasons for Simulation
Simulations in the Past
On Predicting the Future
The Practicalities of Simulation
Simulation and Training
Discourse between Two Worlds
Hybrid Simulation Worlds
Assessing the Progress of Simulation Technologies
The Turing Test of Simulation
Supersimulation
The Moral Dimension of Simulation
A Philosophical Valediction
Summary and Conclusion
Acknowledgments
References
Appendix A: Glossary of Modeling Terms
Appendix B: Glossary of Simulation Terms
Appendix C: Glossary of Verification, Validation, and Accreditation Terms
Index

Human Factors in Simulation and Training: Application and Practice: Second Edition
Cover
Half Title
Human Factors in Simulation and Training: Application and Practice: Second Edition
Title Page
Copyright Page
Table of Contents
Preface
Editors
Contributors
Chapter 1 Controls and Displays for Aviation Research Simulation: A Historical Review
Disclaimer
Introduction
Fixed-Based Simulators
Integrated Information Presentation and Control System Study (IIPACSS) Simulator, Boeing, Seattle, WA
Display Technology
Control Technology
Representative Research
Impact
Digital Synthesis (DIGISYN) Simulator
Display Technology
Control Technology
Representative Research
Impact
Microprocessor Applications for Graphics and Interactive Communication (MAGIC) Simulator
Display Technology
Control Technology
Representative Research
Impact
Panoramic Cockpit Control and Display System (PCCADS) Simulator
Display Technology
Control Technology
Representative Research
Impact
Helmet-Mounted Oculometer Facility (HMOF) Simulator
Display Technology
Control Technology
Eye and Head Monitor
Representative Research
Impact
Synthetic Interface Research for UAV Systems (SIRUS) Simulator
Display Technology
Control Technology
Representative Research
Impact
Vigilant Spirit Control Station (VSCS) Simulator
Display Technology
Controls Technology
Representative Research
Sense and Avoid (SAA) Display Symbology Evaluation
Cyber Threat Information Requirements Investigation for UAV Crews
Impact
Intelligent Multi-Unmanned Vehicle Planner with Adaptive Collaborative/Control Technologies (IMPACT) Simulator
Display Technology
Control Technology
Representative Research
Impact
Motion-Based Simulators
Dynamic Environmental Simulator (DES)
Display Technology
Control Technology
Representative Research
Impact
Disorientation Research Device
Display Technology
Control Technology
Representative Research
Impact
In-Flight Simulators
NASA’s OV-10
Representative Research
Impact
Total In-Flight Simulator (TIFS) NC-131H Transport Aircraft
Representative Research
Impact
Variable In-Flight Stability Test Aircraft (VISTA) Lockheed NF-16D Fighter Aircraft
Representative Research
Impact
University of Iowa Operator Performance Laboratory Aero L-29 Delfin Jet
Representative Research
Impact
Multisensory Displays and Controls
Displays and Controls to Support Human–Machine Teaming
Summary
Acknowledgment
References
Acronyms/Abbreviations
Chapter 2 Augmented Reality as a Means of Job Task Training in Aviation
Augmented Reality (AR)
Historical Overview: AR and Training
Cognition and AR
Elaboration and Recall
Spatial Relations
Memory Channels and AR
Knowledge Development and Training Transfer
What Is the Future of Job Training – Training on the Job Literally?
Conclusion
References
Chapter 3 Civil Aviation: Flight Simulators and Training
Introduction
An Overview of Civil Aviation Training, Flight Simulators, and the Human Factors Therein
Introduction
Flight Simulator Basics
Human Factors and Flight Simulators
Major Drivers in Civil Aviation Pilot/Crew Training
Flight Simulators and Flight Training Devices
Overview
Flight Simulators and Training
Flight Training Devices and Training
Flight Simulator and FTD Assessment
SFAR 58 and AQP
History
SFAR/AQP: Overview and Synopsis
SFAR 58
AQP, LOS/LOFT, and Simulators
Line-Oriented Flight Training (LOFT)
Background
Maximizing LOFT: The Mission Performance Model and the Operational Decision-Making Paradigm
LOFT: Current and Future
Risk Identification and Management: Training and Evaluation with MPM and ODM Paradigm
Mission Performance Model
From CRM/MPM to ODM
Operational Risk Management and Decision-Making
Optimizing Performance during Complex Operations
Introduction
The Overall Mission Continuation Decision
Determining Risk
Operational Envelope
The Unstable, Missed Approach Decision
Bayesian Probability
Problem-Solving under Conditions of Uncertainty
The Takeoff “Go/No-Go” Decision
Unexpected Operational Difficulties
Optimizing the Decision Function
Operational Analysis of the Takeoff Decision
Conclusion
ODM and MPM in LOFT Design, Development, and Evaluation
Introduction
LOFT: ODM and MPM
LOFT Design: Another Approach
Training
References
Federal Aviation Administration Advisory Circulars (AC) and Regulations
Chapter 4 Integrating Effective Training and Research Objectives: Lessons from the Black Skies Series of Exercises
Introduction
The Black Skies Exercises
Outcomes for Military Operators
Performance Benefits and Transfer
Auxiliary Benefits
Research on Team Coordination Dynamics
Communication Dynamics
Physiological Dynamics
Combining Communication with Physiological Data
Operator Acceptance of Physiological Monitoring
Evaluation of Training Capability
Summary and Conclusion
Acknowledgements and Dedication
References
Chapter 5 Extended Reality in Training Environments: A Human Factors Trend Analysis
Background
Methodological Approach
Results
Study Designs over Time
Simulation Domains over Time
Simulation Cluster Areas over Time
Military Simulation
Aerospace Simulation
Driving Simulation
Healthcare Simulation
Manufacturing Simulation
Entertainment Simulation
Simulation Methods and Design
General Simulation Areas
Military Cluster Areas over Time
Author Affiliations over Time
Military Author Affiliations over Time
Funding Agencies over Time
Military and Government Funding over Time
XR System Types over Time
XR Systems’ Breakdown over Time
Physiological Recording Systems over Time
Conclusions
Major Conclusions and Recommendations
References
Chapter 6 Mitigation of Motion Sickness Symptoms by Adaptive Perceptual Learning: Implications for Space and Cyber Environments
Motion Symptoms
Mitigation
Individual Differences in Adaptation
The Long-Term Retention, Conditioning, and Transfer of Adaptation
Long-Lasting Adaptation
Generalizability
Adaptive Perceptual Learning (APL) Training
Pre-Adaptation on VR
Pre-Adaptation on Vection Drum
Pre-Adaptation on VIMS
Conclusion
Acknowledgments
In Honorem
References
Chapter 7 Decision-Making under Crisis Conditions: A Training and Simulation Perspective
Introduction
Effects of Time Stress and Uncertainty on Decision-Making
Other Effects on Human Decision Makers under Crisis Conditions
Decision-Making Theories
Decision-Making Performance Measures
Crisis Decision-Making Training
General and Stress Training
Simulation
Microworld
Conclusion
References
Chapter 8 Healthcare Simulation and Training
Introduction
Benefits of Simulation in Healthcare
Drawbacks of Simulation in Healthcare
Dimensions of Simulation
History of Medical Simulation
Mannequins
Virtual Reality Systems
Surgical Systems
Collaborative and Immersive Training Systems
Standardized Patients
Hybrid Systems
Training
Team Training
Benefits
History and Scope of Team Training in Healthcare
Training Transfer
Healthcare Simulation and the Pandemic
Conclusion
References
Chapter 9 Best Practices in Surgical Simulation
Introduction
Current Technologies Used in Surgical Simulation for Skills Training
Technical Skills Simulation
Low-Cost Simulators
High-Cost Simulators
Nontechnical Skills Simulation
The Objective Structured Clinical Examination (OSCE)
Team-Based Training
Application of Simulation across the Continuum of Surgical Training
Undergraduate Medical Education
Graduate Medical Education
Postgraduate Training of Practicing Surgeons
Human Factors and Surgical Simulation
Equipment Design and Ergonomics
Performance Optimization
Mental Skills Optimization and Coaching
Developing Expertise
Surgical Team Dynamics
Utilizing Simulation to Minimize Subjectivity in Assessment
Future of Simulation and Human Factors in Surgery
Conclusion
References
Chapter 10 Healthcare Simulation Methods: A Multifaceted Approach
Introduction
What is Healthcare Simulation?
Concepts in Healthcare Simulation
Simulation Fidelity
Psychological Safety
Locations and Modes of Healthcare Simulation
Applications of Healthcare Simulation
Simulation for Practice and Learning
Simulations with Reflective Debriefing
Rapid Cycle Deliberate Practice
Mastery Learning
Just-in-Time Training
Choice of Training Type
Simulation for Evaluation and Testing
Systems Testing
Conclusion and Future Directions
References
Chapter 11 Design and Development of Algorithms for Gesture-Based Control of Semi-Autonomous Vehicles
Introduction
Background
Gestures and Gesture Capture Technology
Cognitive Loading Considerations
Approach, Implementation, and Results
Approach
Developing Proper Training System Familiarization as a Prelude to Training
Implementation and Results: Phase 1
Gestures and LMC Measurements
Virtual Environment Development
User Testing
Algorithm Redesign
Implementation and Results: Phase 2
Physical Demonstration Setup
User Testing
Summary and Conclusions
References
Chapter 12 The Influence of New Realities: How Virtual, Augmented, and Mixed Reality Advance Training Methods in Aviation
Introduction
Virtual, Mixed, and Augmented Reality as We See It … or Don’t
The Growth of Technology in Aviation Training
Use of AR, MR, and VR in the Aviation Field
The Shortage and the Future
Benefits, Drawbacks, and Resolutions
Conclusion
References
Chapter 13 Training, Stress, Time Pressure, and Surprise: An Accident Case Study
Introduction and Background
Training and System Design
An Accident Case Study: Colgan Air Flight 3407
Why Didn’t Previous Training Prevent This Accident?
Opportunities for Better Outcomes
Training: A Countermeasure for Responding to Startle and Surprise
Automation Design: Keeping the Human in the Loop Where Feasible
Highlighting the Lessons Learned
References
Index


Human Factors in Simulation and Training

Human Factors in Simulation and Training: Theory and Methods covers theoretical concepts and human factors principles as they apply to simulation and training in the real world. The book discusses both traditional and nontraditional aspects of simulation and training. Topics covered include simulation fidelity, transfer of training, the limits of simulation and training, virtual reality in the training environment, simulation-based situation awareness training, automated performance measures, performance assessment in simulation, adaptive simulation-based training, and scoring simulations with artificial intelligence. This book will be a valuable resource for professionals and graduate students in the fields of ergonomics, human factors, computer engineering, aerospace engineering, and occupational health and safety.

Human Factors in Simulation and Training
Theory and Methods
Second Edition

Edited by

Dennis Vincenzi, Mustapha Mouloua, P. A. Hancock, James A. Pharmer, and James C. Ferraro

Front cover image: Sergey Ryzhov/Shutterstock

Second edition published 2024
by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton, FL 33431

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 selection and editorial matter, Dennis Vincenzi, Mustapha Mouloua, Peter A. Hancock, James A. Pharmer, and James C. Ferraro; individual chapters, the contributors

First edition published by CRC Press 2019

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Vincenzi, Dennis A., editor.
Title: Human factors in simulation and training : theory and methods / edited by Dennis A. Vincenzi, Mustapha Mouloua, Peter A. Hancock, James Pharmer, and James C. Ferraro.
Description: Second edition. | Boca Raton : CRC Press, [2023] | Includes bibliographical references and index.
Identifiers: LCCN 2023001393 (print) | LCCN 2023001394 (ebook) | ISBN 9781032512525 (hbk) | ISBN 9781032512532 (pbk) | ISBN 9781003401360 (ebk)
Subjects: LCSH: Simulation methods. | Occupational training. | Human engineering.
Classification: LCC T57.62 .H845 2023 (print) | LCC T57.62 (ebook) | DDC 620.8/2--dc23/eng/20230113
LC record available at https://lccn.loc.gov/2023001393
LC ebook record available at https://lccn.loc.gov/2023001394

ISBN: 978-1-032-51252-5 (hbk)
ISBN: 978-1-032-51253-2 (pbk)
ISBN: 978-1-003-40136-0 (ebk)

DOI: 10.1201/9781003401360

Typeset in Nemilov by Deanta Global Publishing Services, Chennai, India

Contents

Preface
Editors
Contributors

Chapter 1  Human Factors in Simulation and Training: An Overview
  T. Chris Foster, William F. Moroney, Henry L. Phillips IV, and Michael G. Lilienthal

Chapter 2  Justification for Use of Simulation
  Meredith Carroll, Summer Rebensky, Maria Chaparro Osman, and John Deaton

Chapter 3  Simulation Fidelity
  Dahai Liu, Jiahao Yu, Nikolas D. Macchiarella, and Dennis A. Vincenzi

Chapter 4  Transfer of Training
  Dahai Liu, Jacqueline McSorley, Elizabeth Blickensderfer, Dennis A. Vincenzi, and Nikolas D. Macchiarella

Chapter 5  Simulation-Based Training for Decision-Making: Providing a Guide to Develop Training Based on Decision-Making Theories
  Richard J. Simonson, Kimberly N. Williams, Joseph R. Keebler, and Elizabeth H. Lazzara

Chapter 6  Almost Like the Real Thing – The Hidden Limits in Flight Simulation and Training
  Shem Malmquist, Deborah Sater Carstens, and Nicklas Dahlstrom

Chapter 7  Cybersickness in Immersive Training Environments
  Kay M. Stanney, Claire L. Hughes, Peyton Bailey, Ernesto Ruiz, and Cali Fidopiastis

Chapter 8  Distributed Debriefing for Simulation-Based Training
  Cullen D. Jackson, Di Qi, Anna Johansson, Emily E. Wiese, William J. Salter, Emily M. Stelzer, Suvranu De, and Jared Freeman

Chapter 9  Performance Assessment in Simulation
  Steve Hall, Michael Brannick, and John L. Kleber

Chapter 10  Performance Measurement Issues and Guidelines for Adaptive, Simulation-Based Training
  Phillip M. Mangos and Joan H. Johnston

Chapter 11  Scoring Simulations with Artificial Intelligence
  Carter Gibson, Nick Koenig, Joshua Andrews, and Michael Geden

Chapter 12  Dissecting the Neurodynamics of the Pauses and Uncertainties of Healthcare Teams
  Ronald Stevens, Trysha Galloway, and Ann Willemsen-Dunlap

Chapter 13  The Future of Simulation
  P. A. Hancock

Appendix A: Glossary of Modeling Terms
Appendix B: Glossary of Simulation Terms
Appendix C: Glossary of Verification, Validation, and Accreditation Terms
Index

Preface

As we look toward the future, we find that it is neither totally random nor totally predictable. This is a truth that persists despite the years that have passed since the publication of the previous edition of this book. We maintain that, if the future were completely predictable, there would be no point looking forward because we would already know what was to come. If it were completely random, we would not bother because we could not know anything systematic about forthcoming events. That life lies between these two polar extremes gives us both the motivation to try to understand the future and the belief that we can do so, at least to a useful degree. Indeed, the triumphs of science encourage us to believe that we are making “progress” in so far as our predictions of the future are concerned. At least in relation to many physical processes, these are growing more accurate as the years progress. And, of course, the more we can know about the future, the more we can generate rational courses of action based on this understanding. This confluence of ideas encourages us to develop theories, models, methodologies, and other such instruments to continue to improve our predictive capabilities.

However, although certain forms of prediction work well for some of the simpler physical processes, there are many forms of complex interaction in which our predictive capacities are at present rudimentary at best. Unfortunately, many of these complex processes—global warming, for example—may prove so dangerous to our species that we cannot afford to assert predictions that are radically incorrect. Flawed prediction here can spell our end. As a consequence, we are in ever greater need of technologies that allow us to generate and refine predictions, as well as to explore alternative potentialities that serve as countertheses and antitheses to these various propositions. One such technology is simulation.

As a tool, simulation is an aid to the imagination. It allows us to create, populate, and activate possible futures and explore the ramifications of these developed scenarios. However, in common with all tools, it performs its task only to the degree that it is open to facile interaction with the user. One can imagine that on many occasions a poor simulation, with its impoverished or wildly inaccurate outcomes, might be of even more harm than good. Thus, as with all tools and technologies, we certainly need the application of the branch of science that turns user–machine antagonism into user–machine synergy. That branch of science is human factors. Hence, the focus of this present work is on human factors issues as they pertain to simulation in support of training humans and predicting how they will perform certain tasks.

In general, these issues revolve around two central themes, represented uniquely across two books. The theme of this book concerns the methods by which simulation technology is utilized, in addition to theories surrounding its potential shortcomings and how best to get the most out of simulated training and assessment. Indeed, there are many techniques and insights from the behavioral sciences that can help us to construct better ways to create and visualize possible futures. In this work, we have solicited chapters that deal with a wide variety of topics, beginning with theory and methods, in areas ranging from traditional training to augmented reality to virtual reality. Areas of coverage include fields related to healthcare and aviation.

This theory-based book focuses on human factors aspects of simulation and training, ranging from the history of simulators and training devices to future trends in simulation from both civilian and military perspectives. The chapters discuss methods of utilizing simulation technology to guide decision-making and evaluate performance, and comprise in-depth discussions of specific issues, including fidelity, interfaces and control devices, transfer of training, simulator sickness, effects of motion in simulated systems, and virtual reality.

Dennis A. Vincenzi
Mustapha Mouloua
P. A. Hancock
James A. Pharmer
James C. Ferraro

Editors

Dennis A. Vincenzi earned his doctoral degree in 1998 from the University of Central Florida in Human Factors and Applied Experimental Psychology and has over 25 years of experience as a Human Factors researcher. He was employed by Embry-Riddle Aeronautical University from 1999 to 2004, where he held the position of Assistant Professor in the Department of Human Factors and Systems in Daytona Beach. In 2004, Dr. Vincenzi left Embry-Riddle to work for the United States Navy as a Senior Human Factors Engineer at the Naval Air Warfare Center Training Systems Division (NAWCTSD) in Orlando, FL. His duties included performing Human Factors research involving simulation and training system development for a variety of Navy sea and air platforms, including the F/A-18 Hornet and Super Hornet, F-35 JSF, Los Angeles, Ohio, and Virginia class submarines, and the Littoral Combat Ship (LCS). He was also heavily involved in research involving pilot selection, human performance, and ground control station design for a number of Navy, Marine Corps, and Special Operations Command Unmanned Aerial Systems (UAS). Since returning to Embry-Riddle Aeronautical University in 2012, Dr. Vincenzi has been involved in research related to UAS and regulatory requirements within the NAS and has been heavily involved in the development of an experimental gesture-based interface used for investigating user preference, usability, and functionality issues related to interface design in virtual environments. Dr. Vincenzi is currently the Program Chair for the Master of Science in Human Factors program at Embry-Riddle Aeronautical University.

Mustapha Mouloua is Professor of Psychology and Director of the Transportation Research Group at the University of Central Florida (UCF), Orlando, FL. He earned his Ph.D. (1992) and M.A. (1986) degrees in Applied/Experimental Psychology from the Catholic University of America, Washington, DC. Before joining the faculty at UCF in 1994, he was a Postdoctoral Fellow at the Cognitive Sciences Laboratory of the Catholic University of America from 1992 to 1994, where he studied and researched several aspects of human-automation interaction sponsored by NASA and the Office of Naval Research (ONR). He has over 30 years of experience in teaching and research related to complex human-machine systems. His research interests include vigilance and sustained attention, cognitive aging, human performance assessment, human-automation interaction, pilot-alerting system interaction, automation and workload in aviation systems, and simulation and training in transportation systems. Dr. Mouloua has made over 300 conference presentations with his undergraduate and graduate students, as well as his professional colleagues. He also has about 200 research publications and scientific reports published in journals and proceedings, such as Experimental Aging Research, Human Factors, Ergonomics, Perception and Psychophysics, Journal of Experimental Psychology: Human Perception and Performance, International Journal of Aviation Psychology, Journal of Cognitive Engineering and Decision Making, Proceedings of the Human Factors and Ergonomics Society, Applied Ergonomics, Transportation Research Part F: Traffic Psychology and Behaviour, Ergonomics in Design, Transportation Research Record, and International Journal of Occupational Safety and Ergonomics. Together with his colleagues Raja Parasuraman and Robert Molloy, he was the winner of the Jerome Hirsch Ely Award of the Human Factors and Ergonomics Society in 1997. He was previously Director of the Applied/Experimental and Human Factors Psychology doctoral program (2008-2017). At UCF, Dr. Mouloua earned eight prestigious Teaching and Research Awards and was inducted into the UCF College of Sciences Millionaire Club for procuring over $1 million in research funds. He was awarded a UCF “Twenty Years’ Service” award in 2014, was awarded the UCF International Golden Key and Honorary member status in 2011, and his research was selected to be among the top 30 best published research articles in the last 50 years by the Human Factors and Ergonomics Society in 2008.

P. A. Hancock, D.Sc., Ph.D., is Provost Distinguished Research Professor in the Department of Psychology and the Institute for Simulation and Training, as well as at the Department of Civil and Environmental Engineering and the Department of Industrial Engineering and Management Systems at the University of Central Florida (UCF). At UCF in 2009, he was created the 16th ever University Pegasus Professor (the institution’s highest honor) and in 2012 was named 6th ever University Trustee Chair. He directs the MIT2 Research Laboratories. He is the author of over 1,100 refereed scientific articles, chapters, and reports, as well as writing and editing more than 25 books. He has been continuously funded by extramural sources for every one of the forty years of his professional career. This includes support from NASA, NSF, NIH, NIA, FAA, FHWA, NRC, NHTSA, DARPA, NIMH, and all of the branches of the US Armed Forces. He has presented or been an author on over 1,200 scientific presentations. In association with his colleagues Raja Parasuraman and Anthony Masalonis, he was the winner of the Jerome Hirsch Ely Award of the Human Factors and Ergonomics Society for 2001, the same year in which he was elected a Fellow of the International Ergonomics Association. In 2006, he won the Norbert Wiener Award of the Systems, Man and Cybernetics Society of the Institute of Electrical and Electronics Engineers (IEEE), the highest award that Society gives for scientific attainment. He is a Fellow and past President of the Human Factors and Ergonomics Society and a Fellow and twice past President of the Society of Engineering Psychologists, as well as a former Chair of the Board of the Society for Human Performance in Extreme Environments. Most recently, he has been elected a Fellow of the Royal Aeronautical Society (RAeS) and in 2016 was named the 30th Honorary Member of the Institute of Industrial and Systems Engineers (IISE). He currently serves as a member of the United States Air Force Scientific Advisory Board (SAB) and has also served on the US Army Science Board (ASB). He is also a Fellow of AAAS and IEEE.

James Pharmer is the Chief Scientist for the Research, Development, Test, and Evaluation (RDT&E) Department and the Head of the Experimental and Applied Human Performance Research and Development (R&D) Division at the Naval Air Warfare Center Training Systems Division (NAWCTSD) in Orlando, Florida. He is a Naval Air Warfare Center Aviation Division (NAWCAD) Fellow and has over 20 years of experience in training and human performance R&D for advanced military systems across a variety of warfare domains. His work includes conducting R&D and direct participation on systems acquisition teams to support human systems integration (HSI) implementation for Navy ships, aircraft, and systems. He chairs multiple working groups to develop HSI policy, processes, and education. He holds a doctoral degree in Applied Experimental Human Factors Psychology from the University of Central Florida and a master’s degree in Engineering Psychology from the Florida Institute of Technology.

James C. Ferraro is a human factors research scientist specializing in simulation and game-based assessment of human performance in complex systems. He earned his Ph.D. in Human Factors and Cognitive Psychology from the University of Central Florida (UCF) in 2022 and his M.A. in Applied Experimental and Human Factors Psychology from UCF in 2019. Dr. Ferraro has led and contributed to a number of research efforts in support of government-sponsored (NAVAIR, USAF) projects to improve training and selection of personnel in various occupations. Areas include air traffic control, tactical urban warfare, explosive ordnance disposal, special forces rotary wing operations, and unmanned aircraft operations. His research on topics such as pilot/operator attentional strategies, trust in automated systems, and predictors of individual performance has been presented at local, regional, and international conferences (Human Factors and Ergonomics Society Annual Meeting, International Symposium on Aviation Psychology, Conference on Applied Human Factors and Ergonomics) and published in multiple academic journals (Ergonomics, Applied Ergonomics). He is the technical editor of the two-volume book set Human Performance in Automated and Autonomous Systems (2019) and the co-author of published book chapters pertaining to human monitoring of automated systems and the role of trust in unmanned vehicle operations. Dr. Ferraro is currently a Senior Research Scientist with Adaptive Immersion Technologies, based in Tampa, FL.

Contributors

Joshua Andrews, Modern Hire, Raleigh, NC
Peyton Bailey, Design Interactive, Orlando, FL
Elizabeth Blickensderfer, Embry-Riddle Aeronautical University, Daytona Beach, FL
Michael Brannick, University of South Florida, Tampa, FL
Meredith Carroll, Florida Institute of Technology, Melbourne, FL
Deborah Sater Carstens, Florida Institute of Technology, Melbourne, FL
Nicklas Dahlstrom, Lund University School of Aviation, Lund, Sweden
Suvranu De, FAMU-FSU College of Engineering, Tallahassee, FL
John Deaton, Florida Institute of Technology, Melbourne, FL
Cali Fidopiastis, Design Interactive, Orlando, FL
T. Chris Foster, Naval Air Warfare Center Aircraft Division (NAWCAD), Patuxent River, MD
Jared Freeman, Aptima, Inc., Arlington, VA
Trysha Galloway, The Learning Chameleon, Inc., Culver City, CA
Michael Geden, Modern Hire, Raleigh, NC
Carter Gibson, Modern Hire, Birmingham, AL
Steve Hall, Winter Springs, FL
P. A. Hancock, University of Central Florida, Orlando, FL
Claire L. Hughes, Design Interactive, Orlando, FL
Cullen D. Jackson, Beth Israel Deaconess Medical Center, Boston, MA
Anna Johansson, Beth Israel Deaconess Medical Center, Department of Medicine, Boston, MA
Joan H. Johnston, United States Army Combat Capabilities Development Command Soldier Center, Natick, MA
Joseph R. Keebler, Embry-Riddle Aeronautical University, Daytona Beach, FL
John L. Kleber, Embry-Riddle Aeronautical University, Daytona Beach, FL
Nick Koenig, Modern Hire, Rogers, AR
Elizabeth H. Lazzara, Embry-Riddle Aeronautical University, Daytona Beach, FL
Michael G. Lilienthal, EWA Government Systems Inc., Herndon, VA
Dahai Liu, Embry-Riddle Aeronautical University, Daytona Beach, FL
Nikolas D. Macchiarella, Embry-Riddle Aeronautical University, Daytona Beach, FL
Shem Malmquist, Florida Institute of Technology, Melbourne, FL
Phillip M. Mangos, Adaptive Immersion Technologies, Tampa, FL
Jacqueline McSorley, Embry-Riddle Aeronautical University, Daytona Beach, FL
William F. Moroney, University of Dayton, Dayton, OH
Maria Chaparro Osman, Aptima, Inc., Orlando, FL
Henry L. Phillips IV, Soar Technology, Inc., Pensacola, FL
Di Qi, Chapman University, Orange, CA
Summer Rebensky, Aptima, Inc., Dayton, OH
Ernesto Ruiz, Florida Department of Health, Orlando, FL
William J. Salter, Strategic Solutions Consulting, Harvard, MA
Richard J. Simonson, Embry-Riddle Aeronautical University, Daytona Beach, FL
Kay M. Stanney, Design Interactive, Orlando, FL
Emily M. Stelzer, MITRE Corporation, McLean, VA
Ronald Stevens*, UCLA School of Medicine, Brain Research Institute, Los Angeles, CA, and The Learning Chameleon, Inc., Culver City, CA
Dennis A. Vincenzi, Embry-Riddle Aeronautical University, Daytona Beach, FL
Emily E. Wiese, Blueprint Test Preparation, Manhattan Beach, CA
Ann Willemsen-Dunlap, JUMP Simulation and Education Center, Peoria, IL
Kimberly N. Williams, Embry-Riddle Aeronautical University, Daytona Beach, FL
Jiahao Yu, Embry-Riddle Aeronautical University, Daytona Beach, FL

* The editors would like to pay their respects to Dr. Ron Stevens, who sadly passed away prior to publication of this book project. We are very grateful for his contributions and dedication to the field of human factors in simulation and training.

1 Human Factors in Simulation and Training: An Overview

T. Chris Foster, William F. Moroney, Henry L. Phillips IV, and Michael G. Lilienthal

DOI: 10.1201/9781003401360-1

INTRODUCTION

Simulation is pervasive. Indeed, we probably received a simulator during the first days, if not hours, of our lives. Interestingly, it was called a “pacifier.” Although the name might have been provided by parents seeking a respite, the pacifier is both a simulator and a stimulator (see Appendix B for definitions). It simulated the nipple of a mother’s breast or a feeding bottle and stimulated a sucking/rooting reflex in the infant, which improved the child’s muscle tone and provided a sense of comfort. Young children simulate as they play with their toys, athletes envision their success, and would-be fighter pilots do battle in the safety of their homes, whereas actual and would-be politicians rule simulated cities. For engineers, educators, and trainers, simulation is a standard tool of the trade. Business relies heavily on simulation as part of its planning process.

Simulation is not a relatively modern development attributable to technological and mathematical advances; it has a much longer history. We use or have used simulation in many ways in our daily life. Consider the following examples (modified from Raser, 1969):

• Make-believe: a child playing with blocks and model toys of various scales. It is interesting to note that as individuals age, the level of fidelity they expect increases. Thus, for most young children, scale is irrelevant, and they see no inconsistency in playing with toy vehicles of various sizes. For an adult, scale is critical when creating a diorama.
• Artificial: as in an “artificial Christmas tree.”
• Substitution: margarine (as a butter substitute) or clothing made of natural or synthetic material in lieu of animal fur.
• Imitation: faux leather purses or wallets made of a synthetic material.
• Deception: simulated storefronts as used in movies and on theatrical stages, or inflated military vehicles positioned where they can be observed by aerial reconnaissance or satellites.
• Mimicry: animal calls used by hunters to lure other animals; a vibration device built into an infant’s bouncy seat to serve as a surrogate for parental rocking motion.
• Metaphor: Shakespeare’s play Henry V is a morality play applicable to modern warfare.
• Analogy: simulated sunlight used to reduce seasonal affective disorder.
• Representation:
  • Mathematical: a graph representing the dynamics of stock markets (see the sketch after this list).
  • Logical: a “You-are-here” map.
  • Physical: ranges from toys (scaled representations) to tinker-toy-like representations of chemical bonds or DNA and holographic images.
  • Mental: our perception or belief of how a piece of software works.
  • Visual: computer-driven monitors/displays such as Oculus Rift and HoloLens.
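To make the “mathematical representation” category concrete, the short Python sketch below generates the kind of stock-price dynamics such a graph would depict. It is our illustration, not part of the chapter: the model (a simple multiplicative random walk) and every parameter value are assumptions chosen purely for demonstration.

import random

def simulate_price(start=100.0, days=252, drift=0.0003, volatility=0.01, seed=1):
    # One trading year of daily prices from a simple multiplicative random walk.
    # All parameter values here are arbitrary, illustrative assumptions.
    random.seed(seed)
    prices = [start]
    for _ in range(days):
        prices.append(prices[-1] * (1.0 + random.gauss(drift, volatility)))
    return prices

path = simulate_price()
print(f"start={path[0]:.2f} end={path[-1]:.2f} low={min(path):.2f} high={max(path):.2f}")

Plotting such a series against time yields exactly the sort of graph the bullet describes: a mathematical representation standing in for real market behavior.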

SIMULATION: THE PERFECT STORM

The second decade of the 21st century could be described as a perfect storm in which multiple factors converged to increase the use of simulators for training, including the following.

Technological advances:

• Major technological innovations in the capabilities of visual, auditory, and tactile technologies facilitate immersion in the learning process. Today we have multiple untethered head-mounted displays (HMDs) such as Microsoft’s HoloLens (https://www.youtube.com/watch?v=uIHPPtPBgHk) and Apple’s Apple Glass (https://www.youtube.com/watch?v=JLNvSYr4eeI). Wearable computing has arrived.
• Virtual Reality (VR) is “finally beginning to come of age, having survived the troublesome stages of the famous ‘hype cycle’—the Peak of Inflated Expectation, even the so-called Trough of Disillusionment” (Ruben & Gray, 2020). Despite claims in the popular press of superior training capacities, VR and traditional training are essentially equivalent. However, Extended Reality (XR) offers the additional advantages of safety, affordability (Kaplan et al., 2019), increased presence and immersion as well as the psychological dimension of flow (Kaplan et al., 2020), and portability.
• The development of artificial neural networks, AI, and intelligent tutors allows for virtual adversaries and trainers. In a 2020 simulation, an intelligent adversary defeated a highly qualified F-16 pilot in five rounds of simulated Air Combat Maneuvering (ACM) (https://www.thedrive.com/the-war-zone/35888/ai-claims-flawless-victory-going-undefeated-in-digital-dogfight-with-human-fighter-pilot). With respect to trainers, VIPER (Virtual Instructor Pilot Referee) is an AI tutor designed “to guide and provide feedback to students in practice sessions using virtual reality” (Natali et al., 2020). Live, virtual, and constructive entities can now be combined to fully exercise a system’s capabilities (https://www.lockheedmartin.com/en-us/news/features/2016/the-future-of-training-blends-real-and-virtual-worlds.html).
• Commercial off-the-shelf (COTS) computer displays and networking technologies were pivotal to accessing cutting-edge training simulations (Wilson, 2018). Schultz (2014) describes how Boeing capitalized on an operator’s gaming experience by using a common Xbox game console controller to control the US Army’s high-energy laser mobile demonstrator.
• Access to broadband, while still not universal, has increased.

Increased computation power:

• The fusion of inexpensive computation power, affordable broadband, and wireless networks in a persistent cloud-based environment that is available 24/7.
• The exponential decrease in computation and storage costs over the past decades: 1 MB of memory cost $1 million in 1967, and by 2017 it cost $0.02 (Mearian, 2017), which enabled the creation of realistic visualizations and dynamic human–computer interactions (a quick check of the implied annual rate of decline appears after this list).
• Reduced size, weight, and power requirements are embedded in these decreased costs.
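As a back-of-the-envelope check on those two memory-cost figures (the arithmetic is ours, using only the numbers quoted above), the implied average annual price decline can be computed directly:

cost_1967 = 1_000_000.0  # dollars per MB of memory in 1967 (Mearian, 2017)
cost_2017 = 0.02         # dollars per MB of memory in 2017
years = 2017 - 1967

fold_decrease = cost_1967 / cost_2017                   # total drop across the period
annual_factor = (cost_2017 / cost_1967) ** (1 / years)  # year-over-year price multiplier
print(f"{fold_decrease:.0e}-fold decrease over {years} years")
print(f"about {(1 - annual_factor) * 100:.0f}% average price decline per year")

The two data points imply a roughly fifty-million-fold drop, about a 30% price decline every year for half a century, which is what made today’s memory-hungry, realistic visualizations economically feasible.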

Human Factors in Simulation and Training

New generation of leaders and learners:

• Millennials (born between 1981 and 1996) and Generation Z (born 1997–2012) have imprinted on technology-based platforms for education, accessing information, gaming, and entertainment. They are more willing to accept technological change than their predecessors. Indeed, they expect these changes, since they have grown up with the Internet, electronic gaming, online research, and instant access to data and videos. Lectures and linear learning are things of the past to these digital natives.
• Today's upper and middle management decision-makers were the first generation of "tech-savvy" personnel during the 1990s and early 2000s. They recognized the cost-benefits of simulation in selection and training. They are now positioned to facilitate or mandate administrative changes favoring simulation. Consider agencies such as the FAA, which now accepts time spent on qualified training devices as contributing to time spent flying the actual aircraft (https://www.faa.gov/news/updates/?newsId=85426). The US Navy is now certifying submarine bridge crews in a Submarine Bridge Trainer (SBT) (https://www.navaltoday.com/2017/06/06/us-navy-opens-new-submarine-bridge-trainer/, https://www.dvidshub.net/news/236453/realistic-submarine-bridge-trainer-opens-trident-training-facility-bangor). AI-based Learning Management Systems (LMS) and eLearning allow on-site and remote employees to learn online. Properly designed simulations increase employee engagement and performance (https://elearningindustry.com/ai-based-humanlike-trainers-change-corporate-learning).

Innovations in education and training:

• The changing nature of education:
  • Increased use of electronic platforms and software as educational tools.
  • Acceptance of distance learning, which was, in part, imposed by the COVID-19 pandemic.
  • Focus on performance and demonstration of learning as opposed to class hours.
  • Recognition of individual differences in learning rates.
  • Gradual introduction of avatar tutors into the learning process.
  • Immersion and deliberate practice recognized as key contributors to the development of expertise.
• Today's teachers are being trained in how to effectively utilize emerging simulation technologies and incorporate them into hybrid teaching, which involves some online and some in-person students.
• "Training anytime, anyplace" has become a mantra. Many simulations are now portable across platforms (including handheld devices), albeit with some loss of capability and resolution, as they connect wirelessly to cyberspace. A good example is the US Navy's RRL (Ready, Relevant Learning) program (https://www.netc.navy.mil/RRL), which is designed to deliver "the right training in the right way." The teaching tools used range from "YouTube-like videos to more complex, immersive simulators and virtual trainers." BFTT (Battle Force Tactical Training) provides stimulation/simulation of shipboard tactical equipment to facilitate team training. This allows crews to practice on the actual equipment they would be using in real combat. Multiple vessels at a variety of locations can participate in coordinated training exercises (AN/USQ-T46, January 15, 2020).
• Simulation has blended training, mission planning, rehearsal, and execution into one continuum.
• Educators in high-risk environments such as the medical, chemical, nuclear, and aerospace industries now have tools that better enable them to deliver realistic training in safe environments, which increases training transfer while reducing overall risk. These tools are cost-effective since they require fewer resources to serve multiple learners. They also allow for experimentation and testing that could not ethically be performed on humans or animals.

Global acceptance of simulation/gaming:

• Higher-level simulations are no longer the sole domain of technologists; with increased accessibility to more capable and affordable personal computers, simulations are now used daily by less technical people. Businesses use simulations for selection, training, and evaluation (Langley et al., 2016). In the workplace, simulations such as weather and climate forecasts, expected aircraft delays, stock market projections, mortgage cost estimates, and retirement projections are common tools. Models predicting COVID-19 hotspots and optimizing vaccine distribution have been accepted by the public.
• Gaming has become more common. Gaming simulations range from individual "first-person" games such as Halo and Grand Theft Auto to massively multiplayer online (MMO) games and MMO role-playing games (MMORPGs) such as PlayerUnknown's Battlegrounds, Fortnite, Minecraft, and World of Warcraft. Students in grades 3–12 use Minecraft to learn about environmental challenges, food security, and global issues affecting agriculture (https://www.nasef.org/learning/farmcraft/?et_rid=40892865&et_cid=3725425).
• Self-contained gaming systems that integrate controls, displays, and a workstation into an area with a small footprint are emerging. Consider the concept gaming chair introduced at CES 2021. This chair provides self-contained "panoramic visuals in a 60-inch rollout display, tactile feedback, and a transformable table for PC and console gaming" (https://www.razer.com/ca-en/concepts/razer-project-brooklyn). It goes beyond the current motion-simulating devices (inflatable seat pan cushions and backs, rumble motors, "butt kickers," and harness tighteners) described by Alexis (2020).


HUMAN FACTORS IN SIMULATION AND TRAINING: A BRIEF HISTORY

Where would society be if humans did not have the ability to simulate? Where would the world be if individuals such as Lincoln, Roosevelt, Churchill, Gandhi, Martin Luther King, Walt Disney, Bill Gates, and Steven Spielberg had not had dreams and visions (simulations) of a different world? Indeed, one could speculate that at some level the ability or drive to simulate is an essential part of "proactive" evolution. This ability is essential to human creativity, as Hofstadter (1979, p. 643) opined: "How immeasurably poorer our mental lives would be if we did not have this creative capacity for slipping out of the midst of reality into soft 'what ifs'!" Schrage (2000, p. 13) stated, "We shape our models, and then our models shape us." To a large extent our internal models determine our perceptions, which in turn can influence and perhaps determine our behaviors in the real world.

As individuals, we routinely utilize simulation when we engage in what is colloquially described as "wishful thinking" and in the psychological literature as "counterfactual thinking" (Roese & Olson, 1995). Counterfactual thinking is often characterized by conditional statements with an antecedent such as "if only" and a consequent such as "then" (Roese & Olson, 1995). Sternberg and Gastel (1989a, 1989b) describe counterfactual thinking as an important component of intelligence. Indeed, Tetlock and Belkin (1996) described counterfactual thinking in world politics, including the rise of Hitler, Western perceptions of Soviet politics, and the Cuban missile crisis.

Both today's computer-based simulation and the ubiquitous Internet evolved from Department of Defense (DoD) efforts, as have many other innovations, including emergency medical services, blood plasma, and lasers. During World War II, simulation supported part-task flight training in the "blue box" (Figure 1.3), which used instruments within an enclosed cockpit and had minimal physical displacement. During the 1970s, 1980s, and 1990s, the word simulation invoked images of motion-based simulators tossing cockpits about atop platforms supported by giraffe-like hydraulic legs. These full six-degrees-of-freedom (DOF) hexapod motion simulators were housed in special air-conditioned buildings and supported by a highly skilled technical staff. But research at the end of the 20th century and in the early part of the 21st century led to the decline of full six-DOF hexapod motion simulators. While airlines and the DoD had invested heavily in motion-based simulators, the belief that "since the airplane has motion, flight simulators must also have motion" was severely challenged as researchers re-examined assumptions regarding requirements for motion.

Boothe (1994) argued that in simulators and flight training devices (FTDs), the emphasis is not just on accomplishing the required task, but that for maximum "transfer of behavior," the task must be performed exactly as it would be in the aircraft. Thus, the same control strategies and control inputs must be provided in both the aircraft and the simulator. He believed that the emphasis should be on appropriate cues, as identified by pilots, who are the subject-matter experts. To achieve this end, Boothe emphasized replication of form and function, flight and operational performance, and perceived flying (handling) qualities. He noted that government and industry working groups utilize realism as their reference and safety as their justification.


However, Roscoe (1991) argued that "qualification of ground-based training devices for training needs to be based on their effectiveness for that purpose and not solely on their verisimilitude to an airplane" (p. 870). Roscoe concluded that pilot certification should be based on demonstrated competence, not hours of flight experience. Lintern et al. (1990, p. 870) argued further that for "effective and economical training, absolute fidelity is not needed nor always desirable, and some unreal-worldly training features can produce higher transfer than literal fidelity can." Caro (1988, p. 239) added, "The cue information available in a particular simulator, rather than stimulus realism per se, should be the criterion for deciding what skills are to be taught in that simulator." This position was reinforced in a 2020 study on the use of lethal force by Blacker et al. Participants with varying levels of experience completed both marksmanship and shoot/don't-shoot scenarios on both a video game and a military-grade shooting simulator. Results supported the notion that shooting accuracy and decision-making are independent components of performance. Individuals with firearms expertise outperformed novices on the military-grade simulator, but only with respect to shooting accuracy, not unintended casualties. Individuals with video game experience outperformed novices in the video game simulator, but again only on shooting accuracy.

The authors demonstrated that the capabilities of the less expensive computer game simulation were adequate for evaluating the lethal-force decision-making process. For an additional illustration of the importance of selecting the appropriate technology for training, see the parachute/water entry trainer in the mixed reality (MR) section of this chapter. Thus, there are important differences of opinion regarding the specification of a simulator's capabilities.

Scientists began to systematically re-examine assumptions regarding the requirements for motion. Bürki-Cohen et al. (2007) at the Volpe National Transportation Systems Center, in a proof-of-concept study, evaluated the training value of a fixed-base flight simulator with a dynamic seat. They concluded: "This research did not find operationally relevant differences in performance or behavior of pilots tested in the FFS [Full Flight Simulator] with motion after having been trained in the same FFS with the motion system turned on or off—despite selection of maneuvers that require motion cues, at least theoretically." It made no difference whether the FFS represented a small turboprop "power house" or a sluggish four-engine jumbo jet, or whether the training in question was initial or recurrent training. Sparko et al. (2010), working with pilots with fewer than 500 hours of experience, concluded that there were no differences in the training success of pilots trained with either simulator type. In 2011, Bürki-Cohen et al. reported on a series of studies that addressed the following questions: 1) Are there maneuvers in airline-pilot training where platform motion cues, in addition to the visual cues from a wide field-of-view, out-the-window (FOV/OTW) view and instruments, result in an operationally relevant improvement of transfer between the simulator and the airplane? 2) Do airline pilots need to be trained to avail themselves of motion cues? 3) Are motion cues from a hexapod platform representative of those experienced in the airplane? 4) Can alternative systems provide onset cues and perception of realism?

They concluded that the answer to the first three questions was No and the answer to the fourth question was a resounding Yes, based not only on their findings but also on the training provided on fixed-base platforms by multiple smaller airlines. In addition to the science just discussed, the lower cost of training in simulators is also a driving factor. According to Alexis (2020):

A turnkey purchase price of a brand new and advanced fixed base simulator is in the region of £200,000 ($250,000, €220,000), whereas an Airbus A350 or Boeing 787 Level D simulator is approximately £15m ($18.5m, €16.5m). 1 hour in a fixed base simulator is approximately £120 ($150, €135) compared to 1 hour in a full motion simulator at £450 ($560, €500).
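Those figures support a back-of-the-envelope total-cost comparison. The sketch below is ours, not from Alexis (2020): the utilization and service horizon are illustrative assumptions, and maintenance, staffing, facilities, and the differing regulatory training credits of the two device classes are deliberately ignored.

# Rough total-cost comparison using the Alexis (2020) figures quoted above.
# hours_per_year and years are hypothetical assumptions for illustration.

fixed_base_price, fixed_base_hourly = 200_000, 120   # GBP
level_d_price, level_d_hourly = 15_000_000, 450      # GBP

hours_per_year = 1_000   # assumed annual utilization (hypothetical)
years = 10               # assumed service horizon (hypothetical)

def total_cost(price: int, hourly: int) -> int:
    """Purchase price plus cumulative hourly operating cost."""
    return price + hourly * hours_per_year * years

fb = total_cost(fixed_base_price, fixed_base_hourly)
ld = total_cost(level_d_price, level_d_hourly)
print(f"fixed-base: £{fb:,}")           # £1,400,000
print(f"Level D full motion: £{ld:,}")  # £19,500,000
print(f"cost ratio: {ld / fb:.1f}x")    # ~13.9x

Even under these simplified assumptions, the fixed-base device costs roughly an order of magnitude less over its life, which helps explain why smaller airlines train on fixed-base platforms.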

For an interesting comparison of commercial fixed-base flight simulators, visit https://simulatorreview.com/. In 2018, Losey described how the USAF is replacing legacy simulators, costing $4.5 million each, with VR simulators costing $15,000 each. For details on the use of MR simulators, see the section Extended Reality: Augmented Reality, Virtual Reality, and Mixed Reality.

Within the past 20 years, the criticality of human factors in training and simulation has become more apparent throughout system development, from the conceptual phase through test and evaluation to deployment. Human factors impact both effectiveness and efficiency. As discussed above, military aviation was an early focus, and lessons learned in that domain have transferred to domains such as civil aviation, energy-related industries, automotive systems, and healthcare. A quick perusal of the chapter titles in this book testifies to that fact. For a broad overview of the role of simulation in training, see Gawron (2019), who describes: (1) the role of simulation in a training curriculum, (2) how to measure transfer of training from the simulator to the real world, and (3) the types of simulations and simulators used for training. Her report provides information on applications in environments ranging from medical facilities, driving, and aircraft to weightlessness. Let us now discuss the why of simulation, clarify the distinction between simulation and modeling, and describe the modeling and simulation (M&S) process.

WHY SIMULATE?

Before examining simulation in more detail, it is appropriate to ask, "Why did simulation evolve?" Simulation may have evolved for a variety of reasons, but its origins may be best attributed to the organism's strategy of accomplishing tasks with the least amount of effort, thus conserving energy, avoiding overload, and maintaining homeostasis. This strategy is apparent even on a physiological basis, when our perceptual system filters data as we transform it into information. Thus, we may simulate because it is both effective and efficient.


The reasons we simulate may be reflected in an examination of our uses of simulations and simulators. Currently, simulators are frequently used for training of operators (aircrew, physicians, bus and truck drivers, ship navigators, nuclear power plant operators, etc.) and maintainers (troubleshooting by technicians). They are also used to maintain and evaluate individuals' proficiency level and to allow individuals to qualify for a particular position or rating, such as the pilot of a single-seat aircraft, surgeon, or nuclear power plant operator. Manufacturers routinely use digital prototypes for product research and development (e.g., automobile and aircraft cockpit prototypes). Systems engineers routinely use M&S as a prime means for managing the cost of developing, building, and testing increasingly complex systems. Test and evaluation experts rely on M&S to help design test scenarios/experiments and to support T&E execution and post-test analysis and reporting (see https://www.dau.edu/guidebooks/Shared%20Documents/Test_and_Evaluation_Mgmt_Guidebook.pdf and https://acqnotes.com/acqnote/careerfields/modeling-and-simulation-support). According to the Lindenberger Group (2017), businesses use simulations because they provide experiential learning with immediate feedback, increase knowledge retention, facilitate cooperation and encourage competition, provide a risk-free environment in which to fail, provide quantifiable training, and are less costly in terms of time and money. Educators use simulation to "demonstrate abstract concepts, allow interaction between users and simulated equipment, and provide users with feedback that allow users to improve their knowledge and skills. They are also cost-effective over the long-term" (Electronics Technician Training, Nov 16, 2018).

SIMULATION VERSUS MODELING

What is simulation? What is a model? Is there a difference between simulation and modeling? In common usage, the terms are interchangeable. Schrage (2000, p. 7) commented:

Just what are the differences between models, simulations and prototypes? Once upon a time there was a fighting chance to answer this question simply and clearly. Today, technologies have conspired to turn any answer into a confusing jumble of semantics that obscure understanding.

Indeed, the terms are linked so closely that DoD Directive 5000.59-M (1998) provided a joint definition for M&S as "the use of models, including emulators, prototypes, simulators, and stimulators, either statically or over time, to develop data as a basis for making managerial or technical decisions." Raser (1969, p. 6) describes a simulation as "a special kind of model, and a model is a special way of expressing theory." A "theory is a set of statements about some aspect of reality such as the past reality, present reality, or predicted reality. A theory attempts to describe the components of that reality and to specify the nature of the relationships among those components" (p. 6). Thus, for Raser, a model is a specific form of a theory, whereas a simulation is "an operating model that displays processes over time and that thus may develop dynamically" (p. 10).


It may be helpful to compare the usage of the terms modeling and simulation as nouns and verbs. According to DoD Instruction 5000.61 (Department of Defense, 2009), simulation (the noun) is defined as an executable implementation of a model, or execution of an implemented model, or a body of techniques for training, analysis, and experimentation using models, whereas model (the noun) is defined as a "physical, mathematical, logical, or other representation of a system, entity, phenomenon, or process." This distinction is also reflected in differentiating between the verbs modeling and simulating. According to DoD Directive 5000.59 (August 8, 2007, Incorporating Change 1, October 15, 2018), simulating is defined as "a method for implementing a model over time." Modeling is the "application of the standard, rigorous, structured methodology to create and validate a physical, mathematical, or otherwise logical representation of a system, entity, phenomenon, or process." According to the above definitions, the distinguishing feature is the direction of the relationship: modeling produces the representation, and a simulation implements, or executes, that model over time.

Within the training community, which focuses on the outcome of modeling, the terms are often seen as interchangeable. But individuals who perceive modeling as a descriptive process see the model, rather than the simulation of that model, as the end product. However, as the terms are more commonly used interchangeably, the authors have also used the terms interchangeably throughout the remainder of this chapter. The DoD, a leading developer and procurer of models and simulations, has developed fairly concise, pragmatic definitions of simulation and modeling terms. Other professional organizations, such as IEEE and the Military Operations Research Society (MORS), have also made efforts to standardize the terminology. Appendices A (Modeling) and B (Simulation) are provided to help clarify ambiguity associated with these terms as well as to facilitate the reading and understanding of the material contained in the following chapters and related readings. An updated online modeling and simulation glossary is maintained at https://www.msco.mil/MSReferences/Glossary/MSGlossary.aspx
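The noun/verb distinction above can be made concrete in a few lines of code. The sketch below is ours, with a system and numbers chosen purely for illustration: the model is a static representation (state equations, no clock), while the simulation implements that model over time.

from dataclasses import dataclass

@dataclass
class ProjectileModel:
    """The *model*: a mathematical representation of vertical motion
    under gravity. It holds equations and parameters, not a clock."""
    g: float = 9.81  # m/s^2

    def derivatives(self, vy: float) -> tuple[float, float]:
        # dy/dt = vy; dvy/dt = -g
        return vy, -self.g

def simulate(model: ProjectileModel, y0: float, vy0: float,
             dt: float = 0.001) -> float:
    """The *simulation*: executes the model over time (Euler
    integration) until ground impact; returns time of flight."""
    t, y, vy = 0.0, y0, vy0
    while y > 0.0:
        dy, dvy = model.derivatives(vy)
        y, vy, t = y + dy * dt, vy + dvy * dt, t + dt
    return t

# Launched upward at 20 m/s from 1 m: the analytic answer is about 4.13 s.
print(f"time of flight: {simulate(ProjectileModel(), y0=1.0, vy0=20.0):.2f} s")

The same model could be handed to a different simulation, for example a higher-order integrator or a real-time visual trainer, without change, which is exactly the separation the DoD definitions draw.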

The Modeling and Simulation Process: Verification, Validation, and Accreditation

A computer model can exist and be valid only in a virtual world (e.g., an antigravity vehicle created for a video game) or can mimic some portion of the real world (e.g., a motorcycle being designed for manufacture following the rules of physics). The model development process and the software development process have a similar structure: define the end state and the objectives (criteria for acceptance of the model), and follow a consistent systems engineering process. In 2010, the Johns Hopkins University Applied Physics Laboratory (JHU/APL) published a guide defining a systems engineering framework focused on the best practices for the development of credible stand-alone models and simulations (Morse et al., 2010).

Just as computers have evolved into faster computational, storage, and visualization tools, so too have software development processes modernized and become faster. Terms such as Agile, Lean, DevSecOps, software factories, continuous delivery, and human-centered design appear as part of commercial best practices to rapidly develop, field, and continually upgrade software products, including simulations for training, experimentation, and the like (DoD Enterprise DevSecOps Reference Design Version 1.0, 12 August 2019).

As the pace of developing, building, and deploying training and simulation systems continues to accelerate, the credibility of these models and simulations must be addressed. This is important both for investors who use models and simulations to guide their purchase of stocks and bonds to realize profits and for aircrew who depend on transferring what they learn in the simulated flight environment of a six-degree-of-freedom flight simulator to the real-world flight environment of an F-35 fighter aircraft. To develop credible products, the process of developing a model must be scrutinized in parallel through the verification, validation, and accreditation (VV&A) process. Each of these terms is discussed in the following paragraphs.

Verification is the process for determining whether the model does what the customer intended. That is, it is the process that examines the model implementation and its associated data to determine how accurately it represents the customer's conceptual description and specification for the model. It answers the question, "Does the model actually represent the design intent and accurately reproduce the model specifications?" This is typically done by examining the code and logic of the software to verify that it implements the conceptual model and that the model produces the correct results when needed.

Validation is the process for determining how accurately the model and its associated data match the real world within the context of the intended use of the model. It answers the question, "Does the model adequately depict the 'real' world that it was designed to represent?" Subject-matter experts routinely "run the model through its paces" to test whether it acts and interacts in a simulated world as expected. Thus, a car test driver may be the ideal expert to determine whether the simulated handling characteristics of a new car model that exists only on a computer are an acceptable representation.

Accreditation is an official certification that there is sufficient evidence to show that the model is credible and suitable for a particular purpose. It answers the question, "Does the model provide an acceptable answer to my particular questions, or can it be used for my specific purpose?" Accreditation confirms that the simulation can meet the specific requirements developed in response to the objectives established at the beginning of the M&S development process.

VV&A are three intertwined processes that increase the credibility of the model, reduce risk, and increase the user's confidence in the M&S tool. These processes verify the model was made right, validate the right model was made, and accredit the model is right for the intended application (Balci, 1998). In essence, VV&A provides risk reduction by ascertaining that the simulation supports the users' objectives. It calibrates the credibility of the model or simulation for an intended use.

There is no one cookie-cutter VV&A process that can be applied to the development of all models and simulations. VV&A processes must be tailored to the nature of the problem. How much and what parts of VV&A are employed (i.e., resources expended) are highly dependent on the importance of the decision that will be made based on the simulation. The VV&A process can make the model "transparent" enough for people to understand its assumptions and its limitations. As the demand for M&S continues to grow, the VV&A process continues to evolve and must become as agile as the software development processes are. As these techniques evolve, so does the VV&A lexicon, portions of which are provided for the interested reader in Appendix C.

The Department of Defense Modeling and Simulation Enterprise VV&A site (https://vva.msco.mil) has standardized documentation templates for V&V plans, accreditation plans, and the reports for both, in accordance with MIL-STD-3022. It also has a recommended best practice guide to assist those planning to do VV&A on their model or simulation. The site supports the DoD modeling and simulation VV&A instruction, DoDI 5000.61. For a general overview of VV&A processes, see Youngblood et al. (2000). There are many V&V methods available, ranging from informal reviews and walkthroughs to formal methods such as lambda calculus and proofs of correctness (Balci, 1998; Balci et al., 2002; Petty, 2010).
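For readers who build software models, the division of labor between verification and validation can be sketched in a few lines of test code. This is a toy example of ours, with hypothetical tolerances and data; real VV&A is far broader and formally documented, and accreditation is an official decision that no test suite can confer.

import math

def drop_time(height_m: float, g: float = 9.81) -> float:
    """Model under test: free-fall time from rest, t = sqrt(2h/g)."""
    return math.sqrt(2.0 * height_m / g)

def verify() -> None:
    """Verification: does the implementation match its specification?
    Here, the spec's closed form is checked at a known point."""
    assert abs(drop_time(4.905) - 1.0) < 1e-9  # spec: 4.905 m falls in 1 s

def validate(field_data: list[tuple[float, float]], tol: float = 0.05) -> None:
    """Validation: does the model match the real world for its intended use?
    field_data holds (height_m, measured_seconds) pairs from drop tests."""
    for height, measured in field_data:
        assert abs(drop_time(height) - measured) <= tol, (height, measured)

verify()
validate([(10.0, 1.45), (20.0, 2.05)])  # hypothetical measurements
print("verification and validation checks passed")

In this miniature form, verify() asks whether the model was made right, and validate() asks whether the right model was made, mirroring the Balci (1998) formulation above.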

ADVANTAGES AND DISADVANTAGES OF SIMULATION

Since M&S is so essential to many human endeavors, its advantages and disadvantages are described in the following sections.

Advantages

Cost-Effectiveness

Simulation and simulators are used primarily because they are cost-effective. They save both time and money while achieving some desired end. One objective of effective training systems is to provide the required training at the lowest possible cost. Simulation is a means for achieving that objective. Baudhuin (1987, p. 217) stated, "The degree of transfer from the simulator to the system often equates to dollars saved in the operation of the real system and in material and lives saved." Within the aviation community, the effectiveness of simulators is accepted as an article of faith. Indeed, the aviation industry could not function without simulators and FTDs, whose existence is mandated by FAA regulations (1991). As early as 1949, Williams and Flexman documented that even crude training devices (e.g., early simulators) produced payoffs, with fewer trials, fewer flight hours, and fewer errors to qualify in the aircraft. The simulators were both cost- and training-effective devices (Flexman et al., 1972). In a very detailed analysis of cost-effectiveness, Orlansky and String (1977) reported that flight simulators for military training can be operated at between 5% and 20% of the cost of operating the aircraft being simulated, with a median of approximately 12%. They also reported that commercial airlines can amortize the cost of a simulator in less than nine months and the cost of an entire training facility in less than two years. Commercial airlines that use simulators accrue additional savings since they do not incur the loss of revenue associated with using aircraft for in-flight training.


In 1994, Beringer noted that since the cost of simulation has decreased as the capabilities of simulators have increased, today's question is more often phrased as, "If we can get more simulation for the same investment, what is the 'more' that we should ask for?" Thus, according to Beringer, cost is seen as facilitating, rather than prohibitive. The cost of computer-based simulation has decreased exponentially in the past 30 years, while processing speed and input/output capability have increased. In addition, the requirements for supporting infrastructure have decreased. The simulators described in the extended reality section of this chapter have a minimalist footprint and do not require special air-conditioning facilities or specially trained maintenance personnel. Today's effectiveness questions are focused on how the required skills can be taught rapidly and inexpensively.

In the healthcare area, the meta-analysis by Beal et al. (2017) of studies teaching medical students critical care medicine indicated that simulation was significantly more effective than other educational interventions or no intervention. While simulation improved skill acquisition, it was no better than other teaching methods in knowledge acquisition. They reported that their review was unable to address differences between types of simulation technology, the effect of duration or frequency of simulation teaching (the "dose" of simulation), the optimal timing by year of study, or retention of skills post-simulation.

Further work is also needed to categorize the cost effectiveness of simulation-based teaching, because equipment and operational costs are high. (p. 113)

An earlier meta-analysis by Cook et al. (2012) reported that only five studies compared costs with an alternative form of training. They concluded that simulation training was more costly (in money or faculty time) but more effective.

Availability

Many simulations are available 24/7 and do not require the physical presence of the objects being simulated or physical access to a simulator site. Many virtual reality/augmented reality simulations can be provided on self-contained systems such as Microsoft's HoloLens, Oculus Quest, and Varjo. The Extended Reality section of this chapter describes multiple training systems that could be available 24/7. Simulators provide immediate access to a simulated location (specified by latitude and longitude) under specified environmental conditions (day, night, fog, sea state, etc.). Thus, training can take place without an actual physical presence at a location under specific environmental or operational conditions. For example, simulators allow a student to complete an instrument landing system (ILS) approach and return immediately to the final approach fix (FAF) to commence the next ILS approach, without consuming time and fuel. Indeed, because simulators allow the instructor to "control reality," conflicting traffic in the landing approach can be eliminated to further increase the number of approaches flown per training session. In short, simulators provide more training opportunities than could be provided by an actual aircraft during the same time. Simulators also provide training in nonexistent aircraft or in aircraft where an individual's first performance in a new system is critical (consider the first flight in a single-seat aircraft and landings on asteroids or Mars). Some surface vehicles and vessels are embedding simulations that allow training to take place in the actual vehicle (see https://www.netc.navy.mil/RRL and https://www.tecom.marines.mil/Units/Divisions/Range-and-Training-Programs-Division/sandbox/DVTE/, and the capability described in the Reduced Environmental Impact section of this chapter). In the case of remotely piloted air vehicles, a practice that is likely to become ever more prevalent is the use of the same operational terminal used for mission execution to perform training, as in the case of the US Navy's MQ-4 Triton Unmanned Aerial System (Lutz et al., 2017; Naval Technology, 2021).

Safety

Simulation provides a means for experiencing normal conditions in a safe and nonthreatening environment. It also allows individuals to be exposed to controlled critical conditions that they hope they would never encounter, such as loss of control of a vehicle or egress during a fire. Simulation also provides an opportunity for initial qualification or requalification in a variety of workplaces, such as control rooms of nuclear power plants, and vehicles, such as the space shuttle. For some tasks the ability to control the simulated environment is critical (e.g., hyperbaric chambers). Within the aviation domain, due to safety concerns, simulators may be the only way to teach some flight maneuvers or to expose aircrew to conditions that they are unlikely to experience under actual flight conditions (e.g., engine separation at takeoff, wind shear, loss of hydraulic systems, and engine fire). While UAV (unmanned air vehicle) controllers are usually located remotely from the vehicles under their control, they are responsible for safety of flight. Ribeiro et al. (2021) recommend creating virtual obstacles on Augmented Reality (AR) devices to reduce operators' fear of crashing in real environments.

Simulation is routinely used for crew resource management (CRM) research and training. CRM evolved from cockpit resource management (Wiener et al., 1993). The original objective of CRM was to reduce the number of aviation accidents and incidents by increasing the effectiveness of cockpit crew coordination and flight deck management. Since its introduction in 1978, CRM training has been expanded to include not just the cockpit flight crew but also cabin crew members, dispatchers, maintenance, and security personnel (Helmreich et al., 1990; Wiegmann & Shappell, 1999; Salas et al., 2000). CRM has also been incorporated into the training of medical teams, particularly in operating rooms (Helmreich & Schaefer, 1998). Gross et al. (2019) reviewed the use of CRM training in healthcare. The effectiveness of CRM was perhaps best demonstrated on January 15, 2009, when Capt. Chesley (Sully) Sullenberger landed a disabled aircraft in the Hudson River and the crew safely evacuated all passengers (Muhlenberg, 2011).

In addition, automation has increased the need for simulators. As Wiener and Nagel (1988, p. 453) commented, "It appears that automation tunes out small errors and creates the opportunities for larger ones." In automated glass (cathode-ray-tube-laden) cockpits, improvements in system reliability have reduced the probability and frequency of system problems, thus inducing a sense of complacency among the aircrew. However, when an unanticipated event occurs, the crew must be trained to respond rapidly and correctly. Simulators provide an opportunity for that type of training. The need for simulator training became apparent in the October 2018 and March 2019 losses of two 737 MAX aircraft carrying 346 people (https://transportation.house.gov/download/endsley-testimony; Kitroeff & Gelles, Jan. 8, 2020).

Surrogate Value

Simulation also reduces the usage of the actual system. Thus, with transportation systems, it reduces the exposure and number of hours on the actual vehicle, which in turn reduces mechanical wear and tear, as well as acquisition, sustainment, operations, and maintenance costs. To put that in perspective, in 2019, the estimated cost per flight hour of the F-35, the USAF's most advanced fighter, was $35,000 (Insinna, 2019). Simulation also reduces infrastructure load (viz., highways or the National Airspace System). From an economic perspective, commercial airlines can use aircraft not required for training purposes on revenue-producing flights. The US Navy has Blue and Gold crews that rotate on the same submarines. The use of submarine bridge trainers allows one crew to qualify bridge teams ashore while the other crew is at sea (Gray, June 8, 2017).

Reduced Environmental Impact

Today, we have an emphasis on environmental concerns, and the vehicles being simulated do not pollute, consume fuel, create noise, or leak hazardous substances (e.g., radiation). Neither do they damage people or property; indeed, simulated patients and bomb-damaged areas are repaired at the flick of a switch. Magnuson (2019) described a demonstration of how a live, virtual, constructive (LVC) training environment was used to "expand" the test range at Nellis Air Force Base. Advanced technologies allowed the pilot of a live airborne aircraft operating within the confines of the test range to participate in an exercise against virtual threats and targets that were presented on the aircraft's displays. These threats and targets behaved realistically and could be presented beyond the physical boundaries of the range. None of the simulated adversary aircraft consumed fuel, and the simulated ordnance did no physical damage, while the pilot was challenged in the simulated threat environment.

Improved Training Environment

Simulators incorporate instructional features that enhance student learning and facilitate instructor intervention. Gawron (2019) describes the role of simulation in a training curriculum, how to measure transfer of training from the simulator to the real world, and the types of simulations and simulators used for training. Moroney and Moroney (2010, Table 19.1) provide a detailed listing of instructional features incorporated into modern simulators, including:

• Simulator instructor options such as preset/reset, crash/kill override, playback/replay, motion, and sound.


• Task conditions such as time of day, seasons, weather, and wind direction and velocity, as well as realism, aircraft stability, and instrument malfunction.
• Performance analysis/monitoring features such as automated performance measurement, debriefing aides, and warnings and advisories that a preset parameter is about to be or has been exceeded.

Standardized Training Environments

Simulators can provide identical flight dynamics and environmental conditions from training session to training session. Thus, the same task can be repeated until the required criteria are attained, and, indeed, until the task is overlearned (automated). Unlike the airborne instructor, the simulator instructor (SI) can focus on teaching the task without safety-of-flight responsibilities or concerns about violations of regulations. Thus, the instructor may deliberately allow a student to make mistakes such as illegally entering a terminal control area or flying below an assigned altitude.

Provide Data

Simulation provides opportunities for data collection that are not available in the real world. Although the critical issue of collecting the right data is beyond the scope of this chapter, data collection permits the following:

• Performance comparison: As part of the diagnosis process, the student's performance can be compared with the performance criteria, as well as with the performance of other students at the same stage of training.
• Performance and learning diagnosis: Having evaluated the student's performance, a teacher or instructor can gain some insight into the student's learning process and suggest new approaches in problem areas.
• Performance evaluation: Performance measurement can be used to evaluate the efficacy of different approaches to training a particular task.

Lack of Realism

Despite the emphasis on high fidelity and "realism," simulators are not realistic; rather, they allow instructors to manipulate reality. In a sense, the lack of realism may contribute to their effectiveness. Lintern (1991, p. 251) stated that transfer could be enhanced by "carefully planned distortions of the criterion task." In addition, most instructional features found in simulators do not exist in the cockpit or the device being simulated. Indeed, if real cockpits had the same features as simulators, the "RESET" button would be used routinely, as it can undo and defy the laws of physics. The entertainment industry relies on this ability to suspend reality to move audiences to nonexistent worlds. Viewers readily suspend reality when the USS Enterprise in Star Trek goes to warp speed and when "repulsion technology" is used to power the landspeeders in Star Wars.


Disadvantages of Simulation

Does Not Necessarily Reflect Real-World Performance

Although performance in a simulation with reasonable fidelity is probably indicative of an individual's expected performance in the real world, we must recognize that performance in a simulator does not necessarily reflect how an individual will react in the real world. There are at least three reasons for this:

• Because there is no potential for an actual accident in a simulator, it would seem reasonable to expect that a trainee's stress level would be lower in a simulator. However, the stress level may be increased when an individual's performance is being evaluated or when they are competing for a position or a promotion.
• To the extent that teams or individuals being evaluated or seeking qualification-in-type expect an emergency or unscheduled event to occur during their time in the simulator, their performance in a simulator may not reflect their in-flight performance. The trainees would, in all probability, have reviewed operating procedures before the start of their period in the simulator. Nonetheless, it should be recognized that the student's review of procedures, even in preparation for an evaluation, is of value. Consider how difficult it would have been to induce the startle effect that contributed to the loss of Air France Flight 447 on a flight from Rio de Janeiro to Paris. Frozen airspeed sensors deprived the crew and flight computers of airspeed data, and the crew induced an aerodynamic stall, which resulted in the loss of 228 lives (https://humanfactors101.com/incidents/air-france-flight-447/; Geiselman et al., 2013).
• Performance (particularly vigilance and situational awareness) in a simulator, when being evaluated, rarely reflects the fatigue or boredom common to many workplaces. Therefore, performance in a simulator may be better than actual in-flight performance.

Surrogate Value

A possible downside to surrogate value within the military is that reduced utilization of the hardware will lead to fewer and less experienced maintenance personnel and reduced supply chain requirements. These apparent savings may also create personnel shortages and logistics problems when the operational tempo rises beyond the training level. To offset the reduced experience among maintainers, AR-enabled maintainer solutions have been developed to support refresher and deployed training needs (N193-D01, n.d.). Also see the AR for Aircraft Maintainers section later in this chapter.

User Acceptance

The acceptance and use of simulators is subject to the attitudes of simulator operators, instructors, trainees, and management. The increased use of computers, from preschool to college, has raised their acceptance in the education and training process.


However, trainees often voice the attitude that they "expected to be using the real system and not a simulation." User acceptance can be significantly influenced by a management policy that monitors the appropriate use of simulation. A critical element in this process is determining the appropriate performance metrics. Measuring effectiveness is a fairly complicated process that has performance measurement at its core. Lane's (1986) report is a "must read" for individuals interested in measuring performance in both simulators and the real world. Mixon and Moroney (1982) provided an annotated bibliography of objective pilot performance measures in both aircraft and simulators, whereas Gawron (2000) provided a compilation of measurement techniques. Readers interested in measuring transfer effectiveness are referred to Gawron (2019) and to Boldovici (1987), who discusses sources of error and inappropriate analyses in estimating transfer effectiveness. Overall, the advantages significantly outweigh any real or perceived disadvantages, as evidenced by the general acceptance of simulators by management, trainees, and regulatory agencies.

A SAMPLING OF PROGRESS IN SIMULATION

Considering the many facets of simulation, we realized that any detailed discussion of all the areas in which simulation has been applied is beyond the scope of this chapter. Therefore, we examined two different human-related domains that have played a major role in the history of simulation: war-gaming and aviation. War-gaming, which originally focused on decision-making and strategy development, has evolved into distributed mission training at multiple locations on a distributed simulation network. Aviation has pushed the development of simulation technology from manually manipulated simulators to multiple live and virtual aircraft at different physical locations engaged in simulated air combat maneuvering (ACM) training. Manipulation of reality is perhaps the area in which the greatest growth has occurred in the past two decades. Therefore, we have devoted a section to the discussion of extended reality, which includes a section on the application of augmented reality in training recognition skills. We conclude this section with a discussion of unique simulators that will not be discussed in other chapters of this book.

War-Gaming

The ubiquitous Internet began as a Defense Advanced Research Projects Agency (DARPA) effort to ensure that critical information could successfully pass over multiple communication paths despite the destruction of some nodes. Many other innovations, including emergency medical services, blood plasma, and lasers, have warfare as their origin. Modern simulation is no exception, as it evolved from warfighting strategies. Simulating battles through games can be traced back to the Hindu game Chaturanga and the Eastern game Go (Allen, 1989). Within Western culture, chess is sometimes considered an early instantiation of simulation. Consider the elements involved in chess: each of the actors (kings, queens, knights, rooks, etc.) has a role, predetermined start points, and specific rules of engagement with which they must comply. Chess involves both deterministic elements (rule compliance) and probabilistic elements (anticipation of the opponent's next move). The earliest known American chess book, Chess Made Easy by J. Humphreys, was published in 1802, and the first American Chess Congress in 1857 heralded a chess craze in the United States. No doubt, military officers also shared in that excitement for chess. In 1882, Army Major William R. Livermore wrote "The American Kriegsspiel, A Game for Practicing the Art of War on a Topographical Map," which introduced war-gaming to the US military. Look-up tables were used to determine some outcomes (e.g., casualty rates were determined as a function of range, firepower, and duration of exposure), whereas a roll of the dice added the element of chance to the probability of the outcomes. The US Navy also adopted war-gaming. Starting almost 100 years ago, between World Wars I and II, the Naval War College conducted 136 war games (http://news.usni.org/2013/09/24/brief-history-naval-wargames, https://www.youtube.com/watch?v=PzwDWX8oQn8).

The Naval War College (Figure 1.1) developed war-gaming without the benefit of computer hardware and software. The faculty had to rely on "human computers" who had a diversity of education and naval operational experience. The games were neither mechanically nor electronically operated. Rather, they relied on paper to pass orders and messages, and on physical markers and ship models on floors to track progress through different theaters of battle.

FIGURE 1.1  War-gaming floor in Pringle Hall, circa 1950, resembling the games played before World War II (from the U.S. Naval War College Archives).

What it lacked in sophisticated modeling and simulation was made up for with brainpower supplied by the faculty, umpires, and students. The immersion into the war was limited only by the imagination of the gamers, who could visualize the battlespace. Although the process was basic, the insights gained from these simple models were useful. "Simulations" helped the War College students do the intellectual heavy lifting as they worked through the war games. The insights from the students and instructors were useful to the Navy leadership that had to plan for a potential war across the expanse of the Pacific. This easily reconfigurable gaming laboratory provided some of the intellectual heavy lifting for the pre-World War II Navy's strategy and tactics and helped the Navy understand emerging technological threats (e.g., aircraft and submarines). The insights gained were then worked out in the more expensive fleet exercises, where ideas were put to the reality test. The exercises allowed experimentation with and the refinement of the most promising tactics, techniques, and procedures using operational systems.

The strategic and operational use of the new Navy aviation technology evolved on the floors of the Naval War College. Carrier aircraft evolved from spotters for battleship guns into an effective offensive weapon. A successful island-hopping strategy evolved from the original unworkable strategy of sailing an armada from the west coast to engage the Japanese fleet in a decisive battle at sea. Naval officers had the freedom to think "out of the box" and conduct tests themselves as strategists, operators, and tacticians. Officers took real concerns they were faced with and experimented with different methods to overcome the challenges of the sheer size of the Pacific Ocean, the Japanese geographic advantages, and the Japanese technological advantages in aircraft and torpedoes. Even this noncomputerized war-gaming helped identify shortfalls in how the fleet operated and how it would be vulnerable during resupply in the face of a modern peer adversary. The lessons learned were articulated as new requirements for the fast oilers, ammunition ships, and supply ships that had to keep up with the fleet of combatant ships. The war games showed the vulnerability of ports that could fall easy prey to the Japanese fleet and aircraft. Although the bombing of Pearl Harbor forced and aided the change, the war games were part of the evolution from a battleship-centric "gun club" fleet to an offensive aircraft carrier fleet. The Navy transformed itself from what had worked in the past, battleship duels, into the new dimension of offensive naval aviation.

The United States did not have a monopoly on war-gaming. In September 1941, the Japanese Naval War College played a war game for 11 days that ended with a surprise attack on Pearl Harbor. Allen (1989) reports that Vice Admiral Nagumo, who subsequently led the attack on Pearl Harbor, played himself during the game. Allen also reports that games played by Japanese war planners were used to convince Japanese politicians that Japan would win a war with the United States. An intriguing discussion regarding the Battle of Midway and Japanese war games is provided by Perla (1990). On May 1–4, 1942, aboard the flagship Yamato, the invasion of Midway was simulated during a war game. As part of the simulation, US airplanes from Midway attacked the Japanese carriers, and a die toss resulted in the sinking of two Japanese carriers.
This loss was overruled by the presiding officer, Rear Admiral Ugaki, who determined that only one aircraft carrier was sunk and the other was only slightly damaged. After the conclusion of the war game, Ugaki inquired about a contingency plan in case a US Navy carrier task force might oppose the Japanese invasion. He received an ambiguous reply, which led him to caution that such a possibility should be considered (Perla, 1990). A month later, the real Battle of Midway was fought, and the land-based planes did little damage to the Japanese fleet, although the unexpected US carrier task force effectively destroyed Japanese carrier-based airpower. With respect to the decision by Ugaki, Perla notes that the land-based aircraft did not score a single hit on the Japanese carriers, thus validating Ugaki's inquiry. Regarding the Battle of Midway, Perla (1990) emphasizes that the outcome of the Japanese war game might have been different if the individual representing the commander of the American forces had behaved in a manner more characteristic of an American commander.

The "fidelity" of play is an essential element if gamers are to see a war game as valid, and how the gamers assess the fidelity of the game design and rules should not be overlooked. The authors of the RAND study on the next-generation war game for the Marine Corps (Wong et al., 2019) observed that there are generational differences in game design. Different experiences with civilian game technology shaped the players' expectations and their comfort level with gaming fidelity. A key concept in their study was that gamers developed different paradigms in adulthood, largely based on the commercial games available during their adolescence. This affects how they view game design, immersion, and gameplay.

The baby boomers had exposure mostly to social board games, which gave them an understanding of the rules, the patterns of hex games, and other manual tabletop war games with little need for further explanation. Little of their gaming experience was digital during their adolescence and early adulthood. Generation X and millennials had more experience with personal computer (PC), console, and social media games. They came of age on PC and console games. Generation X's popular consoles were made by companies like Atari and Nintendo, with games like Space Invaders (1978), Super Mario Bros. (1985), and The Legend of Zelda (1986). These adolescents saw the beginnings of handheld consoles like the Nintendo Game Boy. Most of their gaming experience involved less social interaction than the board games of the baby boomers or the technology the millennials would use. Millennials had access to gaming consoles with increased video animation and computational power, such as the Xbox and the PlayStation, along with sophisticated PC games that use Internet connectivity. Millennials played in real time in massively multiplayer online games such as EverQuest (1999) and World of Warcraft (2004). The latest generation (Generation Z, iGen, or digital natives) is forming its own construct, which will most likely reflect the characteristics of mobile, handheld, but networked gaming systems; members of this generation increasingly use augmented and virtual reality systems in games such as Pokémon Go (2016). The computational power of the Nintendo Switch provides a hybrid platform that is both a portable handheld and a networked game console.

As the Marine Corps and other military departments develop their next-generation war games, it is most likely they will move to an increased level of immersion, moving away from the board game-style format that dominated the baby boomer era.

That strong immersive experience is one of the hallmarks of educational war games, whose priorities are to train and educate the gamers. The military will continue to leverage emerging technologies such as AR, VR, and gaming networks to construct highly immersive gaming environments. Scenario, decision, adjudication, and visualization tools will accelerate the rapid construction of immersive education and training environments, all based on basic war-gaming design and analysis concepts and on lessons learned from over a hundred years of gaming in both the military and commercial sectors.

Readers interested in more details on war games can consult the 32-page war games bibliography prepared by Aegis Research Corporation (2002) (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.473.2503&rep=rep1&type=pdf), the RAND Corporation website on war-gaming (https://www.rand.org/topics/wargaming.html), and the University of Toronto bibliography (https://www.utm.utoronto.ca/asc/game-enhanced-learning-bibliography). Caffrey's (2019) book, On Wargaming: How Wargames Have Shaped History and How They May Shape the Future, provides many insights; also see https://usnwc.libguides.com/wargaming.

Online Gaming

When the World Wide Web was made available to the public in 1993, it could only be accessed over a dial-up modem on a phone line with a 56-kbps data transfer rate, so slow that a single song could take 10 minutes to download. Cisco's 2020 Annual Broadband report estimated that the average bandwidth in North America is greater than 70 Mbps (roughly 1,250 times faster: 70,000 kbps ÷ 56 kbps) and that bandwidth is expected to have doubled by the end of 2023. Online gaming has kept up with the advances in the computational power of personal computers and gaming consoles and with increased bandwidth. Developers can now offer games and experiences for an individual or for a group of people, who can be collocated or separated geographically. The online experience can consist of quests or challenges where you can win or lose, or of the opportunity to explore and carry out a virtual life without the pressures of a combat scenario. Some platforms enable gamers (without programming skills) to create characters, structures, machines, and even their own environment. One example, The Sims, available since 2000, has ten million active monthly users. Gamers can interact with other persons in the game world through avatars they create. These avatars can start as infants, teenagers, or adults with different personalities, lifestyles, and physical traits, and they can gain skills and "mature" from their experiences in the game world. The Sims user experience (UX) development team goes well beyond writing code and addresses systems issues such as branding, design, usability, and function. They emphasize pleasure, aesthetics, and fun while making the in-game experience as similar as possible to real-life experiences. Players can try different or idealized versions of themselves in a no-pressure environment, where there are no repercussions for their actions (Mueller, 2021).

Educators have experimented with using online virtual worlds as educational tools. One such tool, Second Life (SL), has been used to create popular virtual worlds for education and for the study of virtual worlds in education research.

For instance, Sweigart and Hodson-Carlton (2013) used SL avatars to improve nursing students' interview skills; Esteves et al. (2011) taught computer programming; and Nestler et al. (2013) taught the physical security aspects of cybersecurity. Texas A&M Veterinary Medicine had a class role-play a Veterinary Emergency Team's hurricane disaster response using professors, students, and avatars working in the SL virtual learning environment (https://www.youtube.com/watch?v=I5VXf5Ky5ms). In education more broadly, digital games are being used extensively from kindergarten through high school to teach "critical 21st-century skills" (http://glasslabgames.org/). In a review of their effectiveness, Young et al. (2012) reported some positive effects on language learning, history, and physical education (exergames) but no positive effects on science and math learning. However, Weiss et al. (2006) reported that kindergarten students taught with multimedia in cooperative learning or with multimedia in individual learning did better in math than students without either type of training. Using similar technology, hands-on virtual labs are becoming prevalent and provide unique learning opportunities. One example is the Computer Emergency Response Team (CERT) Simulation, Training, and Exercise Platform (STEPfwd) (https://stepfwd.cert.org/vte.lms.web). Tang et al. (2012) described using Sustain City for undergraduate-level training in science and engineering. Hiskins et al. (2011) discussed virtual simulation to train high school students to design electric cars. Basu et al. (2013) described an enactment, or E-World, a multi-agent simulation, to teach science to high school students. MIT's OpenCourseWare site (http://ocw.mit.edu/index.htm) is an excellent source of multiple educational simulations. Moreno-Ger et al. (2009) described several low-cost platforms for educational game development. Ulrich et al. (2017) developed a gamified nuclear power plant simulation. However, Picciano et al. (2012) expressed concerns with the quality of instruction, policies for funding such courses, and attendance requirements for online learning. Eshet-Alkalai et al. (2010) also discussed challenges in delivering different instructional technologies effectively. Educational researchers have noted that the explosive growth in online learning has not been matched by studies of its effectiveness. Carnahan (2012) compared a traditional classroom setting, with face-to-face contact between student and teacher, with learning in a virtual world where the student interacted with the teacher and other students via an avatar. The research showed no significant differences in academic achievement between the two environments for the seventh-grade STEM students studied, but student satisfaction was higher with the virtual classroom, which could encourage greater engagement by students in a game-like, entertaining environment. D'Angelo et al.'s (2013) meta-analysis of 59 articles showed that, overall, computer-based interactive simulations improved training more than instruction without simulations; however, the findings were restricted to science, math, and engineering (STEM) students. More studies are needed for different age groups, from kindergarten to graduate education, and for non-STEM students. Reductions in equipment cost and increased Internet connectivity are expected to accelerate the adoption of virtual reality/augmented reality systems for online gaming and education.

Aviation

US Army Signal Corps Specification No. 486 (1907) for its first "heavier-than-air flying machine" contained one very straightforward user-centric requirement: it should be sufficiently simple in its construction and operation to permit an intelligent man to become proficient in its use within a reasonable time. Apparently, the "intelligent man" needed help, or the flying machine was not simple enough; less than three years later, Haward (1910), as quoted in Rolfe and Staples (1986), described an early flight simulator as "a device which will enable the novice to obtain a clear conception of the workings of the control of an aeroplane, and of the conditions existent in the air, without any risk personally or otherwise" (p. 15). As early as 1910, the Wright Brothers used a "kiwi bird" flight simulator in their training school at Huffman Prairie (Bernstein, 2000). Students learned rudimentary flight control while seated in a defunct Wright Type B Flyer mounted atop a trestle; motion was induced by means of a motor-driven cam. With respect to cost, Bernstein reports Bernard Whelan as saying: It was the only thing that I know that cost more then than it does today. It cost sixty dollars an hour to take flight training at the Wright School. That's a dollar a minute. And they didn't sign you up for anything less than four hours. (p. 123)

A similar device is described by Adorian et al. (1979). The Antoinette trainer (Figure 1.2) required a student to maintain balanced flight while seated in a barrel (split lengthwise) equipped with short "wings." The barrel, with a universal joint at its base, was mounted on a platform slightly above shoulder height so that instructors could push or pull on the "wings" to simulate disturbance forces. The student's task was to counter the instructors' inputs and align a reference bar with the horizon by applying appropriate control inputs through a series of pulleys.
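In modern human factors terms, the Antoinette student was performing compensatory tracking: continuously nulling the error between the bar and the horizon in the face of external disturbances. A minimal sketch of such a loop in Python (the dynamics, gain, and disturbance profile are illustrative assumptions, not a model of the actual device):

    import math, random

    # Compensatory tracking: the student applies corrective input proportional
    # to the error between the bar attitude and the horizon (error = 0).
    dt = 0.1               # simulation time step, s
    k_student = 0.8        # illustrative student control gain
    attitude = 0.0         # bar angle relative to the horizon, deg

    for step in range(600):                     # one simulated minute
        t = step * dt
        disturbance = 5.0 * math.sin(0.5 * t) + random.gauss(0.0, 0.5)
        correction = -k_student * attitude      # counter the observed error
        attitude += (disturbance + correction) * dt

    print(f"attitude error after 60 s: {attitude:+.2f} deg")

With a sufficient gain the error stays bounded near zero; setting k_student to 0.0 shows the "hands-off" attitude wandering, which is the condition the instructors' disturbance inputs were designed to create.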

FIGURE 1.2  One of the earliest rudimentary trainers, an Antoinette Trainer (ca. 1910).

A more dynamic simulator was utilized by the French Foreign Legion, who realized that an airframe with minimal fabric on its wings would give trainees insight into the flight characteristics of the aircraft while limiting damage to both the real aircraft and the student (Caro, 1988). In 1917, Winslow, as reported in Rolfe and Staples (1986), described this device as a "penguin" capable of hopping at about 40 mi (65 km) per hour. Although of limited use, it was a considerable improvement over the earlier flight-training method of self-instruction, in which trainees practiced solo until they learned basic flight maneuvers; instructors would participate in the in-flight training only after the trainees had, through trial and error, learned the relationship between input and system response (Caro, 1988). Apparently, the legionnaires understood the value of a skilled flight instructor. The origins of modern flight simulators can be traced to 1929, when Edward A. Link received a patent for his generic three-degree-of-freedom (yaw with limited pitch and roll), ground-based flight simulator (Figure 1.3). His initial trainer, with stubby wings and rudders, was designed to demonstrate simple control surface movements and make them apparent to an instructor. Later, a hood was added over the cockpit which, with the appropriate instruments in the cockpit, allowed the device to be used for instrument flight training. Link based his design on the belief that the trainer should be as analogous to the operational setting as possible. Through the use of compressed air that actuated bellows, the trainer could pitch, yaw, and roll, enabling student pilots to gain insight into the relationship between stick inputs and movement in three flight dimensions.

FIGURE 1.3  "Blue Box" Link trainer and associated instrument table.

A patent for an aviation-training machine also using "an air-operated motor" was granted to Levitt L. Custer of Dayton, Ohio, in 1930. Link's invention was originally marketed as a coin-operated amusement device (Fischetti & Truxal, 1985); however, the value of Link's simulator was recognized when the US Navy and Army Air Corps began purchasing trainers in 1934. Flight instructors, watching from outside the "Blue Box," would monitor the movements of the ailerons, elevator, and rudder to assess the student's ability to make the control inputs necessary for various flight maneuvers. A plotting table (shown in Figure 1.3) was also developed that allowed instructors to observe the ground track and speed of the aircraft. When the United States entered World War II, there were over 1,600 trainers in use throughout the world, and the number increased as the Allied forces rushed to meet the demand for pilots. During the war years, approximately 10,000 Link trainers were used by the US military (Caro, 1988; Stark, 1994). In 1943, an operational flight trainer for the US Navy's PBM-3 aircraft was produced by Bell Telephone Laboratories (Pohman & Fletcher, 1999). This trainer used analog circuitry to solve flight equations in real time and presented the results using the actual controls and instruments available in the aircraft. During World War II, considerable assets were devoted to the development of electronic digital computers. In 1944, with US Navy funding, the Massachusetts Institute of Technology (MIT) undertook the development of an electronic flight simulator (Waldrop, 2001). Until then, development efforts had focused on batch-processing computers, which solved one equation and then waited for the next. The MIT developers realized that, unlike previous computers, the one named Whirlwind had to respond in real time to the constantly changing pilot inputs and the dynamic response of the simulated aircraft. Thus, they built the first real-time digital computer, which was the basis of the modern PC. Its performance in 1951 was equivalent to that of the 1980 TRS-80 PC; however, its vacuum-tube electronics required an area approximately the size of a small house, with unique electrical and cooling requirements. After the war, simulations developed for military use were adapted by commercial aviation. Loesch and Waddell (1979) reported that by 1949, the use of simulation had reduced airline transition flight training time by half. Readers interested in the intriguing history of flight simulation may consult the excellent three-volume history entitled 50 Years of Flight Simulation (Royal Aeronautical Society, 1979). Jones et al. (1985) provide an excellent overview of the state of the art in simulation and training through the early 1980s; Moroney and Moroney (2010) update that literature. Following World War II and throughout the 1950s, aircraft diversity and complexity created the need for aircraft-specific simulators, that is, simulators that represent a specific aircraft in instrument layout, performance characteristics, and flight handling qualities. Successful representation of instrument layout and performance characteristics was readily accomplished; however, accurate reproduction of flight-handling qualities was a more challenging task (Loesch & Waddell, 1979). Exact replication of flight is based on the unsupported belief that higher-fidelity simulation results in greater transfer of training from the simulator to the actual aircraft.
This belief has prevailed for many years and continues today.

However, almost 70 years ago, researchers were already questioning the need to duplicate every aspect of flight in the simulator (Miller, 1954; Stark, 1994). Within the context of education, Spannaus (1978) lists three characteristics of simulations: (1) they are based on a model of reality, (2) the objectives must be at the level of application, and (3) the participants must deal with the consequences of their decisions. He believes that for an activity to be called a simulation, students cannot be mere observers but must be involved; today's educators and trainers call this process "active learning and immersion." Ricci et al. (1996) provide experimental support for the advantages of active learning, specifically computer-based gaming, in knowledge acquisition and retention. Caro (1979, p. 84) emphasized that a flight-training simulator's purpose was "to permit required instructional activities to take place." However, from his 1979 examination of existing simulators, simulator design procedures, and the relevant literature, Caro concluded that "designers typically are given little information about the instructional activities intended to be used with the device they are to design and the functional purpose of those activities" (p. 84). Fortunately, some progress has been made in this area. Today, as part of the system development process, designers (knowledgeable about hardware and software), users and instructors (knowledgeable about the tasks to be learned), and trainers and psychologists (knowledgeable about skill acquisition and evaluation) interact as a team in the development of training systems (Stark, 1994). The objective of this development process is to maximize training effectiveness while minimizing the cost and time required to reach the training objective (Stark, 1994). The XR section of this chapter contains some good examples of this process. Salas et al. (1998) propose that, to fully exploit the tremendous progress in simulation technology, we should reduce the emphasis on fidelity and realism and focus on enhancing the learning of complex skills by bridging the gap between training research findings and the use of training technology. They challenge the following assumptions: (1) simulation is all you need, (2) more fidelity is better, and (3) if the aviators like it, it is good. They propose that the emphasis should shift from technology to learning, specifically to a trainee-centered approach with a more holistic consideration of the training process. Modern flight simulators are used for initial and advanced training as well as for proficiency maintenance and evaluation, and they have become an integral part of the research, development, test, and evaluation (RDT&E) cycle. With the development of the glass and digital cockpit, with its onboard computers and associated recording media (including the black box), simulators are now used both in reconstructing aircraft accidents and, in some cases, in identifying appropriate recovery procedures. During the 1990s, flight simulation was a worldwide industry with many competitors (Sparaco, 1994) and annual sales of $3 billion for commercial airlines and $2.15 billion for the DoD. Individual simulators ranged in price from $3,000 for a basic PC-based flight simulation with joystick controls up to an average of $10–13 million for a motion-based simulator (down from $15–17 million in the early 1990s). Today, the number of competitors has been reduced as the aerospace industry has consolidated.
This consolidation has been attributed to advances in technology and to the emphasis on leaner, more efficient, core-business-focused organizations (Wilson, 2000). The flight simulator market is expected to reach $8.03 billion by 2026, according to Reports and Data (https://www.globenewswire.com/news-release/2019/07/11/1881533/0/en/Flight-Simulator-Market-to-Reach-USD-8-03-Billion-By-2026-Reports-And-Data.html), with the highest growth rate expected in the area of unmanned air vehicles (UAVs). The reduction in hardware cost for display technologies and the reduced emphasis on simulating motion have shifted the traditional focus from hardware to software. Major suppliers in this industry are now focusing on how to incorporate the Internet and XR into training systems; for details on the use of XR, see the following section.

EXTENDED REALITY: AUGMENTED REALITY, VIRTUAL REALITY, AND MIXED REALITY

Introduction to XR

This section provides a brief overview of XR technologies and how they are reshaping the way we both do our jobs and train to do our jobs. Extended Reality (XR) is a blanket term encompassing VR, AR, and MR. XR technology has wide applicability across sectors, including, but not limited to, the medical, aeronautical, maritime, and automotive industries, and across disparate applications ranging from engineering design and manufacturing to training, operations, maintenance, and sustainment. Its suitability necessarily depends on the fidelity requirements of the use case (i.e., the specific application) and the capability of the XR technology to support those requirements. While there is not complete agreement on the definitions of VR, AR, and MR technologies, Milgram and Kishino (1994) developed a Reality-Virtuality (RV) Continuum that is helpful for conceptualizing their relations. A variation of their model, referred to here as the XR Reality-Virtuality Continuum, has been adapted to include XR and VR (Figure 1.4). While the term Real Environment (RE) is intuitively clear, other terms in the XR Continuum require some definition to ensure a common understanding.

FIGURE 1.4  Extended Reality (XR) Reality–Virtuality (RV) continuum. (Adapted from Milgram and Kishino, 1994, and Doolani et al., 2020.)

First, Virtual Environment (VE) can be defined as "the representation of a computer model or database which can be interactively experienced and manipulated by the virtual environment participant(s)" (Barfield & Furness, 1995, p. 4). Mazuryk and Gervautz (1996, p. 5) explain that in a VE "system a computer generates sensory impressions that are delivered to the human senses." So, a virtual environment requires (1) a computer model, (2) a representation of that model that stimulates the user's senses (e.g., visual, auditory, haptic), (3) a user or users, and (4) a way for the user(s) to interact with the computer model. A perfect VE would result in complete immersion of the user within the VE, as in the book Ready Player One and the movie of the same name. This is not technically feasible, at least not today; thus, there is an XR Reality–Virtuality Continuum between the real and virtual worlds. The terms VR and VE are often used interchangeably; however, as indicated above, current technology does not allow perfect immersion in a VE, so it is useful to distinguish between the two. VR has been defined in various ways, as described in the Kaplan et al. (2020) meta-analysis. Boud et al. (1999, p. 32) define VR as a "three-dimensional computer-generated environment, updating in real time, and allowing human interaction through various input/output devices." This definition aligns very closely with that of a VE. Heim (1998, p. 221) defines VR as a "technology that convinces the participant that he or she is actually in another place by substituting the primary sensory input with data produced by a computer." The key distinction between the two definitions is Heim's (1998) emphasis on the technology that allows immersion in the VE, and this is a useful distinction. A final point to bear in mind when considering VR technology is that the focus is on the VE; intrusion of the RE is a distraction to be overcome. Stated succinctly, VR technology seeks to immerse the user(s) in a VE and detach the user(s) from the RE. AR, Augmented Virtuality (AV), and MR technologies instead seek to blend the RE and the VE to achieve specific purposes. Milgram et al. (1995, p. 283) defined AR as "augmenting natural feedback to the operator with simulated cues." Drascic and Milgram (1996, p. 123) expanded on this definition, stating that "AR describes that class of displays that consists primarily of a real environment, with graphic enhancements or augmentations." So, AR technologies enable the user to interact with the RE while overlaying or otherwise adding information from a VE to enhance the user's experience of the RE. AV can be conceptualized as the inverse of AR: the user's experience is primarily that of the VE (i.e., computer-generated), which is augmented in some way by the real world (Drascic & Milgram, 1996; Milgram & Kishino, 1994). One example is including a representation of the user's hands in the image to aid interaction with the VE. As a second example, a challenge in the adoption of VR technology has been users tripping over real-world objects while in the VE; recent head-mounted displays (HMDs) are working to overcome this by enabling users to map the RE into the VE (e.g., creating borders within which the user can move without running into physical objects). In both instances the primary emphasis is on the VE, which has been enhanced with information from the RE (e.g., hand position/orientation, definition of the physical space and barriers). Milgram et al. (1995, p. 283) define an MR environment as "one in which real-world and virtual-world objects are presented together within a single display."
There is clearly significant overlap between this definition and those of the AR and AV environments in the preceding paragraphs. This is intentional, as Milgram et al. (1995) conceptualized MR as a continuum in which RE and VE objects are displayed in a common view with greater or lesser emphasis on the RE or VE. Recently, some have begun to use MR as distinct from AR and AV, while others continue to use it in its earlier sense, creating some confusion in terminology. For the present discussion, it is argued that while MR might include applications ranging from AR to AV, its value lies in describing those instances that the terms AR and AV do not adequately capture. That is, MR can be conceptualized as a more balanced mix of the RE and the VE, in which neither predominates and purposeful interaction between the two is allowed. This view is consistent with the Reality-Virtuality Continuum proposed by Flavian et al. (2019), who distinguished MR from AR and AV, referring to it as Pure MR and describing it as the complete merging of the VE and the RE. It is important to note that there is still significant overlap in the use of these terms, and a review of the specific application is key to ensuring a sufficient understanding of the technical approach being utilized. Using this framework, the next section briefly illustrates the history of VR in training. We then review recent real-world applications of XR technologies in training and operational support, explore potential future use cases, and finally highlight technical challenges inhibiting widespread adoption.
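To summarize the terminology above in compact form, the sketch below orders the main points of the XR Reality–Virtuality Continuum as an illustrative data structure (the ordering follows Figure 1.4; the names are this chapter's terms, not a standard API):

    from enum import IntEnum

    class RVContinuum(IntEnum):
        """Points on the XR Reality-Virtuality Continuum, ordered from the
        purely real environment to the purely virtual one."""
        RE = 0   # Real Environment: no computer-generated content
        AR = 1   # Augmented Reality: primarily RE, enhanced by VE overlays
        MR = 2   # "Pure" Mixed Reality: balanced merge of RE and VE
        AV = 3   # Augmented Virtuality: primarily VE, augmented by the RE
        VE = 4   # Virtual Environment: fully computer-generated

    # The ordering supports simple comparisons, e.g., "more virtual than AR":
    print([m.name for m in RVContinuum if m > RVContinuum.AR])  # ['MR', 'AV', 'VE']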

Historical VR Devices

Space limitations preclude a detailed discussion of the evolution of XR devices; however, many of the advances in XR technology today were built on the innovations of decades of VR research. Before turning to recent advances, we compare three VR systems that illustrate the state of the art in the 1960s, the early 2000s, and the 2020s. In the mid-1960s, Tom Furness of the Armstrong research laboratory at Wright-Patterson Air Force Base (Voices of VR Podcast, 2015) began the development of a large field-of-view HMD capable of providing partially overlapped fields to each eye. Known as the Visually Coupled Airborne Systems Simulator (VCASS), it was the first virtual reality panoramic display ever constructed and tested. The innovative Farrand Optical Pancake Window™ optics provided an adjustable horizontal field of view (FOV) of 100°–160° with a binocular overlap of 20°–60° and a constant vertical FOV of 60°. The display's instantaneous presentation was controlled by a state-of-the-art AC magnetic helmet-mounted tracker that could determine azimuth, elevation, and roll orientation, as well as x, y, and z helmet position, to a then unprecedented 14-bit resolution. The VCASS system was among the earliest virtual reality systems and could produce simulated air-to-air and air-to-ground mission scenarios, simulated cockpits, enhanced stereoscopic imagery effects, multiple same-scene views from different simulated altitudes/distances, and other special visual effects. Because of its physical appearance, VCASS (Figure 1.5) was also known as the "Bug That Ate Dayton."
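To put the tracker's 14-bit resolution in perspective, a quick calculation (a sketch; it assumes the 14 bits span a full 360° of rotation, which the source does not state):

    # Angular resolution of a 14-bit orientation readout over a 360-degree range.
    bits = 14
    steps = 2 ** bits            # 16,384 distinguishable orientations
    print(360 / steps)           # ~0.022 degrees (~1.3 arcminutes) per step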

FIGURE 1.5  Frontal view of VCASS. Photo by Dean Kocian, approved for public release. (AFRL-2021-1547, 05-19-2021.)

In 1986, Furness, who is sometimes called the "grandfather of VR," described its role in the Air Force's Super Cockpit program and provided a list of human factors challenges inherent to such a device. Despite major technological progress, many of the challenges described by Furness still apply today. In the late 1990s, researchers at the Naval Air Warfare Center Training Systems Division (NAWCTSD) developed a VR technology demonstration system for training ship handlers, called the Virtual Environment for Submarine Ship Handling Training (VESUB). This system was designed for integration with existing submarine piloting and navigation training simulators and for evaluating the effectiveness of VR as a training tool. Hayes et al. (1998) completed a training effectiveness evaluation of the tool and found that it improved training in areas including checking range markers, issuing correct turning commands, contact management skills, reaction time for man-overboard drills, and the use of correct commands during emergency events. By 2001, the trainer (Figure 1.6) was deployed at the Naval Submarine School and was later expanded to all six Navy submarine training sites. It featured COTS hardware: the nVisor SX from NVIS, Inc., which provided a 60° diagonal field of view and was paired with an InterSense IS-900 head tracker.

FIGURE 1.6  Top: Early VESUB system with headset. Bottom: VESUB system view from the virtual bridge. (From Wendland & Holland, 2006; Hayes, Vincenzi, Seamon, & Bradley, 1998. NAWCTSD PAO 1998071010, approved for public release.)

FIGURE 1.7  U.S. Air Force Tech. Sgt. Orson Lyttle participates in a virtual reality assignment with the Inter-European Air Forces Academy onboard Naval Air Station (NAS) Sigonella. (210506-N-GK686-1025 Naval Air Station (NAS) Sigonella, Italy. May 6, 2021.)

Images were managed by a system of four 500-MHz CPUs, which rendered up to 21,000 polygons at a refresh rate of 30 Hz, far below the 90 Hz generally accepted as of this writing as a minimum for mitigating simulator sickness (VR Headset Authority, 2021), though no problems were reported by Hayes et al. (1998). VESUB was also equipped with a voice recognition system. Today's commercial VR devices have become far more capable. VR can be used effectively on stand-alone systems or with external computers that power popular VR headsets such as the Oculus Rift, HTC Vive, and Lenovo Mirage Solo. Current VR headset systems (Figure 1.7) are often paired with hand controllers or sensor-equipped gloves to improve how the user interacts with the virtual environment. Commercial headsets can also support wireless connections, dramatically increasing the range of possible use cases for such devices.
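The jump from VESUB's 30 Hz to today's 90 Hz floor is easiest to appreciate as a per-frame time budget (a quick illustrative calculation):

    # Per-frame rendering budget at different refresh rates.
    for hz in (30, 90):
        print(f"{hz} Hz -> {1000 / hz:.1f} ms per frame")
    # 30 Hz -> 33.3 ms per frame (VESUB, ca. 2001)
    # 90 Hz -> 11.1 ms per frame (the oft-cited minimum against sim-sickness)

    # Polygon throughput of the VESUB renderer:
    print(21_000 * 30)   # 630,000 polygons per second

A modern headset must therefore finish each stereo frame in a third of the time VESUB had, while drawing orders of magnitude more geometry.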

XR Applications in Training

Recent advances in XR technology have increased its viability in a range of potential applications. This section describes how the technology is already revolutionizing training.

VR in Medical Training

Medical-domain training applications relying on XR technology vary widely. Tang et al. (2021) reported a VR blood type-and-screen training application, designed to teach blood-typing techniques, that yielded evidence of enhanced readiness for trainees in comparison to a video-based training condition. Selvander and Asman (2012) compared the effectiveness of a VR-based training module with a video-based training module for training ophthalmology students on the
capsulorhexis procedure, one of the most difficult steps in cataract surgery, involving the maneuvering of instruments in and around the lens of the eye. They found that VR did not improve on video-based training in the students' maneuvering of surgical instruments, but that even after more than ten iterations in their experiment, students learning without the aid of VR still lagged behind their VR-trained peers in avoiding injury to the lens area of simulated patients. Anderson et al. (2018) evaluated the impact of the structure and distribution of VR training as a component of otorhinolaryngology training. These researchers compared the performance of a medical student group that was offered VR rehearsal as distributed practice prior to the course with a group that received the VR training during the course only, and found that the additional distributed practice improved reaction times significantly and yielded higher overall scores on cadaveric dissection assessments. These results highlight the potential of VR training solutions to improve training outcomes.

AR in Medical Training

Nausheen and Bhupathy (2020) incorporated the Microsoft HoloLens AR device into Point of Care Ultrasound (POCUS) training and found that it helped students track organ locations between the depictions on POCUS devices and the patient's anatomy. Training applications are not limited to surgical or procedural training; both Bork et al. (2020) and Kiourexidou et al. (2015) reported the use of AR to teach different aspects of human anatomy. Bork and colleagues compared an AR system with text-based training accompanied by three-dimensional fixed models and found similar knowledge gains from both groups at course conclusion; they also found that collaborative AR systems yielded training advantages for the mastery of topographic anatomy over single-user experiences, suggesting that the system's ability to let students share visual perspectives with one another during instruction was likely to yield educational benefits relative to single-user systems.

AR in DoD Tactical Combat Casualty Care

The military is expanding the use of AR in medical training. For example, AR is beginning to play a key role in training for Tactical Combat Casualty Care (TCCC), which refers to care provided to warfighters at the point of injury by the personnel closest to them. These personnel have far more limited equipment and treatment options than are available at a traditional hospital. TCCC basics include the application of tourniquets to stop bleeding, nasopharyngeal airway management, and the treatment of tension pneumothorax (a condition in which air trapped in the chest cavity collapses the lung) by needle decompression (Office of the Under Secretary of Defense for Personnel and Readiness, 2018). Two examples of recent Army-developed AR applications are provided in Figures 1.8 and 1.9. The Army has developed TCCC training for battlefield responders that uses AR to simulate wounds, together with a progressive physiology model representing changes in patient outcomes based on the treatment provided.

FIGURE 1.8  The Combat Casualty Care Augmented Reality Intelligent Training Systems (C3ARESYS) tool uses AR to improve learning outcomes in TCCC training. (Soar Technology, Inc. Used with permission.)

FIGURE 1.9  A visual overlay of a virtual watch/timer used in AUGMED training (Design Interactive, Inc.). https://www.dvidshub.net/image/6368782. The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement.

The use of AR in these devices makes it possible to model multiple types of wounds without modifying a mannequin; it allows the trainee to see the results of their treatment, such as reduced bleeding after the application of a tourniquet; and, even more importantly, it allows trainees to see how the simulated patient's condition changes based on those treatments and their timing. Specific wounds can be changed between trainees where necessary without replacing a specific mannequin. The use of AR in such a setting affords trainees the opportunity to interact with physical first aid kit tools or replicas, yielding critical experience in placing the devices with their own hands and applying appropriate pressure.
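A minimal sketch of the kind of progressive physiology model described above (the state variables, rates, and tourniquet effect are illustrative assumptions, not the models used in C3ARESYS or AUGMED):

    class SimulatedCasualty:
        """Toy progressive physiology: untreated hemorrhage degrades the
        patient's state over time; a tourniquet halts further loss."""

        def __init__(self):
            self.blood_volume_ml = 5000.0          # nominal adult blood volume
            self.bleed_rate_ml_per_min = 150.0     # illustrative hemorrhage rate
            self.tourniquet_applied = False

        def apply_tourniquet(self):
            self.tourniquet_applied = True

        def advance(self, minutes):
            # Outcome depends on the treatment *and* its timing, as in the
            # trainers described above.
            rate = 0.0 if self.tourniquet_applied else self.bleed_rate_ml_per_min
            self.blood_volume_ml -= rate * minutes

    casualty = SimulatedCasualty()
    casualty.advance(5)              # five untreated minutes: 750 ml lost
    casualty.apply_tourniquet()
    casualty.advance(10)             # no further loss once treated
    print(casualty.blood_volume_ml)  # 4250.0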

The Combat Casualty Care Augmented Reality Intelligent Training Systems (C3ARESYS) tool (see Figure 1.8) is also capable of recognizing trainee interaction with physical devices like tourniquets, as well as trainee-stated intentions recognized through natural language processing (A16-076, n.d.; Tanaka, Craighead, & Taylor, 2019; Tanaka, Craighead, Taylor, & Sottilare, 2019; Taylor et al., 2018). The US Army also funded the development of AUGMED (see Figure 1.9), an AR-based TCCC trainer that accommodates both HoloLens and tablet-based interfaces and incorporates explicit survival timelines as student prompts. This tool has been evaluated as part of the Combat Life Saver (CLS) course at Fort Indiantown Gap (FTIG) (Bolling, 2020). A third tool designed to enhance TCCC training by incorporating virtual overlays on a physical training manikin is the Virtual Patient Immersion Trainer (VPIT) described by Sushereba and Militello (2020). This device incorporates physiological injury progression and accompanying indicators, such as displacement of the trachea to one side and jugular vein distension due to trapped air in tension pneumothorax, allowing learners to track how these injury indications appear over time. The VPIT tool also uses eye tracking to train gaze patterns and to support performance evaluation, verifying whether learners appear to have noticed a series of injury- and treatment-relevant cues before intervening. Empirical evidence of this trainer's effectiveness is not yet available. As discussed above, tools like these are receiving increased attention, and the integration of AR in DoD medical training courses seems likely to increase. These applications are important as examples of AR being evaluated in a live training environment, though additional evidence of their training effectiveness is still needed (Kaplan et al., 2020).

VR Flight Training Devices

The DoD has used simulator-based training to support aviation training in one form or another since military flight began. Research has demonstrated that simulation-based training reduces the amount of flight time required to reach required performance levels (Kaplan et al., 2020). Today the DoD continues to demonstrate interest in the use of VR applications for aviation training. These devices offer portability, lower cost, and easier maintenance, and they afford an effective 360° field of view for the trainee, a feature of particular interest in jet training. A recent meta-analysis indicates that XR training has not yet demonstrated improved student performance following the use of XR training devices; however, it has likewise not demonstrated a performance decrement (Kaplan et al., 2020). Given the other potential benefits of XR training devices, there is likely to be continued interest in efforts to develop and use XR devices in aviation training applications. The US Navy recently completed an initial training evaluation of VR flight training devices (FTDs) for Primary Fixed-Wing training and Intermediate Strike training (McCoy-Fisher et al., 2019). The evaluation included the study of a T-6B VR FTD and two T-45C VR FTD variants (Figures 1.10 and 1.11). While specific components varied, in general each system was composed of a VR HMD, a computer, a throttle and control stick (referred to as Hands on Throttle and Stick, or HOTAS), rudder pedals, a monitor, and a base with an integrated seat. The devices enabled student pilots to fly a range of maneuvers and missions leveraging VR technology in a VE. The results indicated that the devices

FIGURE 1.10  T-45C VR FTD at NAS Kingsville. (Courtesy of Defense Technical Information Center. U.S. Government Work, 17 USC §105.)

FIGURE 1.11  T-6B VR FTD at NAS Corpus Christi. (Courtesy of Defense Technical Information Center. U.S. Government Work, 17 USC §105.)

were best suited to support specific stages of training, such as Contacts training (i.e., the takeoff, aerobatics, and landing phases). The results also indicated that comfort with technology, prior use of VR devices, and general trust in automation were related to overall ratings of the perceived utility of the technology. While VR sickness was an issue for some participants, their symptoms resolved within 30 minutes of completing the training evolution. Finally, the results indicated the importance of an intuitive user interface, the ability to interact with the virtual cockpit reliably in a natural
manner, and VR HMD visual fidelity that approaches natural vision (e.g., field of view, visual acuity).

MR Flight Training Device

Traditionally, FTDs rely on a large dome for their visual system. Figure 1.12 shows the US Army Aeromedical Research Lab (USAARL) Black Hawk helicopter full-motion, full-visual simulator. Such devices provide a high-fidelity physical cockpit and a visual system that approximates the real environment; however, this drives a large device footprint and high costs. Recent advances in MR technology have the potential to change this paradigm. As discussed above, MR systems mix the RE with a VE on a common display. One notable limitation of VR systems is the inability to interact reliably with the real world, or in this case a physical cockpit. MR systems use one of two approaches to merge the RE and VE in the user's sight picture (Rokhsaritalemi et al., 2020). Video passthrough integrates video cameras with a traditional VR HMD and blends the RE and VE on the user's screen, while optical passthrough (also called see-through) uses a transparent lens through which the user views the RE and onto which the VE is projected. Both have potential strengths and weaknesses, and as in most cases, the appropriate solution depends on the problem to be solved. In conjunction with the VR FTD evaluation detailed above, the US Navy also conducted an initial evaluation of an MR FTD (McCoy-Fisher et al., 2019). In this case the MR device was used with an existing T-45C Operational Flight Trainer (OFT) (Figure 1.13).
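A minimal sketch of the video-passthrough compositing idea just described, in Python with NumPy (illustrative only: the resolution, the mask, and the single-frame treatment are assumptions; real systems work with calibrated stereo camera pairs, lens-distortion correction, and per-eye rendering):

    import numpy as np

    H, W = 720, 1280                               # per-eye resolution (illustrative)
    camera_rgb = np.zeros((H, W, 3), np.uint8)     # stand-in for a cockpit camera frame
    virtual_rgb = np.zeros((H, W, 3), np.uint8)    # stand-in for the rendered scene

    # Mask from the VE renderer: True wherever virtual imagery (the "out-the-
    # window" regions) should replace the real cockpit video.
    window_mask = np.zeros((H, W), dtype=bool)
    window_mask[: H // 2, :] = True                # e.g., upper half is out-the-window

    # Composite: virtual scene through the windows, real cockpit everywhere else.
    composite = np.where(window_mask[..., None], virtual_rgb, camera_rgb)

In a real MR FTD the mask would be derived from the cockpit geometry and the head pose every frame, which is why the VE/RE alignment calibration noted later in this chapter matters so much.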

FIGURE 1.12  USAARL Black Hawk helicopter full-motion, full-visual flight simulator (the large footprint is evident when contrasted with the office chair in the lower-left corner). (https://www.dvidshub.net/image/6407391/usaarls-nuh-60fs-black-hawk-simulator-upgrade.) The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement.

FIGURE 1.13  T-45C MR FTD at NAS Kingsville: an MR HMD integrated with an existing T-45 Operational Flight Trainer that relies on traditional dome-based technology. (Courtesy of Defense Technical Information Center. U.S. Government Work, 17 USC §105.)

MR solutions that allow video passthrough rely on a pair of video cameras integrated into the external face of the HMD that provide stereoscopic video data, which can be viewed and mixed with the VE on the interior display of the HMD. This allows the user to see the interior of the cockpit and interact naturally with it while a virtual representation of the out-the-window view is displayed. The results were mixed, but promising enough to encourage future development to address the noted deficiencies.

AR Trainer for H-60R Preflight Procedures

Prior to flight, aircrew preflight the aircraft to ensure there are no discrepancies that could interfere with safe flight operations. Traditionally, training aircrew to conduct a preflight requires a qualified pilot and a significant amount of time to fully orient trainees on how to recognize different aircraft components and verify that they are safe for flight. The Navy MH-60R helicopter community recently completed a study to develop and evaluate the effectiveness of an AR preflight training solution leveraging an iPad (NAWCTSD PAO, 2018). Figure 1.14 shows an MH-60R helicopter, and Figure 1.15 shows the AR training device in use. This device allowed student aviators to practice the preflight on their own, with or without the aircraft, prior to the formal training evolution. Using the iPad's camera to view different aircraft components, the system augmented the RE by overlaying labels, descriptions, and checklist items for those components. The technology was viewed as very promising, but the initial design had the following specific shortcomings:

• The initial prototype was too cumbersome to be used comfortably by all trainees for preflight tasks performed on the ground.
• Important portions of the preflight checklist are performed by climbing on top of the aircraft, necessitating a hands-free solution for future implementation.

FIGURE 1.14  An MH-60R helicopter on the flight deck of the guided-missile cruiser USS Mobile Bay (CG 53). https://www.dvidshub.net/image/4371029/mh-60r. The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement.

FIGURE 1.15  H-60R Augmented Reality Preflight Checklist Trainer: A tablet-based AR preflight checklist trainer for aircrew and student pilots. NAWCTSD PAO, 2018. The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement.

XR Applications in Operational Support

Applications of XR for post-training operational support continue to expand across a variety of venues (Bond et al., 2018). Notable examples include the incorporation of XR into future Army tactical operations, medical procedures, and equipment maintenance operations.

AR in Army Tactical Operations

The Army is in the process of developing and evaluating a new tool for building and maintaining tactical situational awareness: the Integrated Visual Augmentation System (IVAS), based on HoloLens 2 hardware. IVAS is designed for use in the operational environment as well as for training. It is expected to incorporate low-light and thermal vision, as well as sensors distributed across squad or platoon members, to generate an integrated display, increasing engagement, awareness, and unit effectiveness and improving decision-making (Lang, 2021). While it must be acknowledged that this is purely a system in development, and not a fielded capability with documented measures of effectiveness, the Army is investing heavily in this technology.

VR in Operational Medicine

The use of VR to treat post-traumatic stress disorder (PTSD) has an extensive history (Rizzo & Shilling, 2017). The visual immersion yielded by a VR environment is a critical aspect of such treatment and helps PTSD and phobia sufferers gradually mitigate their fear and physiological responses to the environments that trigger these symptoms (Rizzo et al., 2019).

AR in Operational Medicine

AR tools have been used in operational medical procedures to directly improve patient outcomes. What these examples all have in common is the use of AR to help the practitioner better understand the spatial relationship between a surgical display and the patient's body, or to provide a visual aid that helps the practitioner develop and maintain a more accurate understanding of the locations of anatomical structures under the skin. Gregory et al. (2018) described the use of AR in operational surgical practice using a Microsoft HoloLens during a scapula replacement to help the provider visualize the location of specific bones and anatomical structures not visible to the naked eye during the procedure. Others have used AR overlays as operational aids for guiding needle insertions (Heinrich et al., 2020). Liu et al. (2021) reported that the incorporation of an AR helmet-mounted display in a spinal surgery procedure limited radiation exposure and procedure time while yielding patient outcomes similar to those of conventional surgery. In addition, Dias et al. (2021) and Qian et al. (2019) described the development, use, and effectiveness assessment of an AR trainer for endotracheal intubation for providers learning to perform laryngoscopy. This device projected a depiction of the patient's airway onto the visual field of an HMD, giving the provider improved visualization of the relevant anatomy without eliminating a direct line of sight to the point of insertion, and was found to provide improved training efficacy relative to unassisted rehearsal.

AR for Aircraft Maintainers

The development of job aids for maintainers has been of significant interest to the military for decades. For example, Kancler et al. (1998) investigated the effect of using HMDs to display maintenance procedures on maintainer performance and found that task
completion was quicker and the error rate lower when using HMDs with augmented information. To illustrate the military's continued interest in this area, consider that Class C mishaps increased from 7.5 per 100,000 flight hours in 2008 to about 22 per 100,000 flight hours in 2017, and reductions in maintainer experience were identified as a contributor to this increase (Eckstein, June 13, 2018). Class C mishaps are those that result in $50,000–$500,000 in damage and/or a non-fatal injury. The Navy is actively engaged in steps to address the experience level of maintainers (Eckstein, June 22, 2018). Developing AR-enabled maintainer solutions to support refresher and deployed training needs may be part of the solution (N193-D01, n.d.). The Navy is also interested in leveraging AR technology to support all maintenance levels, including complex or unusual maintenance activities (N201-024, n.d.). Appropriately, this effort specifies the need to ruggedize AR HMDs to operate in diverse military environments ranging from direct sunlight to moonless night.

Future Directions

A review of available resources provides insight into how industry and the US military are looking to leverage XR technologies in the future. For example, the US Army recently awarded Microsoft a contract to develop AR headsets that enable soldiers to map the battlespace and integrate intelligence to maintain situational awareness (Gregg & Greene, 2021). This section provides an overview of relevant innovations on the horizon for XR technologies.

XR HMD Enhancements

The DoD is encouraging the private sector to innovate XR HMD enhancements beyond commercially available capabilities and to address the fidelity requirements of military training applications (N192-087, n.d.). For example, the Headset Equivalent of Advanced Display Systems (HEADS) SBIR project specifies capabilities that Naval Aviation Training desires: (1) allowing at least two pilots to interact with one another during any mission in any simulator, (2) full motion tracking of the HMD, (3) real-time imagery and/or accurate virtual representation of the cockpit, the pilot's hands, and other pilots, (4) enhancements to instantaneous horizontal field of view (FOV), vertical FOV, binocular overlap, frame rate, refresh rate, and static spatial resolution, and (5) compatibility with existing Navy simulators. Such enhancements have the potential to address the limitations of existing XR HMDs described above.

AR for Red Air

Military strike pilot training requires that pilots be able to train against adversary aircraft, typically referred to as Adversary Air or Red Air. Adversary aircraft are difficult to procure and expensive to maintain and operate. The USAF is investigating the use of AR to help solve this problem (Underwood, 2020). Specifically, the Air Force recently funded an effort to integrate AR technology into pilot helmets that will enable aircrew to reliably see the RE when virtual aircraft are injected onto
their display. While there are several technical challenges, such as the need to safely integrate the technology with various aircraft systems, this technology offers the potential to significantly reduce the requirement for adversary aircraft and to support a range of training needs, including airborne refueling and aerial combat.

VR for Spatial Disorientation Training

Spatial disorientation (SD) has long been recognized as a significant risk to aviation. Gibb et al. (2011) argue that nearly one-third of all military aviation mishaps are due to spatial disorientation. Bellenkes et al. (1992) found that SD was a causal factor in 33 Class A Naval Aviation mishaps between 1980 and 1989. Class A mishaps are those that result in $1 million or more in damage and/or an injury that results in a fatality or permanent total disability. Poisson and Miller (2014) found that over a 21-year period, SD was a causal factor in 72 US Air Force mishaps, resulting in 101 lost lives and an equipment cost of $2.32 billion. Given the cost in both lives and materiel, it makes sense that there is significant interest in identifying methods to mitigate the risk of SD. Historically, training aircrew to recognize the symptoms of SD has relied on slide presentations, computer-based training, and/or the use of simulators; all have limitations. Recent advancements in VR technology provide a potential avenue for more immersive training on the visual illusions that can result in SD, allowing student pilots to practice recognizing and responding to these illusions in a timely fashion. The US Navy is conducting an effort to explore VR solutions to address the need for more immersive SD training capabilities (N172-117, n.d.).

Pilot Training Next (PTN) and Naval Aviation Training Next (NATN)

In part motivated by recent pilot shortfalls, the USAF and USN are embracing the idea that new technology can revolutionize the training of military pilots. The USAF PTN program emphasizes the adoption of new VR FTDs, tailorable student-focused training, and VR-enabled self-study technologies (Tadjdeh, 2020). A recent cost-benefit analysis (CBA) of the USAF PTN program estimated that over ten years the program could save a total of $12.92 billion and yield a 77% increase in the number of students trained (Pope, 2019). While this is quite promising, there are questions regarding the impact on pilot performance and quality. Similarly, the USN NATN program seeks to use new technologies in its aviation training programs (Shelbourne, 2020). These technologies range from VR FTDs (adapted from the USAF PTN program) to VR-enabled 360° videos that support self-study to MR solutions integrated with existing Instrument Flight Trainers (IFTs) to provide a capability like the MR FTD described above. To illustrate how far training technology has evolved, consider the SNJ-5 Cockpit Checkout Recordings Device 12-AR-9a (Figure 1.16), developed by the Office of Naval Research in 1950. The SNJ-5 was a training aircraft used by many services, including the US Air Force and the US Navy. Student aviators could listen to these recordings for an explanation of the location of all SNJ-5 instruments and controls and to learn a variety of aircraft checklists, such as start-up, takeoff, and landing. Today's students have access to VR-enabled 360° videos.

FIGURE 1.16  SNJ-5 Cockpit checkout recordings student training device developed by the Office of Naval Research (1950).

The Navy is investigating additional technologies to support these innovations. For example, a recent effort is developing the Virtual Instructor Pilot Referee (VIPER), which uses artificial intelligence (AI) to provide adaptive, student-specific feedback on flight maneuvers that can be practiced without instructor involvement (Natali et al., 2020; SB052-009, n.d.). This technology can be integrated with the VR FTDs discussed above (McCoy-Fisher et al., 2019). Another tool is being used by the US Air Force to develop VR-enabled immersive training content, which can support training in a range of areas, including how to respond to emergency situations (AF191-004, n.d.). A key to successful integration will be the use of open architectures. While XR applications in military aviation and medicine were the primary focus of this discussion, numerous other markets are likely to benefit from recent advances in XR technology. The retail industry can offer online shoppers the opportunity to "see" their purchases in their homes prior to purchase. The automotive repair industry can benefit from advances like those described for aircraft maintainers. The tourism industry can offer potential customers immersive VR experiences that help them decide what they want to see before their trip. Educators can provide students shared VR experiences to increase engagement or give students access to additional information using AR technology. As an additional illustration of the applications of MR, consider the use of AR in decision-making discussed later in this chapter.
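Returning to VIPER's adaptive feedback concept: at the core of any such system is automated scoring of a flown maneuver against instructor-defined tolerances. A minimal sketch (the parameters, tolerances, and feedback wording are illustrative assumptions, not VIPER's actual criteria):

    # Score recorded flight parameters for a level turn against tolerances.
    TOLERANCES = {"altitude_ft": 100.0, "airspeed_kts": 10.0, "bank_deg": 5.0}

    def score_maneuver(samples, targets):
        """samples: dicts of flight parameters logged during the maneuver;
        targets: the commanded values. Returns per-parameter feedback."""
        feedback = {}
        for param, tol in TOLERANCES.items():
            worst = max(abs(s[param] - targets[param]) for s in samples)
            feedback[param] = ("within standards" if worst <= tol
                               else f"out of tolerance by {worst - tol:.0f}")
        return feedback

    samples = [{"altitude_ft": 5050, "airspeed_kts": 250, "bank_deg": 32},
               {"altitude_ft": 5180, "airspeed_kts": 246, "bank_deg": 28}]
    targets = {"altitude_ft": 5000, "airspeed_kts": 250, "bank_deg": 30}
    print(score_maneuver(samples, targets))
    # {'altitude_ft': 'out of tolerance by 80', 'airspeed_kts': 'within standards',
    #  'bank_deg': 'within standards'}

Student-specific adaptation would then adjust which feedback is surfaced and which maneuvers are re-drilled.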

Challenges to the Adoption of XR Technology

There are numerous barriers to the adoption of XR technology for training and other applications. To illustrate this, examples in the following areas will be briefly discussed: XR device fidelity, user fatigue, integration with existing
and new operational and training technologies, user resistance, safety, and regulations.

XR Device Visual Fidelity

The appropriateness of XR solutions for a given use case depends on the capability of the XR device(s) to meet the visual fidelity requirements of the proposed application. While current XR devices do a fairly good job of meeting the needs of the gaming community, current-generation devices still have a way to go to meet the needs of commercial, medical, and military users. For example, current-generation XR HMDs cannot replicate the human horizontal or vertical field of view, which limits applications that rely heavily on peripheral vision and can induce unrealistic head movements in the training environment, potentially leading to negative training. Further, continued improvements in visual fidelity are necessary, as numerous applications require the ability to replicate 20/20 vision; current HMDs can lead to users "leaning in" to read displays. The integration of video (RE) and VE imagery to create an MR environment presents its own challenges: the visual fidelity of the RE and VE needs to be consistent, and the transition between the two should be seamless and responsive to head movement, such that the demarcation between the RE and VE is imperceptible.
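To make the 20/20 requirement concrete: 20/20 acuity corresponds to resolving detail of about one arcminute, so a display needs roughly 60 pixels per degree to match it. A quick calculation (a sketch; the 100° field of view is an illustrative value, not a specification):

    # Pixels per degree needed to match 20/20 visual acuity.
    ARCMIN_PER_DEGREE = 60        # 20/20 acuity resolves ~1 arcminute of detail
    fov_deg = 100                 # illustrative horizontal field of view

    print(ARCMIN_PER_DEGREE * fov_deg)   # 6000 horizontal pixels per eye needed

    # For comparison, a 2,000-pixel-wide display spread over the same FOV:
    print(2000 / fov_deg)                # 20 pixels per degree -- why users "lean in"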

User Fatigue

Using XR devices for an extended length of time can lead to user fatigue; the impacts of long-term use are still being investigated. A factor contributing to user fatigue and slowing the wider adoption of XR for sustained training operations is the vergence–accommodation conflict. In VR, this conflict results from the HMD wearer focusing for long periods on a focal plane extremely close to the eye while tracking objects that appear to be at much greater distances in the visual display. XR HMDs, including those that use AR instead, can also require the wearer's eyes to adapt to highly conflicting visual cues. The result is greater difficulty in resolving binocular images as well as visual fatigue from extended use (Kramida, 2016). Progress is being made on these issues, largely through the development of HMDs that present visual cues at different optical distances (Lu et al., 2019).

Integration

XR devices may be stand-alone or integrated into an existing training system. Integration of XR devices in stand-alone systems can reasonably be expected to be simpler than integration with existing training systems, as a stand-alone system is built with a specific XR device in mind. However, it is important that such systems be built recognizing that XR HMDs are being updated at a rapid pace. To illustrate this point, consider that the Oculus Quest was released in November 2019 and the Oculus Quest 2 in October 2020; similarly, the Varjo VR-1 was released in February 2019, the Varjo VR-2 in October 2019, and the Varjo VR-3 in early 2021.

Integrating XR devices with existing training systems presents a more formidable challenge, as two key issues illustrate. First, the XR device must be able to communicate with the training system; depending on the age of the system, it may use a programming language or standard different from those in common use today. Second, rendering a high-fidelity image in an XR HMD requires a significant amount of visual data; the legacy training system must both have the data at the required fidelity and be able to pass it to the XR HMD quickly enough to deliver the information to the user's display. McCoy-Fisher et al. (2019) described some of the challenges associated with integrating an MR HMD with an existing Operational Flight Trainer (OFT) that was not built to accommodate such technology. MR displays have the added challenge of merging virtual and real-world images on a single display; for example, McCoy-Fisher et al. (2019) noted a recurring problem with aligning the VE and RE images on the display and indicated that in most instances this calibration had to occur prior to each event.

User Resistance

User resistance to new technology may be a barrier to the adoption and widespread use of XR technology. Research has demonstrated that perceived usefulness and perceived ease of use predict intent to use a technology (Davis et al., 1989; Venkatesh et al., 2002). That is, if users perceive the technology as instrumental to some desired outcome (e.g., improved performance) and the technology is intuitive and easy to use, then they are more likely to want to use it. It is therefore important to consider the prospective user community's experience with XR technology and attitudes toward it. It is equally important to develop a good user interface and test it prior to implementation; a negative first impression of an XR system will likely be difficult to overcome.

Safety

The use of XR technology also needs to consider potential safety risks: in the medical domain this means patient safety, while in the aviation domain it means safety of flight. Other domains must likewise consider safety as they weigh the utilization of XR technology. For example, research has demonstrated that prolonged wear of head-mounted displays with a forward rather than neutral center of gravity (CG) is more likely to cause fatigue and injury, which highlights the need to consider the weight and CG of the HMD (Albery et al., 2008). Further, research has demonstrated that some individuals are susceptible to simulator sickness. McCoy-Fisher et al. (2019) found that mean oculomotor and disorientation scores increased from baseline following the use of an XR device, though symptoms resolved shortly after the conclusion of the training event. This highlights the importance of research investigating the characteristics of XR HMDs and of individuals that might increase the likelihood and severity of simulator sickness. As a final example, consider the AR use case described above in which synthetic entities are displayed on a pilot's HMD. Integrating AR into a live flight environment drives the need to carefully consider risks to safe flight, such as the need to distinguish between real and synthetic entities and to be able to quickly exit "training mode."


Regulations

Various agencies govern the certification and ongoing training requirements of professional communities. For example, the Federal Aviation Administration (FAA) regulates the initial and recurring training requirements of pilots and maintains standards for flight simulation training devices. Currently, XR devices are not certified to support mandatory aircrew training requirements, which limits adoption and necessitates reliance on traditional simulators and live flight. Fully realizing the benefits of XR technology in pilot training will require establishing XR training device requirements and a certification process.

XR Is Not Always the Optimal Training Solution

Finally, it is important to note that extended reality technologies are not appropriate for every application (at least not with today's technology). Consideration of any instructional medium's potential as a learning aid must be grounded in a theoretical model that addresses the differences among learning outcomes, cognitive processes, and the conditions that distinguish among them. Gagne and Medsker's (1996) model of instruction is widely recognized; it holds that learning is based on three elements: a taxonomy of learning outcomes, conditions of learning, and nine possible events of instruction. Driscoll (2000) defined instructional theory as "identifying methods that will best provide the conditions under which learning goals will most likely be attained" (p. 344). This model holds that instructional methods should be matched to required conditions in order to yield desired learning outcomes. Any evaluation of the effectiveness of different instructional media for achieving different learning outcomes must therefore consider the nature of the knowledge, skills, and attitudes being taught, and match the strengths of the media under consideration to the targeted curricular tasks.

To illustrate this point, consider the following evolution. The US Navy has used a VR-enabled parachute descent training device to teach aircrew how to recognize and resolve problems encountered during their parachute descent. IROK, the mnemonic memory aid taught to the students, stands for: Inflate the life preserver, Release the raft, Options, and Koch connector release. The decision-making training presents the trainee with simulated malfunctions and, depending on the situation, the options he or she should take, such as when to jettison the attached seat kit. Execution of these tasks requires both visual and tactile access to the body-worn equipment (parachute release, life preserver, seat pan). Unfortunately, the VR device shown in Figure 1.17 occludes visual access to this equipment. To address this issue, the US Navy worked with small businesses to develop a solution that supports aviators' capability to execute the IROK procedures, identify and resolve parachute canopy issues, and navigate to a safe landing spot (N161-007, n.d.). This upgraded device is already in use at the Navy Aviation Survival Training Centers in Pensacola. Like its predecessor, it is designed to teach naval aviation students how to respond to parachute malfunctions, gain and maintain control of the parachute, and perform the operations needed to prepare for landing. Students hang suspended from a parachute harness, handle control lines, and interact with their safety equipment.


FIGURE 1.17  Legacy parachute descent trainer using VR goggles that inhibited trainees' ability to see and interact with flight gear such as the parachute release, life preserver, and seat pan. (https://www.dvidshub.net/image/653875/aviation-survival-training-center-jacksonville.) The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement.

The parachute canopy and physical environment are rendered on the four monitors in the Unity game environment. The incorporation of monitors in lieu of VR goggles, as depicted in Figure 1.18, remedied several mismatches between the training tasks and the visual medium. Specifically, using the previous VR system:

• Students could not see their hands, their gear, or the risers they had to grasp to control their direction during descent.
• Inflation of the survival vest during descent occluded the trainee's field of view, eliminating any training advantage otherwise yielded by the 360° field of view provided by the VR headset.
• The time required to don/doff and calibrate VR headsets significantly reduced throughput (the number of students trained per unit time).

These mismatches reduced the students' ability to master the intellectual and motor skills required for proper parachute descent. Replacement of the VR headset with flat-panel screens remedied these limitations and increased student throughput. Additional situations in which AR tools are likely to yield training efficacy are discussed in Bond et al. (2018) and Militello et al. (2023).

This section of the chapter provided an overview of XR technology and highlighted applications for both training and operational use, with an emphasis on the aviation and medical domains. It is not intended to be an exhaustive discussion of potential future applications of XR technology, but rather an overview of opportunities on the horizon for XR to change the way we work. The next section extends this discussion to the use of AR technology in decision-making.


FIGURE 1.18  Upgraded Immersive Parachute Descent Procedure, Malfunction, & Decision-Making Training System in use at ASTC Pensacola (PAO Clearance 21-ORL023). The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement.

AUGMENTED REALITY IN DECISION-MAKING

In this section, we describe how augmented reality can facilitate the development of recognition skills for improved decision-making. Simulations of cognitive processes emerged in the 1980s. Kahneman et al. (1982) described the simulation heuristic, by which decision-makers construct mental simulations of events. In fact, Daniel Kahneman, a psychologist, won the Nobel Prize in Economics in 2002 for his work in decision-making. He and his colleagues described how individuals use rules of thumb rather than more rational analysis, and the importance of properly framing the question. Subsequently, Klein and Crandall (1995, p. 324) defined mental simulation as "the process of consciously enacting a sequence of events." They described how urban fire ground commanders (FGCs) use mental simulation to allocate resources and direct their firefighters; according to Klein and Crandall, 72% of the accounts of mental simulation in their database referred explicitly to visual imagery.


Recognition skills require the participant to rapidly size up the situation and know what actions to take. Examples include organizational commanders and individuals making decisions under stress, firefighters locating individuals trapped in burning buildings, individuals learning how to escape from flooding compartments, and SWAT personnel engaged in hostage rescue events. Sushereba et al. (2019) describe how an experienced US Army Apache helicopter pilot recognized a deviant wind pattern in the desert as a precursor to a larger sandstorm and diverted to a safer landing site, and how, during a routine assessment of a premature infant, a neonatal intensive care nurse recognized a potentially deadly infection based on a combination of individually nonthreatening symptoms. Both are examples of individuals whose prior experiences allowed them to identify relevant cues and meaningfully integrate them prior to making decisions. Recognition skills training is perhaps best accomplished by providing immersive and engaging scenario-based learning experiences. Militello et al. (2023, p. 49) specify the following three components of recognition skills:

1. Knowing "what to attend to, including which cues are relevant in which contexts."
2. Recognizing a situation and knowing how to act: "learners must be able to create meaning from the cues available."
3. Having "sophisticated mental models allows learners to evaluate potential interventions before acting by mentally playing them out, also known as mental simulation."

Based on their experience applying recognition skills training in multiple domains, Militello et al. (2023, p. 6) developed a handbook to "support training technology developers, training designers, and trainers in leveraging the strengths of augmented reality for training recognition skills." The nine design principles detailed in their handbook and listed in Table 1.1 are intended to facilitate learning relevant cues and clusters of cues, interpreting the situation correctly, and anticipating problems using mental simulation. Space limitations preclude expanding on each of the principles in Table 1.1, so we refer the reader to the authors' handbook (Militello et al., 2023) for a detailed discussion of the pedagogical principles involved in leveraging AR when designing training systems.
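To suggest how such principles might translate into practice, the sketch below shows one way a scenario-authoring tool could represent them as explicit, machine-checkable fields. It is purely illustrative: the class and field names are hypothetical and are not drawn from Militello et al. (2023) or any actual training system.

```python
# Hypothetical scenario specification for AR recognition skills training.
# Field names map loosely onto the Table 1.1 principles; all names here are
# illustrative assumptions, not an existing system's API.
from dataclasses import dataclass, field

@dataclass
class Cue:
    description: str
    peripheral: bool = False  # periphery principle: critical cues need not be obvious

@dataclass
class Scenario:
    name: str
    difficulty: int                                         # many variations principle
    cues: list[Cue] = field(default_factory=list)           # sensory fidelity targets
    perturbations: list[str] = field(default_factory=list)  # perturbation principle
    scaffolds: list[str] = field(default_factory=list)      # scaffolding principle
    requires_action: bool = True                            # assessment-action pairing

# A neonatal-assessment variation, echoing the NICU nurse example above.
scenario = Scenario(
    name="Premature infant assessment",
    difficulty=3,
    cues=[Cue("mottled skin tone", peripheral=True), Cue("lethargy")],
    perturbations=["monitor alarm fails to trigger"],
    scaffolds=["highlight relevant vitals on first run only"],
)
print(scenario)
```

Generating many such records while varying difficulty, cues, and perturbations would be one way to operationalize the many variations principle.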

THE PERFECT STORM REVISITED: THE FUTURE OF HUMAN FACTORS IN SIMULATION AND TRAINING

We return to the perfect storm analogy presented at the beginning of this chapter to forecast the major challenges and opportunities as perceived by the authors.

Technological Trends

• The variety and quality of training devices will continue to increase, as will the hype. Therefore, we need to constantly verify that the technology is appropriate for the training objectives, as effective as or more effective than traditional training, safe, and cost-effective.


TABLE 1.1
Principles for Designing AR-based Recognition Skills Training

1. Fidelity and Realism Principles
   • Sensory fidelity principle: Realistic cues are needed to support perceptual skill development.
   • Scaling fidelity principle: Virtual props should be at a scale close to the real world.
   • Assessment–action pairing principle: It is important to create a learning experience that allows the learner to both assess the situation and act.

2. Scenario-Building Principles
   • Periphery principle: Effective scenarios should include critical cues that are not obvious; rather, the learner must know to look for them and correctly interpret them.
   • Perturbation principle: Training scenarios should expose trainees to novel conditions requiring adaptation and performance under non-routine conditions.

3. Building Mental Models Principles
   • Mental model articulation principle: Training techniques that require learners to articulate what they are noticing, how they are assessing the situation, and predictions about how the situation will evolve aid the learner in developing coherent mental models.
   • Many variations principle: Training techniques that expose the learner to many variations with different levels of difficulty support development of robust mental models.

4. Scaffolding and Feedback Principles
   • Scaffolding principle: Scaffolding can be used to promote recognition skill development at different developmental stages.
   • Reflection principle: Learners who reflect on the training experience are better able to extract insights from the experience and apply them to future performance.

Source: Adapted with permission from Militello et al. (2023)

• Trainees will be equipped with wearable computing technologies which will record their physiological data as they train and provide tactile and kinesthetic cues.
• The use of artificial neural networks and AI will become more common, as will the use of avatars as intelligent personalized tutors.
• The availability of commercial off-the-shelf (COTS) computers, displays, and networking technologies will increase.
• Simulations using models of individual, team, and organizational behavior will become more common. These models will facilitate the understanding of cross-cultural differences.


• Training devices will rely more heavily on open architecture, making it easier to integrate new training technologies.
• Content development and integration/evaluation of XR technologies will continue to be the key cost drivers for XR training applications.
• Technology costs will continue to decrease.
• Market-driven innovations in XR technology will resolve the device fidelity and user fatigue concerns discussed in the XR section of this chapter.
• Integrating XR devices into existing, legacy, closed-architecture training systems will be much more challenging than either integrating with an open-architecture training system or building a stand-alone training system with a specific XR device in mind.
• As with all new technology, managing users' initial experience to ensure it is positive will continue to be essential to user acceptance of new XR technologies.

Computation Power

• The fusion of inexpensive computation power, affordable broadband, and wireless networks will continue to increase as costs, size, and power requirements decrease.
• The Department of Energy's exascale computing project (www.exascaleproject.org) has as its next milestone 10^18 operations per second, 1,000 times faster than the current fastest petascale (10^15) supercomputers, enabling more realistic simulation of real-world processes and interactions. Udoh (2011) provides a point of reference: today's desktop personal computer processor (e.g., an i7 core) can execute fewer than 300 gigaflops (a gigaflop is 10^9 operations per second); current affordable supercomputers are over one million (10^6) times faster.
• The continued development of high-performance computing (HPC)/supercomputers will make it easier to design, access, and integrate simulations into virtual environments.
• The proliferation of cloud computing, the maturation of artificial intelligence, and open-source data analytics tools are rapidly changing both the scope and nature of data analyses performed in applied settings and in organizations. Use and interpretation of "big data" analyses require more sophisticated tools and an ever-expanding combination of coding and visualization skillsets.
• A Department of Energy initiative aims to make AI models and data adhere to the FAIR data principles (Findable, Accessible, Interoperable, and Reusable), producing data that are reusable by both researchers and machines with little human intervention. Exascale high-performance computing will accelerate the development of AI (Huerta et al., 2020).
• Agent-based simulations and avatars will become more accurate computational representations of "self," "group," "organization," "city," and "society." These simulations can alter their behavior through learning and through their interactions to meet programmed goals. Their continued development will make it easier to design, access, and integrate simulations into virtual environments. This interactive environment will improve the education of individuals and teams, who in the future will have access to patient, intelligent, adaptive tutors and coaches.
• Students and trainees will have 24/7 access to virtual training ranges to learn how to partner with each other and with their AI teammates.
• The market for commercial and military UAV applications will increase. The commercial drone market is expected to grow from $14 billion in 2018 to over $43 billion by 2024 (Research and Markets, 2019). The demand for simulators, trained operators, and maintainers will increase proportionately.
• Managers, educators, and learners will expect bigger and better simulations in the future.

Innovations in Education and Training

• As working from home becomes more commonplace, so will the expectation to be educated and to use simulators at home.
• Training anyplace, anytime will increase, particularly as working from home becomes more acceptable, and the use of simulators for training by businesses will increase.
• As was demonstrated during the COVID-19 pandemic, hybrid learning is feasible; but, while it is a viable mode of learning/training, significant investigation is needed to identify and remove the barriers to its effective use in educating future learners.
• Regulatory agencies will need to establish processes for specifying XR training device requirements and certification.

The Changing Nature of Education

• Simulation will continue to blend training, mission planning, rehearsal, and execution into one continuum.
• Educators in high-risk environments such as the medical, chemical, nuclear, and aerospace industries will expect more tools that better enable them to deliver realistic training in safe environments, increasing training transfer while reducing overall risk.
• Cohort training will decrease, while AI will facilitate the accommodation of individual differences in learning styles and rates.
• The criterion for advancement will be demonstrated performance to a standard rather than the number of class hours completed.
• More individuals will be certified as qualified based on their demonstrated performance on simulators.


Acceptance of Simulation/Gaming

• Cloud gaming is the new reality. Cloud gaming stores everything about the game in the cloud and streams all the data to the gaming device; image quality and response lag are thus expected to be minimized.
• Not only has gaming been accepted by the public, it is also expected to grow exponentially. In 2019, the global gaming market was valued at $151.55 billion, and it is expected to increase to $206 billion by 2025 (https://www.gameindustry.com/news-industry-happenings/gaming-industry-growth-trends-and-forecast-2020-2025/).
• While we are a long way from the virtual environment matching the fidelity of the real environment, the fidelity of the virtual environment will continue to increase. This increase needs to be matched with a commensurate increase in attention to the safety of users immersed in these rich virtual environments. Strategies for combating simulator sickness, ensuring the physical safety of those using AR devices, and distinguishing the real and virtual environments quickly and reliably in an MR application must be developed and validated.

When considering the challenges and opportunities for human factors in simulation and training, we should keep John Miles's system design question in mind. Miles asked: "Can this person with this training perform these tasks to these standards under these conditions?" To answer that question, the simulation/training designer needs to adequately describe each element of that question, expanded briefly below (and sketched as a requirements record after this list):

• What are the characteristics of the trainee or trainees?
• What training have they received?
• What task(s) do they need to be trained for, and when do they need to be trained?
• Exactly what are the standards to which they need to perform?
• Under what conditions are the trainees expected to perform?
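A minimal sketch (ours, purely illustrative; the record and field names are hypothetical) of how a designer might force all five answers to be recorded before a simulation-based training solution is specified:

```python
# Illustrative record forcing a designer to answer all five parts of Miles's
# question before a training solution is specified. Names are hypothetical.
from dataclasses import dataclass, fields

@dataclass
class TrainingRequirement:
    person: str       # this person: trainee characteristics
    training: str     # this training: prior instruction received
    tasks: str        # these tasks: what must be performed, and when
    standards: str    # these standards: explicit performance criteria
    conditions: str   # these conditions: the expected performance environment

def unanswered(req: TrainingRequirement) -> list[str]:
    """Return the names of any elements left blank."""
    return [f.name for f in fields(req) if not getattr(req, f.name).strip()]

req = TrainingRequirement(
    person="student naval aviator",
    training="completed ground school and egress training",
    tasks="recognize and resolve parachute malfunctions during descent",
    standards="",  # left blank: the design is not yet specifiable
    conditions="day and night descents with simulated canopy damage",
)
print(unanswered(req))  # -> ['standards']
```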

ACKNOWLEDGMENTS

The authors acknowledge the assistance of friends in the modeling and simulation community, who not only shared material from their archives with us but also reviewed portions of the chapter. The authors also thank their wives, Kathy Moroney, Darlene Lilienthal, and Shannon Foster, and their families for their patience and support.

DISCLAIMER

The views presented are those of the authors and do not necessarily represent the views of the DoD or its components.


REFERENCES

A16-076. (n.d.). Combat casualty care augmented reality intelligent training systems (C3ARESYS). https://www.sbir.gov/sbirsearch/detail/1514421
Adorian, P., Staynes, W. N., & Bolton, M. (1979). The evolution of the flight simulator. Proceedings of Conference: 50 Years of Flight Simulation, Vol. 1. London: Royal Aeronautical Society, pp. 1–23.
Aegis Research Corporation. (2002). War games bibliography. In International Conference on Grand Challenges for Modeling and Simulation, Lunceford, W. H. & Page, E. H., Eds. Society for Modeling and Simulation (www.scs.org).
AF191-004. (n.d.). A Virtual Reality Content Creation, Training and Assessment Tool for the Department of Defense. https://www.sbir.gov/sbirsearch/detail/1861567
Albery, C. B., Gallagher, H. L., & Caldwell, E. (2008). Neck muscle fatigue resulting from prolonged wear of weighted helmets. DTIC. https://apps.dtic.mil/dtic/tr/fulltext/u2/a491626.pdf
Alexis. (2020, June 26). Fixed base versus full-motion simulators. Simulator Review. Retrieved June 23, 2021, from https://simulatorreview.com/fixed-base-vs-full-motion-simulators/
Allen, T. B. (1989). War Games: The Secret World of the Creators, Players, and Policy Makers Rehearsing World War III Today. Berkley Publishing Group.
AN/USQ-T46 Battle Force Tactical Training (BFTT). Retrieved June 23, 2021, from https://www.navy.mil/Resources/Fact-Files/Display-FactFiles/Article/2166789/anusq-t46-battle-force-tactical-training-bftt/no/
Andersen, S. A. W., Konge, L., & Sorensen, M. S. (2018). The effect of distributed virtual reality simulation training on cognitive load during subsequent dissection training. Medical Teacher, 40, 684–689.
Balci, O. (1998). Verification, validation, and testing. Handbook of Simulation, 10(8), 335–393.
Balci, O., Nance, R. E., Arthur, J. D., & Ormsby, W. F. (2002, December). Expanding our horizons in verification, validation, and accreditation research and practice. In Proceedings of the Winter Simulation Conference (Vol. 1, pp. 653–663). IEEE.
Barfield, W., & Furness, T. A. III (Eds.). (1995). Virtual Environments and Advanced Interface Design. Oxford University Press.
Basu, S., Dickes, A., Kinnebrew, J. S., Sengupta, P., & Biswas, G. (2013, May). CTSiM: A computational thinking environment for learning science through simulation and modeling. In CSEDU (pp. 369–378).
Baudhuin, E. S. (1987). The design of industrial and flight simulators. In Transfer of Learning, Cormier, S. M. & Hagman, J. D., Eds. San Diego, CA: Academic Press, pp. 217–237.
Beal, M. D., Kinnear, J., Anderson, C. R., Martin, T. D., Wamboldt, R., & Hooper, L. (2017). The effectiveness of medical simulation in teaching medical students critical care medicine: A systematic review and meta-analysis. Simulation in Healthcare, 12(2), 104–116.
Bellenkes, A., Bason, R., & Yacavone, D. W. (1992). Spatial disorientation in naval aviation mishaps: A review of class A incidents from 1980 through 1989. Aviation, Space, and Environmental Medicine, 63(2), 128–131.
Beringer, D. B. (1994). Issues in using off-the-shelf PC-based flight simulation for research and training: Historical perspective, current solutions and emerging technologies. Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting. Santa Monica, CA: Human Factors and Ergonomics Society, pp. 90–94.
Bernstein, M. (2000). Grand Eccentrics: Turning the Century: Dayton and the Inventing of America. Wilmington, OH: Orange Frazer Press.
Blacker, K. J., Pettijohn, K. A., Roush, G., & Biggs, A. T. (2020). Measuring lethal force performance in the lab: The effects of simulator realism and participant experience. Human Factors. doi: 10.1177/0018720820916975
Boldovici, J. A. (1987). Measuring transfer in military settings. In Transfer of Learning, Cormier, S. M. & Hagman, J. D., Eds. San Diego, CA: Academic Press, pp. 239–260.
Bolling, E. (2020, September 25). Beyond reality: New AUGMED tool pushes limits of medical simulation. https://www.dvidshub.net/news/378712/beyond-reality-new-augmed-tool-pushes-limits-medical-simulation
Bond, A., Neville, K., Mercado, J., Massey, L., Wearne, A., & Ogreten, S. (2018). Evaluating training efficacy and return on investment for augmented reality: A theoretical framework. Proceedings of the 2018 Interservice/Interagency Training, Simulation, and Education Conference.
Boothe, E. M. (1994). A Regulatory View of Flight Simulator Qualification, Flight Simulation Update, 10th ed. Binghamton, NY: SUNY Watson School of Engineering.
Bork, F., Lehner, A., Eck, U., Navab, N., Waschke, J., & Kugelmann, D. (2020). The effectiveness of collaborative augmented reality in gross anatomy teaching: A quantitative and qualitative pilot study. Anatomical Sciences Education. https://doi.org/10.1002/ase.2016
Boud, A. C., Haniff, D. J., Baber, C., & Steiner, S. J. (1999). Virtual reality and augmented reality as a training tool for assembly tasks [Conference session]. IEEE International Conference on Information Visualization (Cat. No. PR00210). London, England (pp. 32–36).
Burki-Cohen, J., Sparko, A., & Go, T. (2007, August). Training value of a fixed-base flight simulator with a dynamic seat. In AIAA Modeling and Simulation Technologies Conference and Exhibit (p. 6564).
Caffrey, M. B. (2019). On Wargaming: How Wargames Have Shaped History and How They May Shape the Future (Vol. 43). Naval War College Press.
Carnahan, C. D. (2012). The Effects of Learning in an Online Virtual Environment on K-12 Students. Indiana University of Pennsylvania.
Caro, C. W. (1979). Development of simulator instructional feature design guides. In Proceedings of Conference: Fifty Years of Flight Simulation (p. 75). London: Royal Aeronautical Society.
Caro, P. W. (1988). Flight training and simulation. In Human Factors in Aviation (pp. 229–261). New York: Academic Press.
Cisco, U. (2020). Cisco Annual Internet Report (2018–2023) White Paper. https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
Cook, D. A., Brydges, R., Hamstra, S. J., Zendejas, B., Szostek, J. H., Wang, A. T., Erwin, P. J., & Hatala, R. (2012). Comparative effectiveness of technology-enhanced simulation versus other instructional methods: A systematic review and meta-analysis. Simulation in Healthcare, 7(5), 308–320. doi: 10.1097/SIH.0b013e3182614f95. https://journals.lww.com/simulationinhealthcare/Fulltext/2012/10000/Comparative_Effectiveness_of_Technology_Enhanced.6.aspx
Custer, L. L. (1930). Aviation training machine, U.S. Patent No. 481,831. Washington, DC: United States Patent Office.
D'Angelo, C., Harris, C., & Rutstein, D. (2013). Systematic Review and Meta-Analysis of STEM Simulations. Menlo Park, CA: SRI International.
Davis, F. D., Bagozzi, R. P., & Warshaw, P. R. (1989). User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35, 982–1002.
Department of Defense. (2009). DoD Instruction on DoD Modeling and Simulation VV&A (DoDI 5000.61, December 9, 2009, Incorporating Change 1, October 15, 2018). Retrieved from https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/500061p.pdf
Dias, P. L., Greenberg, R. G., Goldberg, R. N., Fisher, N., & Tanaka, D. T. (2021). Augmented reality-assisted video laryngoscopy and simulated neonatal intubations: A pilot study. Pediatrics, 147(3), 1–11. https://doi.org/10.1542/peds.2020-005009
DoD Enterprise DevSecOps Reference Design, Version 1.0, 12 August 2019. https://dodcio.defense.gov/Portals/0/Documents/DoD%20Enterprise%20DevSecOps%20Reference%20Design%
DoD Directive 5000.59, August 8, 2017, Incorporating Change 1, October 15, 2018. DoD Modeling and Simulation (M&S) Management. Retrieved from DoDD 5000.59 – Executive Services Directorate.
Doolani, S., Wessels, C., Kanal, V., Sevastopoulos, C., Jaiswal, A., Nambiappan, H., & Makedon, F. (2020). A review of Extended Reality (XR) technologies for manufacturing training. Technologies, 8, 77.
Drascic, D., & Milgram, P. (1996, April). Perceptual issues in augmented reality. Stereoscopic Displays and Virtual Reality Systems III, 2653, 123–135.
Driscoll, M. P. (2000). Gagne's theory of instruction. In Psychology of Learning for Instruction. Allyn & Bacon, pp. 341–372.
Eckstein, B. M. (2018, June 13). Naval Safety Center standing up data analytics office amid surface, aviation mishap increases. USNI News. https://news.usni.org/2018/06/13/naval-safety-center-standing-data-analytics-office-amid-surface-aviation-mishap-increases
Eckstein, B. M. (2018, June 22). Less experienced maintainers contribute to rise in naval aviation mishaps. USNI News. https://news.usni.org/2018/06/22/less-experienced-maintainers-contribute-rise-naval-aviation-mishaps
Electronics Technician Training. (2018, November 16). How simulation tools are transforming education and training. Retrieved from https://www.etcourse.com/simulation-tools-transform-education-and-training.html
Eshet-Alkalai, Y., Caspi, A., Eden, S., Geri, N., Tal-Elhasid, E., & Yair, Y. (2010). Challenges of integrating technologies for learning: Introduction to the IJELLO special series of Chais Conference 2010 best papers. Interdisciplinary Journal of E-Learning and Learning Objects, 6(1), 239–244.
Esteves, M., Fonseca, B., Morgado, L., & Martins, P. (2011). Improving teaching and learning of computer programming with the Second Life virtual world. British Journal of Educational Technology, 42(4), 624–637.
Fischetti, M. A., & Truxal, C. (1985). Simulating "the right stuff." IEEE Spectrum, 22(3), 38–47.
Flavian, C., Ibanez-Sanchez, S., & Orus, C. (2019). The impact of virtual, augmented, and mixed reality technologies on the customer experience. Journal of Business Research, 100(July), 547–560.
Flexman, R. E., Roscoe, S. N., Williams, A. C., & Williges, B. H. (1972). Studies in pilot training: The anatomy of transfer. Aviation Research Monographs, 2(1). Champaign, IL: University of Illinois Aviation Research Laboratory.
Gagne, R. M., & Medsker, K. L. (1996). The Conditions of Learning: Training Applications. Fort Worth: Harcourt Brace College Publishers.
Gawron, V. J. (2019, May 13). Simulation applications in training. Presented at the Flight and Ground Simulation Update, Binghamton University.
Gawron, V. J. (2000). Human Performance Measures Handbook. Hillsdale, NJ: Lawrence Erlbaum Associates.
Geiselman, E. E., Johnson, C. M., Buck, D. R., & Patrick, T. (2013). Flight deck automation: A call for context-aware logic to improve safety. Ergonomics in Design, 21(4), 13–18.
Gibb, R., Ercoline, B., & Scharff, L. (2011). Spatial disorientation: Decades of pilot fatalities. Aviation, Space, and Environmental Medicine, 82(7), 717–724. https://doi.org/10.3357/asem.3048.2011
Gray, A. (2017, June 8). New realistic submarine bridge trainer opens at Trident Training Facility Bangor. https://www.militarynews.com/norfolk-navy-flagship/news/quarterdeck/new-realistic-submarine-bridge-trainer-opens-at-trident-training-facility-bangor/article_88eba425-df65-5003-8cc6-d50df80e0a51.html
Gregg, A., & Greene, J. (2021). Microsoft wins $21 billion Army contract for augmented reality headsets. The Washington Post. https://www.washingtonpost.com/business/2021/03/31/microsoft-army-augmented-reality/
Gregory, T. M., Gregory, J., Sledge, J., Allard, R., & Mir, O. (2018). Surgery guided by mixed reality: Presentation of a proof of concept. Acta Orthopaedica, 89(5), 480–483.
Gross, B., Rusin, L., Kiesewetter, J., Zottmann, J. M., Fischer, M. R., Prückner, S., & Zech, A. (2019). Crew resource management training in healthcare: A systematic review of intervention design, training conditions and evaluation. BMJ Open, 9(2), e025247.
Hays, R. T., Vincenzi, D. A., Seamon, A. G., & Bradley, S. K. (1998). Training effectiveness evaluation of the VESUB technology demonstration system. Technical Report 98-003, Naval Air Warfare Center Training Systems Division. https://apps.dtic.mil/sti/pdfs/ADA349219.pdf
Heim, M. (1998). Virtual Realism. Oxford: Oxford University Press.
Heinrich, F., Schwenderling, L., Joeres, F., Lawonn, K., & Hansen, C. (2020). Comparison of augmented reality display techniques to support medical needle insertion. IEEE Transactions on Visualization and Computer Graphics, 26(12), 3568–3575. doi: 10.1109/TVCG.2020.3023637
Helmreich, R. L., & Schaefer, H. G. (1998). Team performance in the operating room. In Human Error in Medicine, Bogner, M. S., Ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
Helmreich, R. L., Wilhelm, J. A., Gregorich, S. E., & Chidester, T. R. (1990). Preliminary results from the evaluation of cockpit resource management training: Performance ratings of flight crews. Aviation, Space, and Environmental Medicine, 61, 576–579.
Hiskens, I. A., Peng, H., & Fathy, H. K. (2011, July). Transportation electrification education for K-12 students. In 2011 IEEE Power and Energy Society General Meeting (pp. 1–5). IEEE.
Hofstadter, D. R. (1979). Godel, Escher, Bach: An Eternal Golden Braid. New York: Vintage Books.
Huerta, E. A., Khan, A., Davis, E., Bushell, C., Gropp, W. D., Katz, D. S., ... Saxton, A. (2020). Convergence of artificial intelligence and high-performance computing on NSF-supported cyberinfrastructure. Journal of Big Data, 7(1), 1–12.
Insinna, V. (2019, May 2). One of the F-35's cost goals may be unattainable. Defense News. https://www.defensenews.com/air/2019/05/02/one-of-the-f-35s-cost-goals-may-be-unattainable/
Jones, E. R., Hennessy, R. T., & Deutsch, S. (1985). Human Factors Aspects of Simulation: Report of the Working Group on Simulation. National Research Council, Washington, DC, Committee on Human Factors.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.
Kancler, D. E., Quill, L. L., Revels, A. R., Webb, R. R., & Masquelier, B. L. (1998). Reducing cannon plug connector pin selection time and errors through enhanced data presentation methods. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 42(18), 1286–1290. https://doi.org/10.1177/154193129804201802
Kaplan, A. D., Cruit, J., Endsley, M., Beers, S. M., Sawyer, B. D., & Hancock, P. A. (2019, November). Transfer of training from virtual reality and augmented reality: A meta-analysis extended abstract. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 63, No. 1, pp. 2142–2143). Los Angeles, CA: SAGE Publications.
Kaplan, A. D., Cruit, J., Endsley, M., Beers, S. M., Sawyer, B. D., & Hancock, P. A. (2020). The effects of virtual reality, augmented reality, and mixed reality as training enhancement methods: A meta-analysis. Human Factors. https://doi.org/10.1177/0018720820904229
Kiourexidou, M., Natsis, K., Bamidis, P., Antonopoulos, N., Papathanasiou, E., Sgantzos, M., & Veglis, A. (2015). Augmented reality for the study of human heart anatomy. International Journal of Electronics Communication and Computer Engineering, 6(6), 658–662.
Kitroeff, N., & Gelles, D. (2020, January 8). In reversal, Boeing recommends 737 Max simulator training for pilots. New York Times. https://www.nytimes.com/2020/01/07/business/boeing-737-max-simulator-training.html
Klein, G. A., & Crandall, B. W. (1995). The role of mental simulation in naturalistic decision making. In Local Applications of the Ecological Approach to Human-Machine Systems, Hancock, P., Flach, J., Caird, J., & Vicente, K., Eds. Mahwah, NJ: Lawrence Erlbaum Associates, Vol. 2, pp. 324–358.
Kramida, G. (2016). Resolving the vergence-accommodation conflict in head-mounted displays. IEEE Transactions on Visualization and Computer Graphics, 22(7), 1912–1931. https://ieeexplore.ieee.org/document/7226865
Lane, N. E. (1986). Issues in Performance Measurement for Military Aviation with Applications to Air Combat Maneuvering. Orlando, FL: Essex Corporation.
Lang, B. (2021, March 31). Microsoft signs $22B contract with US Army to bring HoloLens 2 tech to the battlefield. https://www.roadtovr.com/microsoft-hololens-2-us-army-contract-production-phase/
Langley, A., Lawson, G., Hermawati, S., D'cruz, M., Apold, J., Arlt, F., & Mura, K. (2016). Establishing the usability of a virtual training system for assembly operations within the automotive industry. Human Factors and Ergonomics in Manufacturing & Service Industries, 26(6), 667–679.
Lindenberger Group. (2017). 8 top benefits of training simulations in the workplace. Retrieved from https://lindenbergergroup.com/8-top-benefits-training-simulations-workplace/
Lintern, G. (1991). An informational perspective on skill transfer in human-machine systems. Human Factors, 33(3), 251–266.
Lintern, G., Roscoe, S. N., & Sivier, J. E. (1990). Display principles, control dynamics, and environmental factors in augmentation of simulated visual scenes for teaching air-to-ground attack. Human Factors, 32, 299–371.
Liu, X., Sun, J., Zheng, M., & Cui, X. (2021). Application of mixed reality using optical see-through head-mounted displays in transforaminal percutaneous endoscopic lumbar discectomy. BioMed Research International, 1–8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7902133/
Loesch, R. L., & Waddell, J. (1979). The importance of stability and control fidelity in simulation. Proceedings of Conference: 50 Years of Flight Simulation. London: Royal Aeronautical Society, pp. 90–94.
Losey, S. (2018, September 30). The Air Force is revolutionizing the way airmen learn to be aviators. Air Force Times. https://www.airforcetimes.com/news/your-air-force/2018/09/30/the-air-force-is-revolutionizing-the-way-airmen-learn-to-be-aviators/
Lu, Y., Deng, B., Yan, Y., Qie, Z., Li, J., & Gui, J. (2019, December 9). Vergence-accommodation conflict potential solutions in augmented reality head mounted displays. AIP Publishing. https://aip.scitation.org/doi/abs/10.1063/1.5137845?journalCode=apc
Lutz, R. R., Frederick, P. S., Walsh, P. M., Wasson, K. S., & Fenlason, N. L. (2017). Integration of unmanned aircraft systems into complex airspace environments. Johns Hopkins APL Technical Digest, 33(4), 291–302. www.jhuapl.edu/techdigest
Magnuson, S. (2019, January 2). Services declare breakthrough in LVC training. National Defense: The Business and Technology Magazine of NDIA, 103(782), 12. Retrieved from http://www.nationaldefensemagazine.org/articles/2019/1/2/services-declare-breakthrough-in-lvc-training
Mazuryk, T., & Gervautz, M. (1996). Virtual Reality: History, Applications, Technology, and Future. Vienna: Institute of Computer Graphics, Vienna University of Technology.
McCoy-Fisher, C., Mishler, A., Bush, D., Severe-Valsaint, G., Natali, M., Riner, B., & Naval Air Warfare Center Training Systems Division. (2019). Student Naval Aviation Extended Reality Device Capability Evaluation (NAWCTSD-TR-2019-001). DTIC. https://apps.dtic.mil/sti/citations/AD1103227
Mearian, L. (2017). CW@50: Data storage goes from $1M to 2 cents per gigabyte. Computerworld. https://www.computerworld.com/article/3182207/cw50-data-storage-goes-from-1m-to-2-cents-per-gigabyte.html
Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995). Augmented reality: A class of displays on the reality-virtuality continuum. SPIE: Telemanipulator and Telepresence Technologies, 2351, 282–292. http://etclab.mie.utoronto.ca/publication/1994/Milgram_Takemura_SPIE1994.pdf
Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, 77, 1321–1329.
Militello, L. G., Sushereba, C. E., & Ramachandran, S. (2023). Handbook of Augmented Reality Training Design Principles. Cambridge: Cambridge University Press. https://unveilsystems.com/
Miller, R. B. (1954). Psychological Considerations in the Design of Training Equipment (Tech. Rep. No. 54-563). Wright-Patterson AFB, OH: Wright Air Development Center.
Mixon, T. R., & Moroney, W. F. (1982). An Annotated Bibliography of Objective Pilot Performance Measures. Orlando, FL: Naval Training Equipment Center.
Moreno-Ger, P., Burgos, D., & Torrente, J. (2009). Digital games in eLearning environments: Current uses and emerging trends. Simulation & Gaming, 40(5), 669–687.
Moroney, W. F., & Moroney, B. W. (2010). Flight simulation. In Handbook of Aviation Human Factors, Wise, J., Hopkin, V. D., & Garland, D., Eds. Boca Raton, FL: Taylor and Francis Group.
Morse, K. L., Coolahan, J., Lutz, R., Horner, N., Vick, S., & Syring, R. (2010). Best Practices for the Development of Models and Simulations. Johns Hopkins University Applied Physics Laboratory, NSAD-R-2010-037. Final Report, June 2010. https://www.acqnotes.com/Attachments/JHU%20APL%20Best%20Practices%20for%20the%20Development%20of%20M&A,%20June%2010.pdf
Mueller, S. (2021, April). Our Sims, ourselves. Wired, 20–21.
Mulenburg, J. (2011). Crew resource management improves decision making. ASK Magazine (NASA, Washington, DC), 11–13.
N161-007. (n.d.). SkyFall. https://www.navysbir.com/16_1/12.htm
N172-117. (n.d.). Mishap awareness scenarios and training for operational readiness responses. https://www.sbir.gov/node/1254605
N192-087. (n.d.). Headset Equivalent of Advanced Display Systems (HEADS). https://www.navysbir.com/n19_2/N192-087.htm
N193-D01. (n.d.). On demand training solutions for maintenance technicians. https://www.navysbir.com/n19_3/N193-D01.htm
N201-024. (n.d.). Augmented reality headset for maintainers. https://www.navysbir.com/n20_1/n201-024.htm
Natali, M., Mercado, J., & Foster, C. (2020, August). Integrating AI with aviation training: A preview of the virtual instructor pilot referee. Call Signs, 19–22. https://navyaep.com/wp-content/uploads/2020/08/Call-Signs-9.1.pdf
Nausheen, F., & Bhupathy, R. (2020). Disrupting clinical education: Point-of-care ultrasound merged with HoloLens augmented reality during early medical training. Clinical Teacher, 17(2), 146–147. https://doi.org/10.1111/tct.13155
Naval Air Warfare Center Training Systems Division (NAWCTSD) Public Affairs Office (PAO). (2018). H-60R AR preflight checklist trainer: A tablet-based Augmented Reality (AR) preflight checklist trainer for aircrew and student pilots. Project summary presented at the 2018 Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC). https://www.navair.navy.mil/nawctsd/sites/g/files/jejdrs596/files/2019-01/iitsec_2018_fact_sheet_h-60r_ar_dist_statement_a.pdf
Naval Technology. (2021). MQ-4C Triton Broad Area Maritime Surveillance (BAMS) UAS: MQ-4C Triton is a new broad area maritime surveillance (BAMS) unmanned aircraft system (UAS) unveiled by Northrop Grumman for the US Navy. https://www.naval-technology.com/projects/mq-4c-triton-bams-uas-us/
Nestler, V., Moore, E. L., Huang, K. Y. C., & Bose, D. (2013). The use of Second Life® to teach physical security across different teaching modes. In Information Assurance and Security Education and Training (pp. 188–195). Berlin, Heidelberg: Springer. https://www.youtube.com/watch?v=I5VXf5Ky5ms
Office of the Under Secretary of Defense for Personnel and Readiness. (2018, March 16, with Change 1 effective February 15, 2022). Department of Defense Instruction 1322.24: Medical Readiness Training (MRT). Retrieved from https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/132224p.pdf
Orlansky, J., & String, J. (1977). Cost-Effectiveness of Flight Simulator for Military Training (Rep. No. IDA HQ 77-19470). Arlington, VA: Institute for Defense Analysis.
Perla, P. P. (1990). The Art of Wargaming: A Guide for Professionals and Hobbyists. Naval Institute Press.
Petty, M. D. (2010). Verification, validation, and accreditation. In Modeling and Simulation Fundamentals: Theoretical Underpinnings and Practical Domains, pp. 325–372.
Picciano, A. G., Seaman, J., Shea, P., & Swan, K. (2012). Examining the extent and nature of online learning in American K-12 education: The research initiatives of the Alfred P. Sloan Foundation. The Internet and Higher Education, 15(2), 127–135.
Pohman, D. L., & Fletcher, J. D. (1999). Aviation personnel selection and training. In Handbook of Aviation Human Factors, Daniel, T. J., Garland, J. W., & Hopkin, V. D., Eds. Mahwah, NJ: Lawrence Erlbaum Associates.
Poisson, R. J., & Miller, M. E. (2014). Spatial disorientation mishap trends in the U.S. Air Force 1993–2013. Aviation, Space, and Environmental Medicine, 85(9), 919–924. https://doi.org/10.3357/ASEM.3971.2014
Pope, T. M. (2019). A cost-benefit analysis of Pilot Training Next. Theses and Dissertations, 2314. https://scholar.afit.edu/etd/2314
Qian, M., Nicholson, J., Tanaka, D., Dias, P., Wang, E., & Qiu, L. (2019). Augmented reality (AR) assisted laryngoscopy for endotracheal intubation training. In Proceedings from the International Conference on Human-Computer Interaction; July 26–31, Orlando, FL.
Raser, J. R. (1969). Simulation and Society: An Exploration of Scientific Gaming. Boston: Allyn & Bacon. https://eric.ed.gov/?id=ED043220
Research and Markets. (2019, March). The drone market report 2019: Commercial drone market size and forecast (2019–2024). https://www.researchandmarkets.com/reports/4764173/the-drone-market-report-2019-commercial-drone
Ribeiro, R., Ramos, J., Safadinho, D., Reis, A., Rabadão, C., Barroso, J., & Pereira, A. (2021). Web AR solution for UAV pilot training and usability testing. Sensors, 21, 1456. https://doi.org/10.3390/s21041456
Ricci, K. E., Salas, E., & Cannon-Bowers, J. A. (1996). Do computer-based games facilitate knowledge acquisition and retention? Military Psychology, 8(4), 295–307.
Rizzo, A. A., & Shilling, R. (2017). Clinical virtual reality tools to advance the prevention, assessment, and treatment of PTSD. European Journal of Psychotraumatology, 8(Suppl 5), 1414560.
Rizzo, A., Koenig, S. T., & Talbot, T. B. (2019). Clinical results using virtual reality. Journal of Technology in Human Services, 37(1), 51–74. https://doi.org/10.1080/15228835.2019.1604292
Roese, N. J., & Olson, J. M. (1995). What Might Have Been: The Social Psychology of Counterfactual Thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Rokhsaritalemi, S., Sadeghi-Niaraki, A., & Choi, S.-M. (2020). A review on mixed reality: Current trends, challenges and prospects. Applied Sciences, 10(2), 636. MDPI AG. https://doi.org/10.3390/app10020636
Roscoe, S. N. (1991). Simulator qualification: Just as phony as it can be. The International Journal of Aviation Psychology, 1(4), 335–339.
Roscoe, S. N., & Williges, B. H. (1980). Measurement of transfer of training. In Aviation Psychology, Roscoe, S. N., Ed. Ames, IA: Iowa State University Press, p. 182.
Royal Aeronautical Society. (1979). Proceedings of Conference: 50 Years of Flight Simulation, I, II, and III. London: Royal Aeronautical Society.
Ruben, P., & Gray, J. (2020). The WIRED guide to virtual reality. https://www.wired.com/story/wired-guide-to-virtual-reality/
Salas, E., Bowers, C. A., & Rhodenizer, L. (1998). It is not how much you have but how you use it: Toward a rational use of simulation to support aviation training. The International Journal of Aviation Psychology, 8(3), 197–208.
Salas, E., Rhodenizer, L., & Bowers, C. A. (2000). The design and delivery of crew resource management training: Exploiting available resources. Human Factors, 42, 490–511.
SB052-009. (n.d.). Virtual instructor pilot exercise referee for CNATRA. https://www.sbir.gov/sbirsearch/detail/1851489
Schrage, M. (2000). Serious Play. Boston: Harvard Business School Press.
Schultz, C. (2014). A military contractor just went ahead and used an Xbox controller for their new giant laser cannon. Smithsonian Magazine. Retrieved June 23, 2021, from https://www.smithsonianmag.com/smart-news/military-contractor-just-went-ahead-and-used-xbox-controller-their-new-giant-laser-cannon-180952647
Selvander, M., & Asman, P. (2012). Virtual reality cataract surgery training: Learning curves and concurrent validity. Acta Ophthalmologica, 90, 412–417.
Shelbourne, M. (2020, September 14). Navy harnessing new technology to restructure aviation training. USNI News. https://news.usni.org/2020/09/14/navy-harnessing-new-technology-to-restructure-aviation-training
Spannaus, T. W. (1978). What is a simulation? Audiovisual Instruction, 235, 16–17.
Sparaco, P. (1994, June). Simulation acquisition nears completion. Aviation Week & Space Technology, 71.
Sparko, A., Burki-Cohen, J., & Go, T. (2010). Transfer of training from a full-flight simulator vs. a high-level flight-training device with a dynamic seat. In AIAA Modeling and Simulation Technologies Conference (p. 8218).
Stark, E. A. (1994). Training and Human Factors in Flight Simulation, Flight Simulation Update, Vol. 10. Binghamton, NY: SUNY Watson School of Engineering.
Sternberg, R. J., & Gastel, J. (1989a). Coping with novelty in human intelligence: An empirical investigation. Intelligence, 13(2), 187–197.
Sternberg, R. J., & Gastel, J. (1989b). If dancers ate their shoes: Inductive reasoning with factual and counterfactual premises. Memory & Cognition, 17, 1–10.
Sushereba, C., Militello, L., & Patterson, E. (2019, June). The potential of augmented reality for training macrocognitive skills in combat medics. In Proceedings from the International Conference on Naturalistic Decision Making 2019, San Francisco, CA.
Sushereba, C. E., & Militello, L. G. (2020, December). Virtual patient immersive trainer to train perceptual skills using augmented reality. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 64, No. 1, pp. 470–472). Los Angeles, CA: SAGE Publications.
Sweigart, L., & Hodson-Carlton, K. (2013). Improving student interview skills: The virtual avatar as client. Nurse Educator, 38(1), 11–15.
Tadjdeh, Y. (2020, November 25). Air Force embracing new tech to solve pilot shortage. https://www.nationaldefensemagazine.org/articles/2020/11/25/air-force-embracing-new-tech-to-solve-pilot-shortage
Tanaka, A., Craighead, J., Taylor, G., & Sottilare, R. (2019, July). Adaptive learning technology for AR training: Possibilities and challenges. In International Conference on Human-Computer Interaction (pp. 142–150). Springer, Cham.
Tanaka, A., Craighead, J., & Taylor, G. (2019, December). The application of augmented reality for immersive TC3 training. Proceedings of the Interservice/Industry Training, Simulation, and Education Conference.
Tang, Y. M., Ng, G. W. Y., Chia, N. H., So, E. H. K., Wu, C. H., & Ip, W. H. (2021). Application of virtual reality (VR) technology for medical practitioners in type and screen (T&S) training. Journal of Computer Assisted Learning, 37(2), 359–369. https://onlinelibrary.wiley.com/doi/abs/10.1111/jcal.12494
Tang, Y., Shetty, S., Jahan, K., Henry, J., & Hargrove, S. K. (2012). Sustain City – A cyberinfrastructure-enabled game system for science and engineering design. Journal of Computational Science Education, 3(1), 57–65.
Taylor, G., Deschamps, A., Tanaka, A., Nicholson, D., Bruder, G., Welch, G., & Guido-Sanz, F. (2018, July). Augmented reality for tactical combat casualty care training. In International Conference on Augmented Cognition (pp. 227–239). Springer, Cham.
Tetlock, P. E., & Belkin, A. (1996). Counter-Factual Thought Experiments in World Politics: Logical, Methodological and Psychological Perspectives. Princeton, NJ: Princeton University Press.
Udoh, E. (Ed.). (2011). Cloud, Grid and High-Performance Computing: Emerging Applications. Hershey, PA: IGI Global.
Ulrich, T. A., Lew, R., Werner, S., & Boring, R. L. (2017, September). Rancor: A gamified microworld nuclear power plant simulation for engineering psychology research and process control applications. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 61, No. 1, pp. 398–402). Los Angeles, CA: SAGE Publications.
Underwood, K. (2020, April). Augmented reality goes airborne. Signal, AFCEA's International Journal. https://www.afcea.org/content/augmented-reality-goes-airborne
Venkatesh, V., Speier, C., & Morris, M. G. (2002). User acceptance enablers in individual decision making about technology: Toward an integrated model. Decision Sciences, 33(2), 297–316.
Voices of VR Podcast. (2015, November 17). No. 245: 50 years of VR with Tom Furness: The Super Cockpit, virtual retinal display, HIT Lab, & Virtual World Society [Audio podcast]. http://voicesofvr.com/245-50-years-of-vr-with-tom-furness-the-super-cockpit-virtual-retinal-display-hit-lab-virtual-world-society/
VR Headset Authority. (2021). Why is refresh rate and field-of-view important for a VR headset? Retrieved June 23, 2021, from https://vrheadsetauthority.com/why-is-refresh-rate-and-fov-important-for-a-vr-headset/
Waldrop, M. M. (2001). The origins of personal computing. Scientific American, 285(6), 84–91. https://www.scientificamerican.com/article/the-origins-of-personal-computing/
Weigmann, D. A., & Shappell, S. A. (1999). Human error and crew resource management failures in naval aviation mishaps: A review of U.S. Naval Safety Center data, 1990–96. Aviation, Space, and Environmental Medicine, 70, 1147–1151.
Weiss, I., Kramarski, B., & Talis, S. (2006). Effects of multimedia environments on kindergarten children's mathematical achievements and style of learning. Educational Media International, 43(1), 3–17.
Wendland, W., & Honold, P. (2006). Virtual Reality (VR) Research and Demonstration Project: Virtual Environment for Submarine Ship Handling Training (VESUB). Unpublished technical report.
Wiener, E. L., & Nagel, D. C. (Eds.). (1988). Human Factors in Aviation. Gulf Professional Publishing.
Wiener, E. L., Kanki, B. G., & Helmreich, R. L. (1993). Cockpit Resource Management. San Diego, CA: Academic Press.
Williams, A. C., & Flexman, R. E. (1949). An Evaluation of the Link SNJ Operational Trainer as an Aid in Contact Flight Training (SDC-71-16-3). Navy Special Devices Center, Port Washington, New York.
Wilson, J. R. (2000). Technology brings change to simulation industry. Interavia Business & Technology, 55(643), 19–21.
Wilson, J. R. (2018, September 1). The increasing role of COTS in high-fidelity simulation. Military Aerospace Electronics Magazine. Retrieved June 23, 2021, from https://www.militaryaerospace.com/home/article/16707147/the-increasing-role-of-cots-in-highfidelity-simulation
Wong, Y. H., Bae, S. J., Bartels, E. M., & Smith, B. (2019). Next Generation Wargaming for the US Marine Corps. Santa Monica, CA: RAND National Defense Research Institute.
Young, M. F., Slota, S., Cutter, A. B., Jalette, G., Mullin, G., Lai, B., ... Yukhymenko, M. (2012). Our princess is in another castle: A review of trends in serious gaming for education. Review of Educational Research, 82(1), 61–89.
Youngblood, S. M., Pace, D. K., Eirich, P. L., Gregg, D. M., & Coolahan, J. E. (2000). Simulation verification, validation, and accreditation. Johns Hopkins APL Technical Digest, 21(3), 359–367.

2  Justification for Use of Simulation

Meredith Carroll, Summer Rebensky, Maria Chaparro Osman, and John Deaton

CONTENTS

Introduction
Purposes
    Training
    Systems Engineering Evaluation
    Research
    Recreation
Domains of Application
    Aviation
    Military
    Medical
    Driving
    Emergency Response
    Education and STEM
    Entertainment
    Maintenance
Achievable Outcomes
    Cost Benefit
    Safety
    Data
    Intervention
    Flexibility and Availability
    Realism
Conclusion
References

INTRODUCTION

Simulation is used to provide an effective means of conducting training, system evaluations, and research, as well as for enjoyment and connection, while balancing elements of capability, cost-efficiency, and safety to achieve these goals. To sufficiently justify and validate simulator usage and applicability within these contexts,
advantages and disadvantages must be considered. Although disadvantages exist, they are outweighed by the magnitude of overall benefits simulation can provide in achieving a range of goals. Simulation is an interactive system that represents the operational system through artificial duplication or replication of the system and its equipment, environment, and capabilities (Jones, Hennessy, & Deutsch, 1985). Advancements in computer hardware and software technology allow for the creation, manipulation, and control of complex, realistic situations and environments (McGrath et al., 2017). Advancements in virtual reality (VR), internet connectivity, and artificial intelligence have expanded the range of applicability of simulation and its associated benefits (Fletcher et al., 2017). For example, today an individual interested in buying a home may take a simulated, virtual tour of a prospective home on a real estate website and even evaluate the fit of new furniture in the rooms through the use of simulation and augmented reality (AR) applications.

Traditionally, most simulators were classified as training devices; however, their use now extends to engineering simulators used for design, development, and evaluation processes (Lee et al., 2013); research simulators (Webster et al., 2014); and simulators used for recreational purposes such as enjoyment and social connection (Hromek & Roffey, 2009). Simulation is widely accepted in a vast array of domains, in part because it provides users opportunities that may not be safe or feasible in a real-world or operational environment. This safety benefit is invaluable, in terms of both cost-efficiency and human safety: it allows emergency procedures to be practiced, and equipment or system damage to be avoided, without real-world risk (Moroney & Moroney, 2009). Acceptance of simulation as a training device is demonstrated by the confidence placed in the transfer of training to the real-world environment, to the extent that licenses, certifications, and qualifications may be awarded on the basis of simulator training alone. Simulation used for systems engineering and research purposes often facilitates analysis related to system design, development, testing, and evaluation, as well as the evaluation of standards and procedures for licensing and certification (Webster et al., 2014). Simulation is also used for research across domains to recreate task environments, collect rich data, and improve performance (Deterding et al., 2015). Finally, simulation used for enjoyment has benefited organizations such as the military, which can adopt simulator technology once early issues have been resolved by companies in the entertainment industry (Balinova, 2020). The following sections discuss the primary purposes for which simulation is currently utilized, the domains in which its use is most prominent, and the achievable outcomes that justify its widespread use.

PURPOSES

There are four primary purposes for which simulation is typically utilized: (a) training, (b) systems engineering evaluation, (c) research, and (d) recreation.

Training

Training reshapes behavior using practice, direction, measurement, and feedback to teach task performance at a level of skill not previously possible (Caro, 1988, p. 229). As training devices, simulators contribute to skill development, maintenance, and assessment while allowing for enjoyment (Morgan et al., 2002; Rolfe & Staples, 1986). Serious games, educational simulations, and edutainment have all emerged as ways to help trainees and learners develop required skills in a more interactive and stimulating environment (Breuer & Bente, 2010; Djaouti et al., 2011). Serious games and educational gaming developed for educational purposes engage learners and have commonly been found to lead to learning gains and improved performance (Vlachopoulos & Makri, 2017). For example, a virtual board game originally designed for fun is now utilized in military training to teach strategy and principles of tactical warfare (Breuer & Bente, 2010).

Simulations also allow trainees to practice their skills with more autonomy and involvement during multiple phases of training (Landon-Hays et al., 2020; Crea, 2011). For instance, flight simulations allow trainees to practice the specific aspects of flight for which they need more training (e.g., landing) without spending the time to conduct an entire flight. Simulated training can also facilitate learning across different problem domains and skills more flexibly than traditional training (Hoffman et al., 2014). For example, low-fidelity desktop simulators can be used in initial training to familiarize a trainee with a system; low-to-medium-fidelity simulators that provide realism in switches and controls can facilitate procedural training; and high-fidelity trainers that integrate full motion and affective cues can be used in the final stage of training, in which skills are consolidated and proficiency is assessed. Research has shown that tasks requiring procedural knowledge and skills are better trained within a simulated environment than through lectures (Nestel et al., 2011). Simulation is also effective for facilitating the consolidation of skills under realistic and stressful situations (Andreatta et al., 2010), and can put trainees through multi-player, team-building exercises and cooperation-building challenges in a more enjoyable and realistic environment (De Gloria et al., 2014).

However, simulation is not ideal for all training tasks. For example, utilizing simulators to train declarative knowledge may be less effective, as learners can grasp how to obtain a desired outcome, such as a passing score, rather than the knowledge the system is intended to teach (Douglas et al., 2019). This can make it difficult to ascertain whether the user is learning the intended knowledge or simply how to “game the system.” The enjoyment experienced with simulations can also lead to issues such as recall of irrelevant material if the user focuses on aspects that are not the focus of the training (Chowdhury et al., 2017).

Systems Engineering Evaluation

Although training is the principal application of simulators, simulation is also extremely valuable as a systems engineering tool. Simulations are utilized for system
design and development processes across a multitude of applications. Modeling can be used to analyze complex systems in manufacturing, transportation, and healthcare use cases to determine the impact of different designs and aid in selecting the best one (Lee et al., 2013); a simple illustration of this design-comparison use is sketched at the end of this section. Simulations are also used for test and evaluation of system and subsystem performance, and of operational capabilities and limitations (Jones et al., 1985). Simulators for these purposes, including determining system maintenance scheduling, staffing, inventory, cost, and policies, have been widely used for manufacturing systems, airline hubs, and highway design and maintenance (Alabdulkarim et al., 2013; Liu et al., 2019). Simulation can also be used to predict and model human behavior, such as energy consumption in a household to facilitate energy-efficient house design (Hong et al., 2016), and human performance in aircraft to evaluate flight deck technologies for the Next Generation Air Transportation System (NextGen; Gore, 2011). The average consumer can even use simulation and AR technology to determine whether new furniture fits in their home (Tang et al., 2014). In the automotive industry, simulation can be used to recreate crashes and replicate similar scenarios with newer safety settings to evaluate their effectiveness (Alvarez et al., 2017). In aviation and aeronautics, simulation can be used to test and certify aircraft before physical development and even to explore the potential impacts on vehicle structures in the event of a water landing (Hughes et al., 2013). It can also be used to test, model, and validate features, including methods for vertical takeoff and landing (VTOL) for emerging unmanned aerial vehicle concepts (Yuksek et al., 2016), and collision avoidance technology for FAA certification (Webster et al., 2014). Human modeling and simulation can be used to assess the fit of humans in various systems, such as pilots in aircraft seats, to ensure proper reach and spine support (Lindsey et al., 2019).

However, conclusions drawn from simulation-based analyses are only as accurate as the simulation itself. Many simulations assume systematic operations, but such approximations can deviate substantially from reality; in the case of household energy consumption, for example, model estimates can be off by as much as 30% from actual consumption (Hong et al., 2016). As a result, modern advances in these fields have explored methods to improve model accuracy, including ways to quantify uncertainty, novel modeling methods, combinations of multiple modeling methods, and the incorporation of more data into the model (Wang & Zhai, 2016; Lee et al., 2013).
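To make the design-comparison use concrete, below is a minimal Monte Carlo sketch (in Python) that compares two hypothetical service-station designs, one fast server versus two slower servers, by their simulated mean customer wait. The scenario, arrival and service rates, and code are illustrative assumptions for this chapter, not material from the studies cited above.

import random

def simulate_queue(num_servers, service_rate, arrival_rate=1.0,
                   num_customers=50_000, seed=42):
    """Estimate mean customer wait for a first-come-first-served queue."""
    rng = random.Random(seed)
    free_at = [0.0] * num_servers          # time at which each server is next free
    clock = total_wait = 0.0
    for _ in range(num_customers):
        clock += rng.expovariate(arrival_rate)   # next customer arrives
        idx = free_at.index(min(free_at))        # earliest-available server
        start = max(clock, free_at[idx])         # customer waits if all are busy
        total_wait += start - clock
        free_at[idx] = start + rng.expovariate(service_rate)
    return total_wait / num_customers

design_a = simulate_queue(num_servers=1, service_rate=1.5)  # one fast server
design_b = simulate_queue(num_servers=2, service_rate=0.8)  # two slower servers
print(f"Design A mean wait: {design_a:.2f}")
print(f"Design B mean wait: {design_b:.2f}")

Even a sketch this small illustrates the core benefit: candidate designs can be compared over tens of thousands of simulated operations before any hardware is built or modified.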

Research

Simulators are often used as research instruments and within research settings, as they make data collection and manipulation easier and make study replication with minimal changes between participants possible (Deterding et al., 2015; Lukosch et al., 2018). Due to their prevalence, both researchers and participants often know how to interact with this technology, making their use in research straightforward (Alexander et al., 2005). Further, internet connectivity offers access to a larger pool of participants, as participants who are not local can connect remotely. The use of simulations in research applications often involves investigation of human–computer
interactions, visual/motion systems, and human performance assessments, such as measurements of workload, decision-making skills, psychomotor abilities, multitasking abilities, situation awareness, stressor effects, and spatial abilities (Blodgett et al., 2018; Stroosma et al., 2003; Woda et al., 2017; Vlakveld et al., 2018; Zhang et al., 2017). The ability of simulators to examine human performance phenomena is unmatched, as they afford researchers control over the environment while limiting ambient influences (Beaubien & Baker, 2004). Researchers are able to examine a wide range of conditions and characteristics (e.g., performer reactions to different types of decision events) while holding other aspects of the environment constant (e.g., information available, time to make the decision), thereby increasing the validity of the research (Beaubien & Baker, 2004).

Additionally, modernization of simulators has enabled pairing with emerging technology such as head-up displays (HUDs), AR, and VR. The use of simulators in conjunction with VR and AR has led to advances in research across disciplines such as healthcare, education, and training (Cook et al., 2011; Radianti et al., 2020). For example, research involving simulated surgery software utilizing VR technology has shown support for improved training performance (Kim et al., 2017). These advancements allow for a more immersive environment and can increase the authenticity of participants’ responses (Radianti et al., 2020; Kronqvist et al., 2016). When individuals use VR, they are presented with fewer visual distractions from their physical location as well as a 3D view of the virtual environment, making the experience much more realistic (Cecil et al., 2014; Kim et al., 2017). Simulators are also widely used in studies exploring team process and team performance measures. More recently, the use of simulators to examine human–agent collaboration has gained traction, as simulators can be used to simulate interaction with autonomous agents. Their use in this area has provided insight into how multiple virtual agents affect operator performance, requirements, and training needs (Fraune et al., 2021; Mairaj et al., 2019). Simulation has also been used as a means of evaluating the situation awareness associated with certain system displays.

There are drawbacks to the use of simulation for research purposes. Human performance simulators, although similar to the intended task, do not always elicit the same responses present in the natural domain. For example, a flight simulator may have vibration and utilize VR to make the experience immersive, but the operator is aware that a simulated crash will not lead to the same outcome as a true crash. Emotional and environmental stressors are not easily replicated in simulators (Patterson et al., 2008). Additionally, data and results from research simulators can be difficult to interpret, at times requiring additional analysis software (Sena et al., 2012).

Recreation

A relatively new and still emerging use of simulation is for recreational purposes such as enjoyment and social connection. Simulations within the entertainment industry are used for gaming, the arts, theme park rides, sports, and film, among other applications (Bouyer et al., 2017; Smith, 2010). Given that simulators provide immersion and enjoyment, now at an affordable cost, their popularity has increased
dramatically, leading them to be purchased for home entertainment (Granic et al., 2014; Klimmt, 2003; Rui, 2020). For example, golfing simulators have become popular items that let golfers enjoy the sport from the comfort of their homes. Operational simulations can also be used in a more informal context for fun; aviation enthusiasts, for instance, often invest in commercial desktop flight simulators for recreational purposes. The enjoyment associated with simulator use has been tied to the use of suspense to cultivate curiosity, control leading to a sense of empowerment, and escapism through immersion, among other factors (Granic et al., 2014; Klimmt, 2003; Ritterfeld et al., 2009). For example, simulators that provide a storyline cultivate curiosity, while the ability to immediately observe the results of one’s own decisions within a game has also been found to provide players with enjoyment (Klimmt et al., 2007).

Simulations can also foster communication between individuals, as they can mimic complex social systems (Lukosch et al., 2018). For example, many games provide chat rooms that allow users to communicate using personalized avatars. Users can join and create groups based on similar interests and goals, as well as work in teams in certain simulators, such as the Eurogamer Flight Simulator for PC. Simulations that enable connection have been linked to counteracting loneliness and promoting positive feelings and physical activity in older adults (Kahlbaugh et al., 2011). Simulations commonly make use of personalizable avatars and virtual environments and worlds where users can interact, and are even used in amusement parks (Cocking & Matthews, 2000), where their use ranges from rides to improving the experience of waiting in lines. Unfortunately, the novelty effect can be an issue with simulation for recreational purposes: over time the simulator becomes less engaging, as the user’s initial response reflects novelty-driven engagement that can dwindle (Arıcı & Yılmaz, 2020).

DOMAINS OF APPLICATION

The past several decades have seen an explosion in the use of simulation across multiple domains. Around the turn of the century, simulation was limited primarily to the military, aviation, and systems engineering domains; however, simulation use is now prevalent in domains ranging from maintenance and emergency services to education and entertainment. Here we provide a brief discussion of simulation applications within both traditional and emerging domains.

Aviation

“Within the aviation community, the effectiveness of simulators is accepted as an article of faith. Indeed, the aviation industry could not function without simulators and flight training devices (FTDs), whose existence is mandated by Federal Aviation Administration (FAA) regulations” (Hancock et al., 2008, p. 46). Even decades later, this statement remains true, as many aviation sectors have invested heavily in simulator training, with 30–50% of training conducted in simulators (Yoon et al., 2019).

Within the aviation industry, manufacturers, users, and regulators rely on simulators for training and testing purposes across manned, unmanned, and urban air mobility (UAM) contexts. The FAA establishes standards and policies for the certification of civil aviation pilots, and the Federal Aviation Regulations (FARs) permit simulators to be used for various training objectives, including crew resource management (CRM), initial training, recurrent training, emergency operations, and stall procedures (FAA, 2020). Currently, many flight schools use a range of simulators, from motion-based to fixed-base to personal computer-based flight simulators (Reweti et al., 2017). Simulators are also used to test and train future concepts and issues that have not yet reached the commercial aviation sector, such as EFB integration into the flight deck panel (Carroll et al., 2021; Pittorie et al., 2019), as well as emergency or novel situations that would be difficult to replicate otherwise, such as cybersecurity hacking of systems or UAS search and rescue (Carroll et al., 2020; Rebensky et al., 2020).

Simulation is widely used in CRM training, in which aircrews are taught how to utilize every resource available to them, including communication. Simulated scenarios can include line-oriented flight training (LOFT), in which an aircrew flies a full simulated mission incorporating hands-on learning and practice, with feedback available to the crew afterward in the form of a video recording of the flight. This technique is well accepted by the aviation industry and believed to provide an excellent training platform (Kanki, 2019). Additionally, simulators like the Virtual Drone Search Game have been utilized to research search-and-rescue drone-teaming practices (Fraune et al., 2021).

Military

Simulators have been used in the military realm for gunnery, driver, infantry, pilot, commander, and maintenance training (Pinheiro et al., 2012; Straus et al., 2019). Other uses of simulation in the military include support for the development of tactics and combat management skills, as well as simulating medical emergencies and evaluating operational systems (Straus et al., 2019; Fletcher et al., 2017; Hall et al., 2014). Modern simulators can serve military purposes by leveraging video game technology to address training needs, including synthetic task environments that appear realistic and allow for simulation of crews, vehicles, and communication, as well as assessment and scoring capabilities for ensuring warfighter readiness (Kumm & Burwell, 2017). For example, Virtual Battlespace 3 (VBS3), an Army game-based training software, was developed from a popular commercial first-person shooter game, Arma 3, and is used to train both individual- and team-level skills for Army personnel (Straus et al., 2019).

Medical

Simulation within the medical industry primarily focuses on surgical procedures, as surgical skills require constant practice (Tan & Sarker, 2011; van de Ven et al., 2017). Simulations have also been widely used as a way to roleplay, and to practice and cultivate physicians’ communication skills with future patients (Cegala & Broz, 2003;
Hardoff & Schonmann, 2001). Simulation can also benefit nurses in emergency response procedures such as surgical resuscitation by increasing knowledge, feelings of confidence, and satisfaction through realistic practice (McRae et al., 2017; Warren et al., 2016). Across various studies, simulation, particularly high-fidelity simulation in the critical care medical domain, has been found to be more effective than other methods of teaching for both skill and knowledge acquisition (Beal et al., 2017).

Driving

Driving simulators are used in human factors research, including systems engineering research and research examining human behaviors, perceptions, and operational features. Research demonstrates that simulators are as effective as on-road studies at uncovering and studying performance outcomes such as mirror checking, maintaining speed, monitoring behaviors, and following the rules of the road (Meuleners & Fraser, 2015; Bella, 2008; Underwood et al., 2011). Simulator-based driving research has shown mixed effectiveness in exploring other driving behaviors and situations, such as lane changing, speeding, driving performance and errors, braking, steering, mental workload, and eye fixations, as high- and low-fidelity simulators vary in effectiveness depending on the construct explored (Wynne et al., 2019). Driving simulations can also be used to explore environmental and design areas such as autonomous driving assistance, hazardous environmental conditions, and the driving behaviors of at-risk populations (Abdelgawad et al., 2017; Campos et al., 2017). In all of these contexts, simulation has been utilized to conduct studies aimed at reducing accident rates and human fatalities in a safe and efficient manner.

Emergency Response

Simulations are used to train police, military, firefighter, and medical personnel in an interactive way for emergency situations that require dynamic decision-making in stressful and unpredictable environments (Taber, 2008). Some simulators developed for police training utilize VR systems that, through motion-tracking technology, can help officers learn to read the body language and emotions of suspects in order to train effective responses (Himona et al., 2011). Additional simulators usable by police, medical teams, and firefighters have been developed for training personnel on crowd control in escalated situations (Kamiński et al., 2020). Early desktop simulations allowed emergency personnel to practice different tactics for controlling situations such as house fires, but they required large development efforts to simulate realistic fire and animations and still had limited realism (St. Julien & Shaw, 2003). Modern simulators, however, can replicate a range of features with little to no development expertise, including visual features and even physical features such as haptic feedback simulating the force of a fire hose (Rebensky et al., 2020; Nahavandi et al., 2019). These high-fidelity environments replicate the difficulty associated
with emergency situations and allow repeated practice of necessary skills that are beneficial for high-risk, low-frequency emergency events (Kamiński et al., 2020; Nahavandi et al., 2019; Jerald et al., 2020; Marler et al., 2020).

Education and STEM

Simulation can be utilized to teach difficult science, technology, engineering, and math (STEM) concepts that may otherwise be challenging to convey, such as allowing students to explore space in virtual reality to learn about planetary motion (Lindgren et al., 2016). Substantial support has been found for the use of simulators to promote learner discovery of important concepts, motivation to learn, and conceptual understanding in science (Lindgren et al., 2016; Honey & Hilton, 2011; Rutten et al., 2012). Simulations in the form of virtual learning environments have gained interest from STEM educators for the support they provide to instructors, the wealth of resources and information they offer students, and their ability to engage students (Mpu & Adu, 2020). For example, the use of the virtual environment “SimScientists,” which allows students to examine the interactions between organisms and their roles in an interactive environment, has been linked to improved science process skills and learning goals (Honey & Hilton, 2011; Quellmalz et al., 2020). STEM fields commonly require hands-on experiments and experiences, in which learning by doing is essential (Pellas et al., 2017). The ability of simulators to rapidly simulate consequences and feedback that may naturally take weeks allows for a “visual” understanding in far less time (Skinner et al., 2020).

Entertainment

The use of simulation in entertainment spans motion simulators, such as rides in amusement parks; viewing simulators, such as online simulators that allow people to virtually visit stores; and full human simulators used in some museums to add interactivity to exhibits. The use of simulators for entertainment is important, as lessons learned and innovative advancements there are commonly instantiated in other areas such as the military (Alexander et al., 2005; Balinova, 2020). Collaboration between entertainment and other industries on simulation is common. For example, the first medical mannequin designed for mouth-to-mouth resuscitation practice was created by a toy manufacturer asked to create a doll for CPR training (Cooper & Taqueti, 2008). Similarly, the Marine Corps Infantry Immersion Trainer, a highly realistic, live training environment, was designed with help from Hollywood set designers, who brought the environment to life with the sights, sounds, and smells of an actual battlefield (Dean et al., 2008). Entertainment simulators can also spark interest and solutions in other domains: the increased engagement found in gaming simulators used for entertainment has led to their adoption in training and learning settings, such as college engineering courses (Coller & Scott, 2009).

Maintenance

Simulators for maintenance have been widely used in the manufacturing domain to optimize maintenance scheduling, cost estimation, policy development, and system reliability, and have also been used for maintenance training (Alabdulkarim et al., 2013; Pinheiro et al., 2012; Winther et al., 2020; Neges et al., 2017; Liu et al., 2019). Research by Winther et al. (2020) has demonstrated that VR simulation can be an effective medium for training procedural knowledge in maintenance tasks. However, tasks that require interacting with physical objects still pose limitations, as the sensations of touch, motion, and force cannot currently be replicated entirely in virtual reality; hands-on, in-person experiences remain superior. For maintenance tasks, training must extend beyond procedural knowledge to include physical aspects such as how much torque to apply, how it feels to turn a part, or the right grip to utilize (Neges et al., 2017; Winther et al., 2020). Without haptic or physical objects to interact with, the maintainer cannot experience key signs, such as a part clicking into place, that indicate a maintenance task was completed correctly. These skills are necessary to accurately perform maintenance on various technologies in a wide array of domains. Newer technology allows for the integration of physical objects into a virtual environment, with the virtual environment dynamically responding to user interactions with the physical object in the real world (Neges et al., 2017). An example is a low-fidelity panel of physical levers instrumented with sensors and rendered in virtual reality: turning the physical valves updates the simulated hydraulic system, allowing users to practice resolving emergency situations (a simplified sketch of this coupling follows below). Advances in this space will continue to enhance the applicability and transfer of training of simulation for maintenance tasks.
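As a concrete illustration of the physical-to-virtual coupling just described, the following is a minimal, hypothetical sketch in Python: a stand-in sensor on a physical valve lever is polled each tick, and its position drives a simulated hydraulic pressure model. The class, rates, and sensor function are illustrative assumptions, not the implementation used in the work cited above.

from dataclasses import dataclass

@dataclass
class SimulatedHydraulics:
    pressure_psi: float = 2800.0   # starting line pressure (illustrative value)

    PUMP_RATE = 40.0   # psi recovered per tick with the valve fully open
    LEAK_RATE = 25.0   # psi lost per tick through the simulated fault

    def step(self, valve_open_fraction: float) -> None:
        """Advance one tick given the physical valve position in [0, 1]."""
        self.pressure_psi += valve_open_fraction * self.PUMP_RATE - self.LEAK_RATE
        self.pressure_psi = max(0.0, min(self.pressure_psi, 3000.0))

def read_valve_sensor() -> float:
    """Stand-in for reading a real sensor (e.g., a potentiometer on the lever);
    hardcoded here so the sketch runs without hardware."""
    return 0.75

system = SimulatedHydraulics()
for tick in range(5):
    system.step(read_valve_sensor())
    print(f"tick {tick}: pressure = {system.pressure_psi:.0f} psi")

In a real trainer, read_valve_sensor would return live sensor data, and the updated pressure would be rendered in the VR scene, closing the loop between the physical lever and the virtual system.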

ACHIEVABLE OUTCOMES

Using simulation for the purposes specified above yields several beneficial outcomes that allow practitioners to achieve their goals. Simulators can often provide more in-depth, safer, less expensive, and more effective training, evaluation, and experiential opportunities than the actual system being simulated. This is due, in part, to simulation environments allowing the incorporation of instructional features that facilitate instructor intervention, sensors and measurement algorithms that provide precise assessments, and multimodal, high-fidelity environments that facilitate realistic and enjoyable interaction and immersion. As technology has advanced, more benefits of simulation use for a range of purposes have emerged. For education and training alone, a meta-analysis across a wide range of applications utilizing AR and VR technology demonstrated positive outcomes, including enhanced immersion, increased learning, and reductions in learning time and skill decay (Fletcher et al., 2017). The following sections outline several positive outcomes that can be achieved through simulation, including associated advantages and disadvantages.

Cost Benefit

Simulation has been shown to be a cost-effective training device, providing the required training at lower cost than the actual system or live training events, which often have high operating and maintenance costs (Champney et al., 2017; Moroney & Moroney, 2009). When simulators first gained popularity, military simulators operated at 5–20% of the cost of operating the actual aircraft (12% being the median cost; Orlansky & String, 1977). Yoon et al. (2019) conducted an analysis demonstrating that simulator cost has since fallen to 3.3–14% of actual aircraft cost, with an average of 5.9%. They also demonstrated that, across various aviation industries, an average of 3 simulator hours can replace 1 hour of live flight training, and that as simulator time grows to 30–50% of total training, training costs can be reduced by 22.8% to upward of 76%. In medical domains, simulation training can also reduce the costs incurred by adverse outcomes such as catheter placement infections, which can cost $82,000 per occurrence. A $112,000 investment in simulator training is therefore worthwhile: reducing catheter placement infections by ten patients in a year would yield over $700,000 in annual savings (Cohen et al., 2010); this arithmetic is worked through in the sketch at the end of this section. Simulation training can also reduce the time needed to complete training, which has positive financial impacts as well (Lubner et al., 2019).

Further cost reductions can be achieved by utilizing personal computer aviation training devices (PCATDs), which have demonstrated performance comparable to full flight simulators for instrument procedures and air crew coordination skills, along with reductions in training times (Taylor et al., 1999; Jentsch & Bowers, 1998; Taylor et al., 2004). PCATDs also appeal as low-cost tools in education courses and research, as these systems can cost as little as $6,000 and can include modern technology such as electronic flight bags, touchscreens, and dual cockpit configurations (Taylor et al., 2004; Jentsch & Bowers, 1998; Pittorie et al., 2019). Development of specialized simulation-based trainers can cost anywhere between $100,000 and $800,000; however, leveraging game development software can reduce initial cost significantly, with preliminary analysis indicating development costs as low as $45,000 (Rebensky et al., 2020). Further, newer technologies such as commercial-off-the-shelf (COTS) game development software allow lower development and maintenance costs, often with reusable assets for future use cases (McGrath et al., 2017). Emerging areas, at the time of this chapter, include VR-based aviation training that could be utilized for cockpit familiarization and checklist training (Sikorski et al., 2017). VR can greatly reduce the cost of the physical components of a simulator, as these can be virtually represented and configured for a range of applications. Many simulators, even in the military sector, now utilize COTS technologies and tools, as they can reduce both initial cost and potential future cost through increased reusability compared to made-from-scratch simulators built on proprietary technology (Kumm & Burwell, 2017).

The military is also utilizing low-fidelity game technology for training, which has been shown to exhibit significantly lower procurement, maintenance, and training costs per soldier, along with increased collective mission training capabilities (Straus et al., 2019). For example, the cost of using a traditional high-fidelity combat training system can range from $750 to upwards of $7,000 per soldier, per day, for air-combat simulators, compared to $200 per soldier, per day, utilizing games for training (Straus et al., 2019). The high costs associated with live training, which often utilizes real people, actual systems, and, in some cases, live ammunition, often far surpass the cost of even the highest-fidelity simulators (Straus et al., 2019). For example, simulator-based hospital evacuation training can result in cost reductions over the years, compared to live evacuation training, even with extremely high initial acquisition costs (Farra et al., 2019). Such examples clearly demonstrate the cost-benefit of using simulators.

On the other hand, simulators can come with extreme costs that need to be weighed against the training benefit they provide, including costs of development, testing, and certification, which can run well into the millions for high-fidelity simulators (Straus et al., 2019). Additionally, different simulated tasks have different transfer-of-training effectiveness ratios, which can greatly impact the justification for using simulation over live training (Moroney & Moroney, 2009). Special attention should also be given to the trainee's adjustment time when transitioning from the simulator to the live environment, as the real environment can differ in required inputs and physical demands and does not offer simulator features such as the ability to pause (Moroney & Moroney, 2009). Another consideration is that certain simulators, specifically dome and motion-based simulators, require large spaces, air conditioning, and specialized maintenance personnel (Moroney & Moroney, 2009; Straus et al., 2019). Further, each simulator comes with its own set of hardware and software maintenance issues, often requiring specialized technicians and additional upkeep costs not required for live training. For example, unique simulators may require specialized technical support when updates are instantiated, and parts may become obsolete. For this reason, many organizations have shifted from self-developed simulator software to COTS technologies that allow software update and maintenance responsibilities to be off-loaded to the manufacturers (Kumm & Burwell, 2017). In some cases, simulator technology cannot keep up with rapidly advancing technology and quickly becomes obsolete (Straus et al., 2019). For example, in the four years after the Oculus Rift's first developer kit was released, it was superseded by two new versions of the virtual reality headset, a new controller set, and multiple software updates and revised hardware requirements (Kumm & Burwell, 2017). The breakneck pace of technology innovation in recent years has made it difficult for software developers to keep simulators current and compatible. Because of this, simulators may need "future-proofing," or going above and beyond the needed hardware requirements, in hopes of an extended simulator life cycle that can handle the demands of future technology innovations. Even with these drawbacks, however, the cost-benefit advantages of simulation clearly outweigh the disadvantages, as demonstrated by acceptance across various domains and regulatory agencies (Straus et al., 2019; Yoon et al., 2019).
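The cost arithmetic reported above can be made explicit with a short worked sketch in Python. The catheter figures follow Cohen et al. (2010) as cited; the live flight-hour rate is a hypothetical placeholder chosen only to illustrate the Yoon et al. (2019) ratios.

# Medical example: simulator investment vs. infections avoided (Cohen et al., 2010).
infection_cost = 82_000          # cost per catheter-related infection ($)
simulator_investment = 112_000   # annual simulation training investment ($)
infections_avoided = 10          # infections prevented per year
net_annual_savings = infections_avoided * infection_cost - simulator_investment
print(f"Net annual savings: ${net_annual_savings:,}")   # $708,000, i.e., "over $700,000"

# Aviation example: ratios from Yoon et al. (2019); the hourly rate is assumed.
live_hour_cost = 10_000          # hypothetical cost of one live flight hour ($)
sim_cost_fraction = 0.059        # a simulator hour averages ~5.9% of an aircraft hour
sim_hours_per_live_hour = 3      # ~3 simulator hours replace 1 live flight hour
replacement_cost = sim_hours_per_live_hour * sim_cost_fraction * live_hour_cost
print(f"Replacing one live hour: ${replacement_cost:,.0f} vs. ${live_hour_cost:,}")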

Safety

Simulators also provide a safe environment for evaluation and training. Before the introduction of simulators, more accidents resulted from the practice of emergency situations than from actual emergencies (Rolfe & Staples, 1986). Simulators present hazardous situations without risk or threat to humans or the system (Sharma & Scribner, 2017) and can therefore be used to train inexperienced and novice individuals on systems or in situations that would otherwise pose a serious threat. Simulators provide opportunities for the introduction of unexpected, low-probability events or system errors that require the operator to respond quickly during a stressful situation (Neges et al., 2017). Hence, simulators can expose operators to conditions or experiences that they would be unlikely to encounter in actual environments, increasing their ability to respond to such situations if encountered. Example tasks include a hydraulic failure during a maintenance task, emergency resuscitation during surgery, and emergency evacuation of a building (Neges et al., 2017; McRae et al., 2017; Sharma & Scribner, 2017). Simulators can also be used to demonstrate operations without concern for safety requirements or violation of regulations. For example, an instructor can deliberately permit a trainee to make a mistake or error to demonstrate its consequences, which may not be feasible in the actual system because it would be operationally illegal (Moroney & Moroney, 1999).

The utilization of simulators reduces hours of operation of the operational system, which in turn reduces wear and tear on the system and its equipment. It also reduces the health and environmental impacts associated with pollution and noise (e.g., smoke and fire cannot be safely replicated in a live training environment without potential health threats to the trainee) (Yoon et al., 2019; Rebensky et al., 2020; Straus et al., 2019; Sharma & Scribner, 2017). As the military uses simulation to develop tactics and combat management skills and to evaluate operational systems, simulation can also reduce the impact on resources (e.g., damage to land associated with military missile and firing training; Straus et al., 2019).

However, there are disadvantages to training in “safe” and “sanitized” environments. Trainees may not feel willing or able to practice within simulators. When simulators began dominating the aviation training space, they were associated with negative effects on morale and retention, as trainees were anxious to be in the operational environment: “I'm here to fly airplanes, not simulators” (Moroney & Moroney, 1999). As simulation enters the new space of VR-based aviation training, research is examining the factors that shape attitudes toward VR-based training, including perceptions of its potential impacts on performance and the subsequent likelihood of engaging in it; as with the early use of simulators in aviation, lack of experience may result in limited acceptance (Fussell, 2020). However, as simulation technology enters the entertainment space, many individuals are obtaining experience with simulated environments prior to simulation-based training. Research shows that this may lead to a potentially more
positive acceptance of VR-based simulation training (Luiser et al., 2019; Fussell, 2020; Fussell & Truong, 2020).

Data

Simulators also allow for data collection to facilitate (a) performance comparisons (e.g., to a performance criterion or to the performance of other trainees), (b) performance and learning diagnosis (e.g., learning progress and problem areas), and (c) performance evaluation (measures of performance effectiveness; Moroney & Moroney, 1999). Modern simulators can track user decisions, actions, and performance to facilitate feedback and assessment for the trainee and trainers (Marler et al., 2020). In medical domains, VR-based surgery simulators can be used as performance assessment tools during surgery, providing the assessor with information such as the accuracy of surgical instruments or the number of times a trainee touched unintended areas with surgical instruments (Pfandler et al., 2017). This type of information allows for unobtrusive, objective measurement of performance and errors that can be utilized to provide feedback. For example, in emergency scenarios, simulators can be used to objectively assess the possibility of human error in off-nominal events, which would otherwise be deduced from subjective expert opinion (Musharraf et al., 2019). In the automotive domain, simulators can assess event recognition and driving errors utilizing steering wheel position, brake and acceleration, and head-position data collected in VR simulators (Taheri et al., 2017); a simple scoring sketch of this kind appears at the end of this section. In the systems engineering and evaluation space, specific user-defined values can be entered, and simulators can output the specific impacts on the system, such as the particular stress levels experienced by aircraft structures (Hughes et al., 2013). The data provided by simulation allow unique opportunities to remove biases from assessment strategies and to objectively collect data previously unobtainable in live training. Data collection by simulation is equally beneficial for research purposes, in which the goal is often to study responses to the phenomenon being simulated.

However, there are drawbacks to using simulators for data collection. As with any technology, technical issues such as power outages or loss of internet connection can lead to data loss, although these can often be mitigated through generators and stable network connections (Mehta et al., 2009). Additionally, the fact that some simulators are connected to the internet can raise ethical concerns related to the privacy of the data and the degree to which the data being collected are protected (Mugunthan et al., 2020). Finally, the data output from the software can sometimes be difficult and time-consuming to handle, leading to errors in interpretation. However, the ubiquity of technology has given rise to programs that facilitate the transformation and interpretation of data, and proper documentation and tutorials can help reduce the time it takes to perform data cleaning and setup (Mehta et al., 2009). There can also be a learning curve for users, requiring additional time before they are comfortable and understand how to operate the simulator (Colaco et al., 2017; Howells et al., 2009). These hurdles can be mitigated through the use of tutorials, practice sessions, and feedback.
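As an illustration of the kind of objective scoring described above, the following sketch computes simple driving measures (mean speed, lane keeping, speeding events, and brake applications) from a toy telemetry log in Python. The sample data and speed threshold are hypothetical, not drawn from the cited studies.

from statistics import mean, pstdev

# Each record: (time_s, speed_kph, lane_offset_m, brake_pressed)
telemetry = [
    (0.0, 48.0, 0.10, False),
    (0.5, 52.0, 0.25, False),
    (1.0, 57.0, 0.40, True),
    (1.5, 55.0, 0.15, True),
    (2.0, 50.0, 0.05, False),
]

SPEED_LIMIT_KPH = 55.0

speeds = [r[1] for r in telemetry]
offsets = [r[2] for r in telemetry]
speeding_events = sum(1 for s in speeds if s > SPEED_LIMIT_KPH)
brake_applications = sum(1 for r in telemetry if r[3])

print(f"Mean speed: {mean(speeds):.1f} kph (sd {pstdev(speeds):.1f})")
print(f"Mean lane offset: {mean(offsets):.2f} m")
print(f"Speeding events: {speeding_events}; brake applications: {brake_applications}")

A real simulator would log thousands of such records per session; the same summary logic then yields the unobtrusive, objective measures discussed above.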

Intervention

Simulators support the delivery of a range of different interventions. Simulation-based training can be enhanced by incorporating automated assessment and feedback to improve skill levels; simulators that do not provide feedback have been found to yield little to no benefit for users without instructor intervention (Mahmood & Darzi, 2004; Van Heukelom et al., 2010). Simulators can provide learning interventions that not only alleviate instructor requirements but also provide consistency in the delivery of instructional strategies. For example, the Metacognitive Scaffolding Service (MSS) software has been utilized in medical simulators to facilitate self-regulated learning during training (Berthold et al., 2012). Another advantage is the capacity for real-time performance measurement and feedback, adaptive training, programmed demonstrations and malfunctions, and immediate placement of the trainee into a desired position and situation (Waag, 1978; Rebensky et al., 2020). Trainees can receive immediate feedback, go back and reflect on their performance, and receive individualized remediation based on their performance (McGrath et al., 2017).

Increasing the challenge of the learning task to match the learner's abilities can also be accomplished via simulators, without affecting other students who may not be ready to move to that level of challenge, resulting in positive impacts on motivation and performance (Bauer et al., 2012; Orvis et al., 2008); a minimal sketch of such adaptive logic appears at the end of this section. Determining how the human operator can work together with automated systems is another critical issue that can best be investigated in a simulated environment. Unlike real-world scenarios, simulators can be paused to work through potential solutions in risky situations and to assess operator states (Moroney & Moroney, 2009; Rebensky et al., 2020; Endsley, 1995).

Keep in mind that simulation-based training is only as valid and valuable as the instructor utilizing the simulator and the associated assessments and interventions. Performance evaluations, training content, level of acceptance, and actual usage are subject to the opinions and attitudes of operators, instructors, and management (Moroney & Moroney, 1999; Green, 2000). Modern simulations help ensure effectiveness by allowing for standardized learning content delivery and objective performance assessment (McGrath et al., 2017).
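A minimal sketch of such adaptive-difficulty logic, in Python, follows: scenario challenge is stepped up or down based on a rolling window of recent trainee scores. The thresholds, window size, and difficulty levels are hypothetical, not drawn from the cited studies.

from collections import deque

class AdaptiveScenario:
    LEVELS = ["familiarization", "routine", "degraded", "emergency"]

    def __init__(self, window=3, promote_at=0.80, demote_at=0.50):
        self.level = 0
        self.scores = deque(maxlen=window)   # rolling window of recent scores
        self.promote_at, self.demote_at = promote_at, demote_at

    def record(self, score: float) -> str:
        """Log a trial score in [0, 1] and return the next scenario level."""
        self.scores.append(score)
        if len(self.scores) == self.scores.maxlen:
            avg = sum(self.scores) / len(self.scores)
            if avg >= self.promote_at and self.level < len(self.LEVELS) - 1:
                self.level += 1
                self.scores.clear()          # restart the window at the new level
            elif avg < self.demote_at and self.level > 0:
                self.level -= 1
                self.scores.clear()
        return self.LEVELS[self.level]

trainee = AdaptiveScenario()
for s in [0.90, 0.85, 0.88, 0.60, 0.95, 0.90]:
    print(f"score {s:.2f} -> next scenario: {trainee.record(s)}")

Because the logic keys only on each individual's own scores, one trainee can be promoted to a harder scenario without affecting others, the property highlighted above.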

Flexibility and Availability

The flexibility of simulators also adds to their effectiveness. For example, traffic and other distractions can be eliminated from the training task. Simulators also provide standardized scenarios that allow various operators to practice functions, procedures, and dynamics within identical environmental conditions. Scenarios can be repeated until required performance criteria are met, or to promote efficiency or even automatic responses (e.g., overlearning of a task; Moroney & Moroney, 2009). Further, modern PCATDs can be easily moved and reconfigured across different buildings and cockpit configurations, as 3D printing of controls and novel touch-based displays allow inexpensive and rapid alterations to the system (Pittorie et al., 2019).

Simulators often increase the availability of training and operations. Simulators can offer 24-hour accessibility, free of dependency on actual system availability and usability requirements such as suitable weather conditions for flights or test mannequins for surgeries (McGrath et al., 2017; Moroney & Moroney, 2009). Simulators also provide immediate access to operating areas (e.g., a flight simulator allows students to practice multiple landings in succession without time or fuel costs; Moroney & Moroney, 2009). Users can repeat tactics in various scenarios, providing a safe environment in which to apply knowledge to novel situations (Straus et al., 2019). This approach can allow users to achieve mastery of skills as well as refresh skills that might otherwise decay. Additionally, simulators can allow for distance learning during times when travel to training sites is restricted, such as during deployments or unsafe travel conditions at a remote location (McGrath et al., 2017). It should be noted, however, that this may require trainees to be tech-savvy enough to load, control, and troubleshoot any issues with the simulator. VR continues to improve the portability and availability of these systems in various domains. For example, cockpit familiarization can move to VR environments as opposed to classrooms or more space- or hardware-intensive simulators (Sikorski et al., 2017). VR simulators can also use game-based software that allows quick changes of training scenarios and interaction methods, such as controller-based or speech-based interaction (Marler et al., 2020; Kamiński et al., 2020). VR headsets are also becoming more compact, allowing for portable, fully immersive simulation (Oculus, 2020).

Realism

Many skills require training in realistic environments, which for emergency situations and complex tasks can be difficult to replicate in the real world. Modern simulation technology, however, now makes the re-creation of these environments possible. For effective training, simulators must replicate accurate cognitive loads, allow realistic communication, and facilitate the practice of requisite motor skills (Bennett et al., 2017). Realism is typically achieved through simulator fidelity, which at a high level can be broken into two main elements: task fidelity and instructional fidelity. Task fidelity refers to the similarity between the operational system or environment and the simulator and its missions, whereas instructional fidelity refers to the system's ability to transfer new skills to the pilot (Macfarlane, 1997). High task fidelity is usually associated with physical fidelity and therefore high cost. Instructional fidelity is found in lower-physical-fidelity simulators such as low-cost, part-task, and procedural trainers. Physical fidelity is attributed to two main factors: high levels of visual detail and motion. Each of these factors has been explored by researchers, including the extent to which it affects transfer of training.

With advancements in simulator technology, it is often assumed that increases in fidelity are accompanied by increases in training effectiveness. This is not necessarily the case. The addition of motion or haptics can add significant development time and cost with little impact on training effectiveness (Jerald et al., 2020). Simulators with lower
levels of physical fidelity (e.g., vibration features instead of a full motion platform) can often achieve equivalent training gains (Jerald et al., 2020; Nahavandi et al., 2019). However, for complex and mission-based training, visual and physical fidelity may be necessary, for example, to train psychomotor skills such as appropriate maneuvers during turbulence or the appropriate force to apply to a scalpel during surgery (Straus et al., 2019; Yoon et al., 2019; McGrath et al., 2017). If the simulator aims to train a cognitive task, it may only need to replicate the cognitive aspects of that task to train it effectively (Hochmitz & Yuviler-Gavish, 2011). As a result, military branches have shifted to lower-cost, game-based training systems instead of high-fidelity combat trainers, as the game-based systems are effective with much lower operation and maintenance costs (Straus et al., 2019). For example, using low-fidelity virtual environments to train room-clearing tactics can be just as effective as training with high-fidelity environments for some aspects of the task (Champney et al., 2017). Some research has even demonstrated that game-based training with high cognitive fidelity is more effective at improving aspects of mission performance than high-fidelity combat trainers (Straus et al., 2019). As a result, many domains have shifted to lower-fidelity, lower-cost simulators to train procedural skills and processes.

Realistic, high-fidelity simulators also come with additional disadvantages, such as simulator sickness, which poses problems for some users. Simulator sickness results from discrepancies between the visual and vestibular cues encountered in simulators, in which the visual system (quality and field of view) is more technologically advanced and therefore disproportionate to the motion system; this can lead to symptoms such as nausea, headaches, and eye strain (Brooks et al., 2010). Pilots may devise strategies to compensate for, or avoid, simulator sickness, which may in turn have negative impacts on performance when transferred to the operational system. Some participants in driving simulators have had to stop participation altogether as simulator sickness became too intense (Brooks et al., 2010; Fletcher et al., 2017). Furthermore, any discrepancies, deviations, or misrepresented information in the motion, visual, and auditory cueing systems can impact learning and overall performance in the operational environment, as trainees learn to respond to inaccurate information (Ray, 2000; Fletcher et al., 2017).

One last consideration with respect to realism is that performance in a simulator may not necessarily reflect an individual's operational performance. Proficiency in a simulator should never be assumed to equate to proficiency in the operational system or environment. Increased performance in a simulator may be due to reduced stress levels compared to an actual situation, and operational performance may not be adequately reflected because emergencies, system malfunctions, and unscheduled events are expected during training. On the other hand, added stress may occur in simulation evaluations due to socio-evaluative stress. Further, trainees might not perform as they would in the real world due to reduced levels of risk. For example, in a simulated driving experiment, some drivers may be more likely to speed due to reduced risk perception in the simulated environment (Bella, 2008).
However, other research suggests that experienced drivers continue to engage in safe driving and hazard-monitoring behavior regardless of the simulated nature of the task, and driving speeds do
not differ in simulated studies (Underwood et al., 2011; Bella, 2005). Other factors that can influence performance in a simulator compared to the real world include: (a) above-average performance due to preparation before simulation evaluations (Moroney & Moroney, 2009), (b) lack of fatigue, complacency, or boredom compared to the operational environment (Moroney & Moroney, 2009), (c) the learning curve required to become proficient in the simulator (Ronen & Yair, 2013), and (d) a shift in responsibilities and expectations when transferring from the simulator to the operational environment (Green, 2000). As such, there will always be concerns when comparing performance in the simulator to performance in the real world.

CONCLUSION

Simulation has many purposes, including the ability to train in safe environments, practice novel situations, evaluate futuristic concepts, research difficult-to-observe constructs, connect people across the globe, and experience new fantasy worlds. Simulation has various applications and is widely accepted in numerous domains, including the aviation, military, medical, driving, emergency response, education, entertainment, and maintenance domains. As with any technology, disadvantages exist; however, they are outweighed by the benefits in terms of cost reduction, safety, data access, intervention capabilities, flexibility, and realism. This chapter has aimed to illustrate the advances in simulation technology, its range of uses, and the associated beneficial outcomes across a range of domains and uses. As technology continues to advance, the use of simulation will continue to increase, slowly overtaking live applications altogether (Yoon et al., 2019). As we continue to find ways to innovate, the cost reductions, benefits, and gains from simulation will continue to broaden, leading to increased adoption across domains.

REFERENCES

Abdelgawad, K., Gausemeier, J., Stöcklein, J., Grafe, M., Berssenbrügge, J., & Dumitrescu, R. (2017). A platform with multiple head-mounted displays for advanced training in modern driving schools. Designs, 1(2), 8.
Alabdulkarim, A. A., Ball, P. D., & Tiwari, A. (2013). Applications of simulation in maintenance research. World Journal of Modelling and Simulation, 9(1), 14–37.
Alexander, A. L., Brunyé, T., Sidman, J., & Weil, S. A. (2005). From gaming to training: A review of studies on fidelity, immersion, presence, and buy-in and their effects on transfer in PC-based simulations and games. DARWARS Training Impact Group, 5, 1–14.
Alvarez, S., Page, Y., Sander, U., Fahrenkrog, F., Helmer, T., Jung, O., Hermitte, T., Düering, M., Döering, S., & Op den Camp, O. (2017). Prospective effectiveness assessment of ADAS and active safety systems via virtual simulation: A review of the current practices. 25th International Technical Conference on the Enhanced Safety of Vehicles (ESV).
Andreatta, P. B., Hillard, M., & Krain, L. P. (2010). The impact of stress factors in simulation-based laparoscopic training. Surgery, 147(5), 631–639.
Arıcı, F., & Yılmaz, R. M. (2020). The effect of laboratory experiment and interactive simulation use on academic achievement in teaching secondary school force and movement unit. Elementary Education Online, 19(2), 465–476.

Justification for Use of Simulation

83

Balinova, D. (2020). Military-Entertainment Complex: The myth of the War on Terror. Chicago. Bauer, K., Brusso, R., & Orvis, K. (2012). Using adaptive difficulty to optimize videogamebased training performance: The moderating variable of personality. Military Psychology, 24(2), 148–165. Beal, M. D., Kinnear, J., Anderson, C. R., Martin, T. D., Wamboldt, R., & Hooper, L. (2017). The effectiveness of medical simulation in teaching medical students critical care medicine: A systematic review and meta-analysis. Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare, 12(2), 104–116. Beaubien, J. M., & Baker, D. P. (2004). The use of simulation for training teamwork skills in health care: How low can you go? BMJ Quality & Safety, 13(suppl 1), i51–i56. Chicago. Bella, F. (2005). Validation of a driving simulator for work zone design. Transportation Research Record: Journal of the Transportation Research Board, 1937(1), 136–144. Bella, F. (2008). Driving simulator for speed research on two-lane rural roads. Accident Analysis & Prevention, 40(3), 1078–1087. Bennett, W. B., Rowe, L. J., Craig, S. D., & Poole, H. M. (2017). Training issues for remotely piloted aircraft systems from a human systems integration perspective. In N. J. Cooke, L. J. Rowe, W. Bennett, & D. Q. Joralmon (Eds.), Remotely Piloted Aircraft Systems A Human Systems Integration Perspective (pp. 15–39). West Sussex, UK: John Wiley & Sons Ltd. Berthold, M., Moore, A., Steiner, C. M., Gaffney, C., Dagger, D., Albert, D., ... & Conlan, O. (2012, September). An initial evaluation of metacognitive scaffolding for experiential training simulators. In European Conference on Technology Enhanced Learning (pp. 23–36). Springer, Berlin, Heidelberg. Blodgett, N. P., Blodgett, T., & Kardong-Edgren, S. E. (2018). A proposed model for simulation faculty workload determination. Clinical Simulation in Nursing, 18, 20–27. Bouyer, G., Chellali, A., & Lécuyer, A. (2017, March). Inducing self-motion sensations in driving simulators using force-feedback and haptic motion. In 2017 IEEE Virtual Reality (VR) (pp. 84–90). IEEE. Breuer, J., & Bente, G. (2010). Why so serious? On the relation of serious games and learning. Journal for Computer Game Culture, 4, 7–24. Brooks, J. O., Goodenough, R. R., Crisler, M. C., Klein, N. D., Alley, R. L., Koon, B. L., Logan, J. W. C., Ogle, J. H., Tyrrell, R. A., & Wills, R. F. (2010). Simulator sickness during driving simulation studies. Accident Analysis and Prevention, 42(3), 788–796. https://doi​-org​.portal​.lib​.fit​.edu​/10​.1016​/j​.aap​.2009​.04​.013 Campos, J. L., Bédard, M., Classen, S., Delparte, J. J., Hebert, D. A., Hyde, N., Law, G., Naglie, G., & Yung, S. (2017). Guiding framework for driver assessment using driving simulators. Frontiers in Psychology, 8, 1428. Caro, P. W. (1988). Flight training and simulation. In E. L. Wiener & D. C. Nagel (Eds.), Human Factors in Aviation. San Diego, CA: Academic Press. Carroll, M., Rebensky, S., & Sanchez, P. (2020). Examining pilot response to cybersecurity events on the flight deck. National Training Aircraft Symposium, March 2–4, Daytona Beach, FL. Carroll, M., Rebensky, S., Wilt, D., Pittorie, W., Hunt, L., Chaparro, M., & Sanchez, P. (2021). Integrating uncertified information from the electronic flight bag into the aircraft panel: Impacts on pilot response. International Journal of Human Computer Interaction, 37:7, 630–641. https://doi​.org​/10​.1080​/10447318​.2020​.1854001 Cecil, J., Ramanathan, P., Pirela-Cruz, M., & Kumar, M. B. R. 
(2014). A virtual reality based simulation environment for orthopedic surgery. In OTM Confederated International Conferences “On the Move to Meaningful Internet Systems” (pp. 275–285). Springer, Berlin, Heidelberg.

84

Human Factors in Simulation and Training

Cegala, D. J., & Broz, S. L. (2003). Provider and patient communication skills training. In T. L. Thompson, A. K. Dorsey, K. Miller, & R. Parrott (Eds.), Handbook of Health Communication (pp. 95–119). Mahwah, NJ: Erlbaum.
Champney, R. K., Stanney, K. M., Milham, L., Carroll, M. B., & Cohn, J. V. (2017). An examination of virtual environment training fidelity on training effectiveness. International Journal of Learning Technologies, 12(1), 42–65.
Chowdhury, T. I., Ferdous, S. M. S., & Quarles, J. (2017). Information recall in a virtual reality disability simulation. In Proceedings of the 23rd ACM Symposium on Virtual Reality Software and Technology (pp. 1–10).
Cocking, D., & Matthews, S. (2000). Unreal friends. Ethics and Information Technology, 2(4), 223–231.
Cohen, E. R., Feinglass, J., Barsuk, J. H., Barnard, C., O'Donnell, A., McGaghie, W. C., & Wayne, D. B. (2010). Cost savings from reduced catheter-related bloodstream infection after simulation-based education for residents in a medical intensive care unit. Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare, 5(2), 98–102. https://doi.org/10.1097/SIH.0b013e3181bc8304
Colaco, H. B., Hughes, K., Pearse, E., Arnander, M., & Tennent, D. (2017). Construct validity, assessment of the learning curve, and experience of using a low-cost arthroscopic surgical simulator. Journal of Surgical Education, 74(1), 47–54.
Coller, B. D., & Scott, M. J. (2009). Effectiveness of using a video game to teach a course in mechanical engineering. Computers & Education, 53(3), 900–912.
Cook, D. A., Hatala, R., Brydges, R., Zendejas, B., Szostek, J. H., Wang, A. T., ... & Hamstra, S. J. (2011). Technology-enhanced simulation for health professions education: A systematic review and meta-analysis. JAMA, 306(9), 978–988.
Cooper, J. B., & Taqueti, V. (2008). A brief history of the development of mannequin simulators for clinical education and training. Postgraduate Medical Journal, 84(997), 563–570.
Crea, K. A. (2011). Practice skill development through the use of human patient simulation. American Journal of Pharmaceutical Education, 75(9), 188.
Dean, S., Milham, L., Carroll, M., Schaeffer, R., Alker, M., & Buscemi, T. (2008). Challenges of scenario design in a mixed-reality environment. In Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) Annual Meeting, Orlando, FL.
De Gloria, A., Bellotti, F., & Berta, R. (2014). Serious games for education and training. International Journal of Serious Games, 1(1).
Deterding, S., Canossa, A., Harteveld, C., Cooper, S., Nacke, L. E., & Whitson, J. R. (2015). Gamifying research: Strategies, opportunities, challenges, ethics. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (pp. 2421–2424).
Djaouti, D., Alvarez, J., Jessel, J. P., & Rampnoux, O. (2011). Origins of serious games. In M. Ma, A. Oikonomou, & L. Jain (Eds.), Serious Games and Edutainment Applications (pp. 25–43). London: Springer.
Douglas, S., Hood, C., Overmans, T., & Scheepers, F. (2019). Gaming the system: Building an online management game to spread and gather insights into the dynamics of performance management systems. Public Management Review, 21(10), 1560–1576.
Endsley, M. (1995). Measurement of situation awareness in dynamic systems. Human Factors, 37(1), 65–84.
Farra, S. L., Gneuhs, M., Hodgson, E., Kawosa, B., Miller, E. T., Simon, A., Timm, N., & Hausfeld, J. (2019). Comparative cost of virtual reality training and live exercises for training hospital workers for evacuation. Computers, Informatics, Nursing, 37(9), 446–454. https://doi.org/10.1097/CIN.0000000000000540
Federal Aviation Administration. (2020). FAR 121 Subpart N—Training program. Federal Aviation Administration.
Fletcher, J. D., Belanich, J., Moses, F., Fehr, A., & Moss, J. (2017). Effectiveness of augmented reality & augmented virtuality. In MODSIM Modeling & Simulation of Systems and Applications World Conference.
Fraune, M. R., Khalaf, A. S., Zemedie, M., Pianpak, P., NaminiMianji, Z., Alharthi, S. A., ... & Toups, Z. O. (2021). Developing future wearable interfaces for human-drone teams through a virtual drone search game. International Journal of Human-Computer Studies, 147, 102573.
Fussell, S. G. (2020). Determinants of aviation students' intentions to use virtual reality for flight training. Dissertation.
Fussell, S. G., & Truong, D. (2020). Preliminary results of a study investigating aviation students' intentions to use virtual reality for flight training. International Journal of Aviation, Aeronautics, and Aerospace, 7(3), 2.
Gore, B. F. (2011). Man-machine integration design and analysis system (MIDAS) v5: Augmentations, motivations, and directions for aeronautics applications. In P. C. Cacciabue, M. Hjalmdahl, A. Luedtke, & C. Riccioli (Eds.), Human Modelling in Assisted Transportation (pp. 43–54). Heidelberg, Germany: Springer.
Granic, I., Lobel, A., & Engels, R. C. (2014). The benefits of playing video games. American Psychologist, 69(1), 66.
Green, M. F. (2000). Aviation instruction through flight simulation: Enhancing pilots' decision-making skills. Flight Simulation—The Next Decade: Proceedings of the Royal Aeronautical Society, London, May 10–12.
Hall, A. B., Riojas, R., & Sharon, D. (2014). Comparison of self-efficacy and its improvements after artificial simulator or live animal model emergency procedure training. Military Medicine, 179(3), 320–323.
Hancock, P. A., Vincenzi, D. A., Wise, J. A., & Mouloua, M. (Eds.). (2008). Human Factors in Simulation and Training. Boca Raton, FL: CRC Press.
Hardoff, D., & Schonmann, S. (2001). Training physicians in communication skills with adolescents using teenage actors as simulated patients. Medical Education, 35(3), 206–210.
Himona, S. L., Stavrakis, E., Loizides, A., Savva, A., & Chrysanthou, Y. (2011). SIMPOL VR – A virtual reality law enforcement training simulator. MCIS 2011 Proceedings.
Hochmitz, I., & Yuviler-Gavish, N. (2011). Physical fidelity versus cognitive fidelity training in procedural skills acquisition. Human Factors, 53(5), 489–501.
Hoffman, R. R., Ward, P., Feltovich, P. J., DiBello, L., Fiore, S. M., & Andrews, D. H. (2014). Accelerated Expertise: Training for High Proficiency in a Complex World (Expertise: Research and Applications). Psychology Press.
Honey, M. A., & Hilton, M. L. (2011). Learning Science Through Computer Games. Washington, DC: National Academies Press.
Hong, T., Taylor-Lange, S. C., D'Oca, S., Yan, D., & Corgnati, S. P. (2016). Advances in research and applications of energy-related occupant behavior in buildings. Energy and Buildings, 116, 694–702.
Howells, N. R., Auplish, S., Hand, G. C., Gill, H. S., Carr, A. J., & Rees, J. L. (2009). Retention of arthroscopic shoulder skills learned with use of a simulator: Demonstration of a learning curve and loss of performance level after a time delay. JBJS, 91(5), 1207–1213.
Hromek, R., & Roffey, S. (2009). Promoting social and emotional learning with games: "It's fun and we learn things". Simulation and Gaming, 40(5), 626–644.
Hughes, K., Vignjevic, R., Campbell, J., De Vuyst, T., Djordjevic, N., & Papagiannis, L. (2013). From aerospace to offshore: Bridging the numerical simulation gaps—simulation advancements for fluid structure interaction problems. International Journal of Impact Engineering, 61, 48–63.
Jentsch, F., & Bowers, C. A. (1998). Evidence for the validity of PC-based simulation in studying aircrew coordination. International Journal of Aviation Psychology, 8(3), 243–260.
Jerald, J., Haskins, J., Eadara, S., Gainer, S., Zhu, B., & Huse, W. (2020). Utilizing physical props to simulate equipment in immersive environments. I/ITSEC 2020.
Jones, E. R., Hennessy, R. T., & Deutsch, S. (1985). Human Factors Aspects of Simulation. Washington, DC: National Academy Press.
Kahlbaugh, P. E., Sperandio, A. J., Carlson, A. L., & Hauselt, J. (2011). Effects of playing Wii on well-being in the elderly: Physical activity, loneliness, and mood. Activities, Adaptation & Aging, 35(4), 331–344.
Kamiński, J., Jurczak, J., & Jakubczyk, R. (2020). Simulator of police actions in crisis situations as an application of an intelligent decision support system in the process of improving Polish police actions. Internal Security, Special Issue, 137–145.
Kanki, B. G. (2019). Communication and crew resource management. In B. G. Kanki, J. Anca, & T. R. Chidester (Eds.), Crew Resource Management (pp. 139–184). Elsevier Science & Technology.
Kim, Y., Kim, H., & Kim, Y. O. (2017). Virtual reality and augmented reality in plastic surgery: A review. Archives of Plastic Surgery, 44(3), 179.
Klimmt, C. (2003). Dimensions and determinants of the enjoyment of playing digital games: A three-level model. In M. Copier & J. Raessens (Eds.), Level Up: Digital Games Research Conference (pp. 246–257). Utrecht, The Netherlands: Faculty of Arts, Utrecht University.
Klimmt, C., Hartmann, T., & Frey, A. (2007). Effectance and control as determinants of video game enjoyment. Cyberpsychology & Behavior, 10(6), 845–848.
Kronqvist, A., Jokinen, J., & Rousi, R. (2016). Evaluating the authenticity of virtual environments: Comparison of three devices. Advances in Human-Computer Interaction, 3, 1–14.
Kumm, C., & Burwell, J. (2017). "Systems of systems" approach for the development of next generation modular simulation-based training systems. MODSIM World, 7, 1–17.
Landon-Hays, M., Peterson-Ahmad, M. B., & Frazier, A. D. (2020). Learning to teach: How a simulated learning environment can connect theory to practice in general and special education educator preparation programs. Education Sciences, 10(7), 184.
Lee, L. H., Chew, E. P., Frazier, P. I., Jia, Q., & Chen, C. (2013). Advances in simulation optimization and its applications. IIE Transactions, 45(7), 683–684. https://doi.org/10.1080/0740817X.2013.778709
Lindgren, R., Tscholl, M., Wang, S., & Johnson, E. (2016). Enhancing learning and engagement through embodied interaction within a mixed reality simulation. Computers & Education, 95, 174–187. https://doi.org/10.1016/j.compedu.2016.01.001
Lindsey, S., Ganey, H. N., & Carroll, M. (2019). Designing military cockpits to support a broad range of personnel body sizes. 20th International Symposium on Aviation Psychology, 145–150.
Liu, Y., Wang, T., Zhang, H., Cheutet, V., & Shen, G. (2019). The design and simulation of an autonomous system for aircraft maintenance scheduling. Computers & Industrial Engineering, 137, 106041.
Lubner, M., Dattel, A. R., Allen, E., Henneberry, D., & DeVivo, S. (2019). Six-year follow-up of intensive, simulator-based pilot training. 20th International Symposium on Aviation Psychology, 450–455.
Luisier, J., Yooyen, S., & Deebhijarn, S. (2019). Perceptions of Thai aviation students on consumer grade VR flight experiences. Multidisciplinary Digital Publishing Institute Proceedings, 39(1), 8.
Lukosch, H. K., Bekebrede, G., Kurapati, S., & Lukosch, S. G. (2018). A scientific foundation of simulation games for the analysis and design of complex systems. Simulation & Gaming, 49(3), 279–314.
Macfarlane, R. (1997). Simulation as an instructional procedure. In G. J. F. Hunt (Ed.), Designing Instruction for Human Factors Training in Aviation. Aldershot, UK: Avebury Aviation.
Mahmood, T., & Darzi, A. (2004). The learning curve for a colonoscopy simulator in the absence of any feedback: No feedback, no learning. Surgical Endoscopy and Other Interventional Techniques, 18, 1224–1230.
Mairaj, A., Baba, A. I., & Javaid, A. Y. (2019). Application specific drone simulators: Recent advances and challenges. Simulation Modelling Practice and Theory, 94, 100–117.
Marler, T., Straus, S. G., Mizel, M. L., Hollywood, J. S., Harrison, B., Yeung, D., Klima, K., Lewis, M. W., Rizzo, S., Hartholt, A., & Swain, C. (2020). Effective game-based training for police officer decision-making: Linking missions, skills, and virtual content. I/ITSEC 2020.
McGrath, J. L., Taekman, J. M., Dev, P., Danforth, D. R., Mohan, D., Kman, N., Crichlow, A., & Bond, W. F. (2017). Using virtual reality simulation to assess competence for emergency medicine learners. Academic Emergency Medicine, 25(2), 186–195. https://doi.org/10.1111/acem.13308
McRae, M. E., Chan, A., Hulett, R., Lee, A. J., & Coleman, B. (2017). The effectiveness of and satisfaction with high-fidelity simulation to teach cardiac surgical resuscitation skills to nurses. Intensive and Critical Care Nursing, 40, 64–69.
Mehta, S., Ullah, N., Kabir, M. H., Sultana, M. N., & Kwak, K. S. (2009). A case study of networks simulation tools for wireless networks. In 2009 Third Asia International Conference on Modelling & Simulation (pp. 661–666). IEEE.
Meuleners, L., & Fraser, M. (2015). A validation study of driving errors using a driving simulator. Transportation Research Part F: Traffic Psychology and Behaviour, 29, 14–21.
Morgan, P. J., Cleave-Hogg, D., McIlroy, J., & Devitt, J. H. (2002). Simulation technology: A comparison of experiential and visual learning for undergraduate medical students. Anesthesiology: The Journal of the American Society of Anesthesiologists, 96(1), 10–16.
Moroney, W. F., & Moroney, B. W. (1999). Flight simulation. In D. J. Garland, J. A. Wise, & V. D. Hopkin (Eds.), Handbook of Aviation Human Factors (pp. 355–388). Mahwah, NJ: Lawrence Erlbaum Associates.
Moroney, W. F., & Moroney, B. W. (2009). Flight simulation. In D. J. Garland, J. A. Wise, & V. D. Hopkin (Eds.), Handbook of Aviation Human Factors. Boca Raton, FL: CRC Press.
Mpu, Y., & Adu, E. O. (2020). Collaborative virtual learning in education in STEM education. Management, 8(4), 315–324.
Mugunthan, V., Peraire-Bueno, A., & Kagal, L. (2020, October). PrivacyFL: A simulator for privacy-preserving and secure federated learning. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (pp. 3085–3092).
Musharraf, M., Moyle, A., Khan, F., & Veitch, B. (2019). Using simulator data to facilitate human reliability analysis. Journal of Offshore Mechanics and Arctic Engineering, 141(2), 021607.
Nahavandi, S., Wei, L., Mullins, J., Fielding, M., Deshpande, S., Watson, M., Korany, S., Nahavandi, D., Hettiarachchi, I., Najdovski, Z., Jones, R., Mullins, A., & Carter, A. (2019). Haptically-enabled VR-based immersive fire fighting training simulator. Intelligent Computing: Proceedings of the 2019 Computing Conference, 1, 11–21.
Neges, M., Adwernat, S., & Abramovici, M. (2017). Augmented virtuality for maintenance training simulation under various stress conditions. 6th Annual Conference on Through-life Engineering Services, 7–8.
Nestel, D., Groom, J., Eikeland-Husebø, S., & O'Donnell, J. M. (2011). Simulation for learning and teaching procedural skills: The state of the science. Simulation in Healthcare, 6(7) Supplement, S10–S13.
Oculus. Quest 2 details. https://www.oculus.com/quest-2/?locale=en_US
Orlansky, J., & String, J. (1977). Cost-effectiveness of flight simulator for military training (Rep. No. IDA NO. HQ 77-19470). Arlington, VA: Institute for Defense Analysis.
Orvis, K., Horn, D., & Belanich, J. (2008). The roles of task difficulty and prior videogame experience on performance and motivation in instructional videogames. Computers in Human Behavior, 24, 2415–2433.
Patterson, M. D., Blike, G. T., & Nadkarni, V. M. (2008). In situ simulation: Challenges and results. In Advances in Patient Safety: New Directions and Alternative Approaches (Vol. 3: Performance and Tools).
Pellas, N., Kazanidis, I., Konstantinou, N., & Georgiou, G. (2017). Exploring the educational potential of three-dimensional multi-user virtual worlds for STEM education: A mixed-method systematic literature review. Education and Information Technologies, 22(5), 2235–2279.
Pfandler, M., Lazarovici, M., Stefan, P., Wucherer, P., & Weigl, M. (2017). Virtual reality-based simulators for spine surgery: A systematic review. The Spine Journal, 17(9), 1352–1363. https://doi.org/10.1016/j.spinee.2017.05.016
Pinheiro, A., Fernandes, P., Maia, A., Cruz, G., Pedrosa, D., Fonseca, B., Paredes, H., Martins, P., Morgado, L., & Rafael, J. (2012). Development of a mechanical maintenance training simulator in OpenSimulator for F-16 aircraft engines. Procedia Computer Science, 15, 248–255.
Pittorie, W., Lindsey, S., Wilt, D. F., & Carroll, M. (2019). Low-cost simulator for flight crew human-factors studies. In Proceedings of the AIAA Aviation and Aeronautics Forum and Exposition, June 17–21, Dallas, TX.
Quellmalz, E. S., Silberglitt, M. D., Buckley, B. C., Loveland, M. T., & Brenner, D. G. (2020). Simulations for supporting and assessing science literacy. In I. Management Association (Ed.), Learning and Performance Assessment: Concepts, Methodologies, Tools, and Applications (pp. 760–799). Hershey, PA: IGI Global.
Radianti, J., Majchrzak, T. A., Fromm, J., & Wohlgenannt, I. (2020). A systematic review of immersive virtual reality applications for higher education: Design elements, lessons learned, and research agenda. Computers & Education, 147, 103778.
Ray, P. A. (2000). Is today's flight simulator prepared for tomorrow's requirements? Flight simulation—The next decade. In Proceedings of the Royal Aeronautical Society, May 10–12.
Rebensky, S., Carroll, M., Bennett, W., & Hu, X. (2020). Collaborative development of a synthetic task environment by academia and military. Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC) 2020.
Reweti, S., Gilbey, A., & Jeffery, L. (2017). Efficacy of low-cost PC-based aviation training devices. Journal of Information Technology Education: Research, 16, 127–142.
Ritterfeld, U., Cody, M., & Vorderer, P. (2009). Serious Games: Mechanisms and Effects. London, UK: Routledge.
Rolfe, J. M., & Staples, K. J. (1986). Flight Simulation. New York: Cambridge University Press.
Ronen, A., & Yair, N. (2013). The adaptation period to a driving simulator. Transportation Research Part F: Traffic Psychology and Behaviour, 18, 94–106.
Rutten, N., van Joolingen, W. R., & van der Veen, J. T. (2012). The learning effects of computer simulations in science education. Computers & Education, 58(1), 136–153.
Rui, L. (2020). Behind the popularity: Simulation game in China [Undergraduate honors thesis, UNC Digital Library]. https://doi.org/10.17615/h3a0-k802
Sena, P., Attianese, P., Carbone, F., Pellegrino, A., Pinto, A., & Villecco, F. (2012). A fuzzy model to interpret data of drive performances from patients with sleep deprivation.
Sharma, S., & Scribner, D. (2017). Megacity: A collaborative virtual reality environment for emergency response, training, and decision making. IS&T International Symposium on Electronic Imaging (pp. 70–77).
Sikorski, E., Palla, A., & Brent, L. (2017). Developing an immersive virtual reality aircrew training capability. 2017 I/ITSEC.
Skinner, H., Possignolo, R. T., Wang, S. H., & Renau, J. (2020, August). LiveSim: A fast hot reload simulator for HDLs. In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (pp. 126–135). IEEE.
Smith, R. (2010). The long history of gaming in military training. Simulation & Gaming, 41(1), 6–19.
St. Julien, T., & Shaw, C. D. (2003). Firefighter command training virtual environment. In TAPIA '03: Proceedings of the 2003 Conference on Diversity in Computing. https://doi.org/10.1145/948542.948549
Straus, S. G., Lewis, M. W., Connor, K., Eden, R., Boyer, M. E., Marler, T., … Smigowski, H. (2019). Collective Simulation-Based Training in the U.S. Army. Santa Monica, CA: RAND Corporation.
Stroosma, O., Van Paassen, M. M., & Mulder, M. (2003, August). Using the SIMONA research simulator for human-machine interaction research. In AIAA Modeling and Simulation Technologies Conference and Exhibit (p. 5525).
Taber, N. (2008). Emergency response: Elearning for paramedics and firefighters. Simulation and Gaming, 39(4), 515–527.
Taheri, S. M., Matsushita, K., & Sasaki, M. (2017). Development of a driving simulator with analyzing driver's characteristics based on a virtual reality head mounted display. Journal of Transportation Technologies, 7(3). https://doi.org/10.4236/jtts.2017.73023
Tan, S. S. Y., & Sarker, S. K. (2011). Simulation in surgery: A review. Scottish Medical Journal, 56(2), 104–109.
Tang, J., Lau, W., Chan, K., & To, K. (2014). AR interior designer: Automatic furniture arrangement using spatial and functional relationships. 2014 International Conference on Virtual Systems & Multimedia (VSMM). https://doi.org/10.1109/VSMM.2014.7136652
Taylor, H. L., Lintern, G., Hulin, C. L., Talleur, D. A., Emanuel, T. W., & Phillips, S. I. (1999). Transfer of training effectiveness of a personal computer aviation training device. International Journal of Aviation Psychology, 9(4), 319–335.
Taylor, H. L., Talleur, D. A., Rantanen, E. M., & Emanuel, T. W. (2004). The effectiveness of a personal computer aviation training device, a flight training device, and an airplane in conducting instrument proficiency checks. Aviation Human Factors Division, Institute of Aviation.
Underwood, G., Crundall, D., & Chapman, P. (2011). Driving simulator validation with hazard perception. Transportation Research Part F: Traffic Psychology and Behaviour, 14(6), 435–446.
Van Heukelom, J. N., Begaz, T., & Treat, R. (2010). Comparison of postsimulation debriefing versus in-simulation debriefing in medical simulation. Simulation in Healthcare, 5(2), 91–97.
van de Ven, J., van Baaren, G. J., Fransen, A. F., van Runnard Heimel, P. J., Mol, B. W., & Oei, S. G. (2017). Cost-effectiveness of simulation-based team training in obstetric emergencies (TOSTI study). European Journal of Obstetrics & Gynecology and Reproductive Biology, 216, 130–137.
Vlachopoulos, D., & Makri, A. (2017). The effect of games and simulations on higher education: A systematic literature review. International Journal of Educational Technology in Higher Education, 14(1), 1–33.
Vlakveld, W., van Nes, N., de Bruin, J., Vissers, L., & van der Kroft, M. (2018). Situation awareness increases when drivers have more time to take over the wheel in a Level 3 automated car: A simulator study. Transportation Research Part F: Traffic Psychology and Behaviour, 58, 917–929.
Waag, W. L. (1978). Recent studies of simulation training effectiveness. Proceedings of the Society of Automotive Engineers: Town and Country, San Diego, November 27–30.
Wang, H., & Zhai, Z. (2016). Advances in building simulation and computational techniques: A review between 1987 and 2014. Energy and Buildings, 128(15), 319–335.
Warren, J. N., Luctkar-Flude, M., Godfrey, C., & Lukewich, J. (2016). A systematic review of the effectiveness of simulation-based education on satisfaction and learning outcomes in nurse practitioner programs. Nurse Education Today, 46, 99–108.
Webster, M., Cameron, N., Fisher, M., & Jump, M. (2014). Generating certification evidence for autonomous unmanned aircraft using model checking and simulation. Journal of Aerospace Information Systems, 11(5), 258–278.
Winther, F., Ravindran, L., Svendsen, K. P., & Feuchtner, T. (2020). Design and evaluation of a VR training simulation for pump maintenance based on a use case at Grundfos. 2020 IEEE Conference on Virtual Reality and 3D User Interfaces.
Woda, A., Hansen, J., Paquette, M., & Topp, R. (2017). The impact of simulation sequencing on perceived clinical decision making. Nurse Education in Practice, 26, 33–38.
Wynne, R. A., Beanland, V., & Salmon, P. M. (2019). Systematic review of driving simulation validation studies. Safety Science, 117, 138–151.
Yoon, S., Park, T., Lee, J., & Kim, J. (2019). A study on transfer effectiveness and appropriate training hours. ITEC 2019.
Yuksek, B., Vuruskan, A., Ozdemir, U., Yukselen, M. A., & Inalhan, G. (2016). Transition flight modeling of a fixed-wing VTOL UAV. Journal of Intelligent & Robotic Systems, 84, 83–105.
Zhang, X., Liu, X., Yuan, S. M., & Lin, S. F. (2017). Eye tracking based control system for natural human-computer interaction. Computational Intelligence and Neuroscience, 2017, 1–9.

3

Simulation Fidelity

Dahai Liu, Jiahao Yu, Nikolas D. Macchiarella, and Dennis A. Vincenzi

CONTENTS

Introduction
Definition of Fidelity
Physical Fidelity
Visual–Audio Fidelity
Equipment Fidelity
Motion Fidelity
Psychological–Cognitive Fidelity
Other Fidelity
Measuring Fidelity
The Mathematical Model
Subjective Methods
Fidelity Evaluation Frameworks
Fidelity and Transfer of Training
Summary
References

INTRODUCTION

With the increasing demand for training to function in highly complex situations, researchers and practitioners strive to build high-fidelity simulation devices that are as similar to real situations as possible. Each year, technological advances bring simulation closer and closer to duplicating precise, authentic environments. Simulation-based training has many benefits, such as improving speed and technical skills (Bur et al., 2017); boosting knowledge, critical thinking, and decision-making (Akalin & Sahin, 2020); enhancing team communication (Chamberland et al., 2018); and increasing students' self-confidence (Haddeland et al., 2019). Unfortunately, a high financial cost is associated with these highly sophisticated devices. Simulation quality and human capabilities are critical factors that determine training effectiveness and efficiency. The key issue related to simulation quality is the "degree to which the training devices must duplicate the actual equipment" or environment. This degree of similarity is called simulation fidelity (Allen, 1986). The issue of fidelity must be addressed not only because it is quite possibly the most important factor in the assessment of simulation quality, and a key factor of
simulation training transfer, but also because it is a critical factor in cost-effective simulation device design. Often simulators are not utilized because the equipment is too costly to purchase (Garrison, 1985). It is widely accepted that training devices with excessive levels of fidelity may not be cost-effective (Allen, 1986; Fortin, 1989; Roza, 2000; Scott & Gartner, 2019). For example, medium-fidelity simulation produced a significantly higher satisfaction rate than high-fidelity simulation among nursing students and proved more cost-effective for the acquisition of basic skills (Alconero-Camarero et al., 2021).

Existing research in flight simulation training, aviation psychology, military aviation, and many other domains has contributed greatly to the understanding of simulation. Despite this work, many questions remain unanswered. Research indicates that simulation is a valuable tool for training (Akalin & Sahin, 2020; Bur et al., 2017; Chamberland et al., 2018; Haddeland et al., 2019; Hays et al., 1992). What is not fully understood is how to make simulation more efficient, i.e., how to determine what level of fidelity is sufficient for effective transfer of training. Despite this, fidelity, or "level of detail," continues to be a major issue in simulation development (Hughes & Rolek, 2003). Due to the complex nature of simulation tasks, the large numbers of objects and attributes involved, and random human behaviors, quantification of simulation fidelity is the most challenging aspect of fidelity measurement. Schricker et al. (2001) concluded that the main issues regarding fidelity, as addressed in the literature, are as follows: (1) no detailed, agreed-upon definition; (2) rampant subjectivity; (3) no method of quantifying the assignment of fidelity; and (4) no detailed example of a referent.

Other major issues involve the accepted methods by which fidelity is measured. Two major methods are described in the literature. The first is mathematical measurement that calculates the number of identical elements shared between the real world and the simulation; the greater the number of shared identical elements, the higher the simulation fidelity. This is referred to as the objective method (Gross & Freeman, 1997; Schricker et al., 2001). A second method measures fidelity through a trainee's performance matrix: by assessing a human's performance and comparing it to real-world performance to measure the transfer of training, the fidelity of a simulation can be measured indirectly (Parrish et al., 1983; Ferguson et al., 1985; Nemire et al., 1994; Field et al., 2002; Mania et al., 2003). We will address these methods in detail in a later section. First, a clear definition is needed to understand what is meant by "fidelity."

DEFINITION OF FIDELITY

Simulation fidelity is an umbrella term defined as the extent to which the simulation replicates the actual environment (Alessi, 1988; Gross et al., 1999). In aviation, simulation fidelity refers to the extent to which a flight training device looks, sounds, responds, and maneuvers like a real aircraft. Many simulation professionals attempt
to define fidelity comprehensively, whereas others argue that fidelity is far too nebulous an idea to ever be defined, implying that efforts to define fidelity have so far been unsuccessful (Schricker et al., 2001). Rehmann et al. (1995) reviewed over 30 years of research on fidelity and found no agreed-upon single definition; in fact, at least 22 different definitions can be drawn from the literature. For example, fidelity can be defined as simply as "how closely a simulation imitates reality" (Alessi, 1988), or as "the levels of physical and visual similarity with real work settings" (Hontvedt & Øvergård, 2020), or more specifically along different fidelity dimensions, including but not limited to equipment fidelity, environmental fidelity, psychological and cognitive fidelity, task fidelity, physical fidelity, and functional fidelity. Furthermore, Rehmann et al. (1995) found that none of these terms or definitions is applicable to overall aircraft simulation in general. This lack of a well-defined and widely accepted fidelity concept causes miscommunication between researchers and inconsistencies in their research (Roza, 2000). Based on previous research and a definition by the Fidelity-ISG, Roza (2000) developed the following theorems regarding simulation fidelity:

1. Fidelity models are multidimensional; they involve and can be quantified using a variety of factors.
2. Fidelity is application-independent; it is an intrinsic and inherent property of a simulation model.
3. Fidelity must be quantified and qualified with respect to a referent; this means that metrics (i.e., size, weight, shape) should exist for determining whether a simulation resembles its referent.
4. Fidelity quantification has a level of uncertainty.
5. Fidelity comparison should be based on a common referent in order to make sense. For example, comparisons of fidelity levels of an aircraft simulator should be drawn from the same or similar aircraft.

Seven descriptive concepts were defined to further understand and quantify fidelity: detail, resolution, error, precision, sensitivity, timing, and capacity. Specific metrics or measurements can be defined in depth for each of these concept factors. A simulation referent is important not only to define fidelity, but also for its measurement. Fidelity can be simply described "in terms of the extent to which a representation reproduces the attributes and behaviors of a referent" (Hughes & Rolek, 2003). A referent is "an entity or collection of entities and/or conditions—together with their attributes and behaviors—present within a given operational domain" (Hughes & Rolek, 2003). For example, in the small aircraft aviation industry, one may be particularly interested in studying single-pilot performance in a Cessna 172. The cockpit of the Cessna 172 is the reality; the simulated cockpit would be the referent (display, radio, controls, pedals, chair, etc.); and the models would be the computer-simulated models that produce this referent, such as Microsoft Flight Simulator. Roza (2000) defined a referent as "a formal specification of all knowledge about reality plus indicators to determine the uncertainty levels and quality of this
knowledge to judge the confidence level of this referent data.” In other words, a referent is the abstract model of the reality that is relevant to simulated tasks, and serves as a basis for measuring the simulated environment and tasks. According to Roza (2000), a referent structure consists of the following elements:

1. A referent identification section.
2. A referent applicability section.
3. A referent developer and validation agent.
4. A referent knowledge sources section.
5. A real-world structural properties section.
6. A real-world parametric and behavioral data section.
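To make this structure concrete, the sketch below renders the six referent sections as a minimal Python record. The class, field names, and sample values (borrowed from the chapter's Cessna 172 example) are hypothetical illustrations, not part of Roza's formal specification:

```python
from dataclasses import dataclass

@dataclass
class Referent:
    """Minimal sketch mirroring Roza's (2000) six referent sections."""
    identification: str                     # 1. referent identification
    applicability: list[str]                # 2. referent applicability
    developer_validation_agent: str         # 3. developer and validation agent
    knowledge_sources: list[str]            # 4. knowledge sources
    structural_properties: dict[str, str]   # 5. real-world structural properties
    behavioral_data: dict[str, float]       # 6. real-world parametric/behavioral data

# Hypothetical instance for the Cessna 172 cockpit discussed above.
c172 = Referent(
    identification="Cessna 172 cockpit",
    applicability=["single-pilot GA flight training"],
    developer_validation_agent="training-device design team",
    knowledge_sources=["pilot operating handbook", "subject-matter experts"],
    structural_properties={"controls": "yoke, rudder pedals, throttle"},
    behavioral_data={"stall_speed_kcas": 48.0},
)
```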

Roza (2000) claims that elements 2, 3, and 4 contain the indicators needed to assess the fidelity of the simulation referent, and that elements 5 and 6 can be used to measure the fidelity. By using the concept of a referent, fidelity measurement can be simplified. As Schricker et al. (2001) pointed out, if one tried to consider fidelity issues on a real-world system, it would become far too intricate. Simulations are developed to represent a certain object or group of objects in a certain domain, and they can be regarded as the simulated models of a certain referent of the reality.

Fidelity can be further broken down into sub-definitions that describe detailed elements, categorized based on the different aspects of the simulated tasks or environment. The categorizations used by researchers vary a great deal, and no consensus has been reached; the different fidelity element categorizations depend largely on the different fidelity experiments and simulated tasks. For example, in Zhang's (1993) study, simulation fidelity is broken down into six elements: hardware fidelity, software fidelity, fidelity for a whole tested system, fidelity of the pilot's subjective impression, simulation mission (task) fidelity, and simulation experience fidelity. Hontvedt and Øvergård (2020) introduced a framework with three central dimensions: physical and functional accuracy, conceptualized as technical fidelity; problem-solving strategies, mental models, and feelings, conceptualized as psychological fidelity; and precise coordination and collaborative patterns within a team, conceptualized as interactional fidelity. Hays and Singer (1989) identified the following three major types of variables believed to interact with fidelity:

1. Task-related variables, including task domain, task type, task difficulty, task frequency, task criticality, task learning difficulty, task practice requirement, and task skills, abilities, and knowledge.
2. Training environment and personal variables (e.g., purpose of training, instructional principles, and student population).
3. Device utilization variables, which are the least understood and the most "potent" in determining training device effectiveness.
TABLE 3.1
Fidelity Definitions

Word | References | Definition
Simulation fidelity | Gross et al. (1999); Alessi (1988) | Degree to which device can replicate actual environment, or how "real" the simulation appears and feels
Physical fidelity | Allen (1986) | Degree to which device looks, sounds, and feels like actual environment
Visual–audio fidelity | Rinalducci (1996) | Replication of visual and auditory stimulus
Equipment fidelity | Zhang (1993) | Replication of actual equipment hardware and software
Motion fidelity | Kaiser and Schroeder (2003) | Replication of motion cues felt in actual environment
Psychological–cognitive fidelity | Kaiser and Schroeder (2003) | Degree to which device replicates psychological and cognitive factors (i.e., communication, situational awareness)
Task fidelity | Zhang (1993); Roza (2000); Hughes and Rolek (2003) | Replication of tasks and maneuvers executed by user
Functional fidelity | Allen (1986) | How device functions, works, and provides actual stimuli as actual environment
Interactional fidelity | Hontvedt and Øvergård (2020) | The accuracy and relevance of participant collaboration and enactment of work tasks in simulator training

Table 3.1 lists a number of different aspects of fidelity and how they relate to each other as well as how researchers have attempted to describe each aspect. These definitions have been compiled from research that has focused on structuring and defining the various components of fidelity. Definitions of fidelity mainly fall within two categories: those that describe the physical experience and those that describe the psychological or cognitive experience. These two categories are briefly described in Table 3.1, and will be explored further.

PHYSICAL FIDELITY

The most commonly discussed fidelity categorization is physical fidelity (Allen, 1986; Hays & Singer, 1989; Andrews et al., 1995). Not to be confused with the broader term simulation fidelity, this term deals specifically with the physical properties of the simulation experience. For a simulator to be considered high in physical fidelity, it must have high visual–audio fidelity: the look, sound, feel, and, in some cases, smell of the real aircraft (Allen, 1986). Physical fidelity encompasses other definitions of fidelity such as visual–audio fidelity, equipment fidelity, and motion fidelity. Hontvedt and
Øvergård (2020), after adding functional accuracy, termed this technical fidelity, describing it as "the degree of accuracy to which the technical and environmental cues are recreated by the simulator technology."

VISUAL–AUDIO FIDELITY

Visual–audio fidelity is the most frequently studied aspect of fidelity in the available literature (Rinalducci, 1996). It can be thought of as the level of visual and aural detail that the simulator displays. For example, a visually simulated airport can include several elements or artifacts that could be found when directly viewing a real-world airport: the runways, lights, hangars, control towers, ground vehicles, natural surroundings, and other airplanes could all be included in the simulation. Communication between the control tower and the pilot can also be simulated. A low-fidelity simulation would include only a few of these artifacts; it may contain simply a runway and landing threshold lights, with no aural detail at all. As the simulation includes more artifacts, the level of fidelity increases.

Technology has advanced visual fidelity, specifically, to a very high level. Although a perfect copy of the real world is beyond reach at this point, advances in satellite and aerial mapping for the purpose of simulation are producing increasingly accurate representations of the real environment and geography. For example, it is now possible to simulate and practice military operations in a computer-generated city that displays the buildings, streets, obstacles, and other physical features that may be encountered in the actual battle environment. This mitigates the chance of surprises and can greatly aid in the planning and rehearsal of military operations without risking lives.

Visual–audio fidelity in flight simulation can be decomposed into two basic parts: the "what" and the "where." The "what" refers to the pilot's central vision, the foveal and parafoveal regions. This area includes the items the pilot is directly viewing (e.g., instruments and the windscreen). The "where" refers to visual stimuli in the pilot's peripheral vision. The stimuli in the peripheral vision provide the pilot with a sense of speed, motion, situational awareness, and attitude of the aircraft. Both regions of vision have been shown to be important to the control of an aircraft (Kaiser & Schroeder, 2003). Unfortunately, computer-generated graphics often lack the power to generate visual stimuli in both regions at the same time to the same degree. Thus, a trade-off exists, and display designers must decide which stimuli to show at which time. As technology advances and the power of image generators improves, this confound may become a non-issue. Additionally, instrument flight requires pilots to engage and view only their instruments, so there is no need for "outside" visual stimuli; in this situation, an outside visual scene may even hinder the transfer of training by distracting the pilot.

Bradley and Abelson (1995) investigated visual fidelity issues for desktop flight simulators. Although improvements in computing power and software advances have enabled highly animated visual displays, the most significant factor limiting
visual fidelity (quality of performance) is the speed of frame refresh, or "frame rate." In highly detailed visual scenes, the computational demand of maintaining the frame rate remains the bottleneck for fast visual feedback.
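As a back-of-the-envelope illustration of this bottleneck, the short sketch below computes the per-frame rendering budget at a few target refresh rates (the rates themselves are illustrative, not requirements from the chapter):

```python
# Every object, light, and texture in the scene must be rendered within
# the per-frame budget, or the frame rate (and thus visual fidelity) drops.
for target_hz in (30, 60, 120):
    budget_ms = 1000.0 / target_hz  # milliseconds available per frame
    print(f"{target_hz:>3} Hz target -> {budget_ms:5.1f} ms per frame")
# Doubling the refresh rate halves the time available to draw an
# increasingly detailed scene, which is why frame rate remains the
# limiting factor in highly detailed visual environments.
```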

EQUIPMENT FIDELITY

For a simulation to achieve high physical fidelity, proper equipment fidelity must also be present. Equipment fidelity refers to the extent to which a simulator can emulate or replicate the equipment being used, including all software and hardware components of the system (Zhang, 1993). Occasionally, it may be infeasible or too costly to use actual equipment in the simulation, in which case a replica or substitute may be used in its place. It is important, however, to maintain a certain degree of equipment fidelity for training purposes. Reed (1996) studied equipment fidelity when trying to simulate the aerial gunner station of a helicopter. During the simulation, the actual weapons system appeared to interfere with the simulation equipment, and therefore had to be replaced by a replica. Using similarly shaped and weighted equipment in combination with parts of the actual weapons system, Reed was able to preserve some equipment fidelity without compromising training effectiveness.

MOTION FIDELITY

The role of motion fidelity is another important component of physical fidelity. It can be defined as the degree to which a simulator can reproduce the sense of motion felt by humans in the operational environment. The use of motion does increase the physical fidelity and realism of the simulation, but the benefits realized toward measurable transfer of training are minimal and insignificant. At best, empirical research on motion simulation has indicated that the addition of motion provides very limited benefits. For example, reproduction of the movement a pilot feels when banking for a standard turn adds to the realism experienced by the pilot, but has not been shown to improve the pilot's ability to perform that turn in the real world. However, the motion component of simulation has been found to be quite critical for training pilots of certain types of aircraft. For example, military fighter pilots often rely heavily on motion cues to perform complicated maneuvers in jet airplanes (Thomas, 2004). Motion also has its limitations. The brain, for example, can often be tricked into sensing motion where motion does not exist. Motion, it seems, contributes very little to overall training effectiveness (Garrison, 1985; Ray, 1996). It also does not reduce the real flight time necessary to reach proficiency on many tasks in the real world, such as the turning maneuver mentioned earlier (Ray, 1996). Furthermore, Advani and Mulder (1995) argue that reproduction of motion cues with ground-based flight simulators is principally impossible due to the kinetic limitations inherent in the motion system. Simulators may be capable of reproducing the banking angle that a real aircraft may encounter in an operational setting, but they have not been able to
sufficiently produce and maintain the centrifugal forces experienced when actually executing the banking maneuver in the real world.

PSYCHOLOGICAL–COGNITIVE FIDELITY

Beyond the look and feel of a simulation, there exists another component of fidelity that determines the robustness (or lack thereof) of the psychological and cognitive experience a person receives from being in the simulator. This component is known as psychological–cognitive fidelity: the extent to which psychological and cognitive factors are replicated within the simulation (Kaiser & Schroeder, 2003). It relates to an individual's problem-solving strategies and the establishment and use of mental models (Hontvedt & Øvergård, 2020). The degree of psychological–cognitive fidelity present within a simulation determines the extent to which the user is psychologically and cognitively engaged in the same manner as the actual equipment would engage the user. This kind of fidelity involves humans' ability to perform the cognitive aspects of tasks, including decision-making, situation awareness, problem-solving, and sense-making (Hontvedt & Øvergård, 2020). The cockpit is a demanding environment, requiring the pilot to constantly monitor flight systems and instrumentation, watch for failures, and maintain a flight plan (Wickens et al., 2004). Because the learning situation is important for psychological–cognitive fidelity, aspects such as simulator immersiveness, the designed task, guidance, and social affordances also need to be considered (Hontvedt & Øvergård, 2020). For a simulation to be considered high in psychological–cognitive fidelity, the simulator must require the same attentional resources from the pilot and produce similar psychological effects, such as stress and workload. Research concerned with cognitive and psychological fidelity focuses on these and other factors that affect performance. It is known that too much stress may cause critical performance decrements in flight. If, when immersed in a simulation environment, a user experiences symptoms of stress similar to those felt in the operational setting, it is generally accepted that some level of psychological–cognitive fidelity has been achieved.

OTHER FIDELITY

Task fidelity is the degree to which a simulator replicates the tasks involved in the actual environment (Zhang, 1993; Roza, 2000; Hughes & Rolek, 2003). Flight training devices must act like actual airplanes; therefore, a pilot must be able to "fly" the simulator just as he or she would fly an aircraft. This means that all tasks that need to be executed in an aircraft must be done in the same fashion in the simulator. The extent to which these tasks are simulated may be an issue for future research. Not all tasks may need to be replicated for all training exercises, and some may be isolated to investigate specific problems or issues. When training for cockpit communication, for example, the tasks involving communication must be exactly the
same as those in the operational setting. Other tasks (e.g., stick and rudder tasks) do not need to be the same, or can be eliminated altogether, if the objective of the training does not include them. Much like task fidelity, functional fidelity is also important. Functional fidelity can be described as how the simulator reacts to the tasks and commands executed by the pilot (Allen, 1986). Not only does the pilot have to react and fly as he or she would in a real aircraft, but the simulator must also react and maneuver as a real aircraft would in an operational setting. This ensures operational correctness and accuracy, as well as adding realism and believability to the simulation. When a trainee pulls back on the yoke, for example, the simulator must react as if the aircraft were pulling up and climbing in altitude. This fidelity, along with task fidelity, is essential for training effectiveness and positive transfer of training to occur. Interactional fidelity differs in that it focuses on the collaborative and coordination patterns of a socio-technical system (Hontvedt & Øvergård, 2020). It has been described as "the accuracy and relevance of participant collaboration and enactment of work tasks in simulator training" (Hontvedt & Øvergård, 2020). Hutchins and Klausen (1996) used a flight simulator to study the interactions between pilots on the flight deck; the simulation exhibited high interactional fidelity in capturing the pilots' cognitive efforts to collaborate. Pilots need to learn how to work with others in the same way that they can expect in a real flight. Simulations with high interactional fidelity can also recreate social interaction, aiding communication between the different entities in the system (Hontvedt & Øvergård, 2020). Simulator fidelity is a complex subject that includes a number of factors, and it is important to understand that these dimensions are not mutually exclusive; there is a large degree of overlap. Although expensive, high-fidelity devices are often used for advanced pilot training. However, research has shown that high fidelity may not be necessary to produce effective training results (Connolly et al., 1989; Hays et al., 1992; Duncan & Feterle, 2000). Future research should focus on determining exact fidelity requirements for specific tasks. A significant related topic is that of fidelity measurement.

MEASURING FIDELITY

Although it is well accepted that fidelity is the degree of similarity between the simulation and reality, it is critical to have a detailed and precise capability to measure that fidelity. Due to the complex nature of simulated tasks, different definitions of fidelity can lead to different measures of fidelity, and thus cause inconsistency among research results. Yu (1997) points out that simulation fidelity needs to be defined on the basis of "application purpose and technical possibility." Each aspect of fidelity must be verified in a way that is both consistent and specific. Equipment fidelity, for instance, can be verified by checking whether simulator performance falls within some predetermined accuracy relative to the real aircraft. Similarly, visual fidelity can be measured against some consistent and specific visualization standard, such as resolution.
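A hedged sketch of such a verification check is shown below; the parameter names, reference values, and the 10% tolerance are invented for illustration and are not drawn from any certification standard:

```python
# Hypothetical equipment-fidelity check: compare simulator responses
# against flight-test reference values for the same control inputs.
reference = {"roll_rate_deg_s": 15.0, "pitch_rate_deg_s": 5.0}   # real aircraft
simulated = {"roll_rate_deg_s": 14.2, "pitch_rate_deg_s": 5.6}   # training device
TOLERANCE = 0.10  # predetermined accuracy: within 10% of the reference

for parameter, ref_value in reference.items():
    relative_error = abs(simulated[parameter] - ref_value) / ref_value
    verdict = "PASS" if relative_error <= TOLERANCE else "FAIL"
    print(f"{parameter}: error {relative_error:.1%} -> {verdict}")
```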


In terms of fidelity quantification, several types of metrics are available. Roza (2000) found all of these metrics to be either subjective or qualitative in nature, which is insufficient for current simulation requirements. Existing fidelity metrics can be classified either as singular metrics or as sets of metrics that can be statistically combined to create a meaningful multidimensional metric. Objective measurement of simulation fidelity attempts to compare the simulated objects with the corresponding referent or real-world environment. Due to the level of complexity involved, especially with complex simulation setups and tasking, it is nearly impossible to count and compare every single element of the simulation. To better illustrate this, take a look at your surroundings, whether an office or a room, and imagine that your task is to count every object around you and compare every feature and detail with the simulated environment. You will have some idea of the difficulties involved in accomplishing that task. Thus, an exact measure of realism is not feasible at this time and is considered by some to be "a goal which can never be accomplished" (Roza et al., 2001). It is practically impossible to count everything, or know everything, about the reality or referent due to (1) the high degree of uncertainty, (2) the overwhelming amount of information involved, (3) the complicated attributes and behaviors associated with the reality or referent, and (4) the human limitations involved in observing and explaining real-world information.

A recent report on Fidelity Definition and Metrics (FDM-ISG) attempts to specify fidelity requirements in a formal way (Gross & Freeman, 1997; Roza et al., 2001). Simulation fidelity is defined as "The degree to which a model or simulation reproduces the state and behavior of a real world object or perception of a real world object, feature, condition or standard in a measurable or perceivable manner." The importance of fidelity measurement is also addressed by the FDM-ISG (Gross et al., 1999) as "what aspect should be simulated and how to observe the simulation purpose and objectives best." These fidelity requirements are essential for simulation system design because they ultimately affect the simulation context, purpose, and hardware and software requirements, and thus the trade-off between cost and achieved transfer of training.

Research on simulation has primarily focused on hardware and software development targeted at "the ultimate display" to produce the real-time simulated environment. A more fundamental question is: what is the minimum fidelity required to achieve the required level of transfer of training? To find the most appropriate level of fidelity for the simulation tasks, one still needs to be able to accurately assess the existing or proposed level of fidelity. The most common methods of fidelity measurement are mathematical modeling, research experiments, and rating methods (Schricker et al., 2001; Kaiser & Schroeder, 2003).

THE MATHEMATICAL MODEL

For years, research on fidelity quantification focused mostly on objective mathematical formulation. Schricker et al. (2001) offer a thorough review of mathematical models for fidelity measurement.


Subjective Methods

Subjective methods use expert opinions (including those of developers and users) to determine the degree of fidelity. Clark and Duncan proposed one of the simplest methods (Schricker et al., 2001). In this model, fidelity is measured by a binary scoring system: each simulation condition is scored either "0," meaning the simulation does not duplicate the real-world condition, or "1," indicating that it does. Averaging those ratings provides an assessment of the overall fidelity. Although its simplicity is appealing, this method is subject to the same issues as any other subjective form of rating: an arbitrary judgment, or guess, can significantly affect the fidelity score.

A more advanced mathematical model was proposed by Gross and Freeman (Schricker et al., 2001). The model is based on four theorems:

I. 0 ≤ F(A) ≤ 1
II. If F(A) = 1, then A ≡ R
III. F(A) ≥ F(Meta A)
IV. F(A + B) = min(F(A), F(B))

where A and B are models of interest, F(A) is the fidelity of A, Meta A is a model of the referent (including A itself), and R is the referent of A. The simple formula for determining the overall fidelity of a simulation system is

Fs = Σi (Fi × Wi)

where Fi is the fidelity of referent characteristic i, Wi is the relative importance weight of characteristic i, and Fs is the fidelity of the entire simulation system. This formula, however, contradicts Theorem IV (Schricker et al., 2001), mainly because the model did not clearly define the underlying set, the operations on that set, and the fidelity function.

Liu and Vincenzi (2004) modified the model as follows. Let the set S be defined as S = {A : A = Meta^i R}, where R is the referent of the real-world group of objects and S is the group of all possible simulation models of referent R. The fidelity function is then defined on S as F: S → [0, 1]. Furthermore, for A ∈ S and B ∈ S, define the operation A ⊕ B = A ∪ B (one can easily show that ⊕ is commutative and associative). With these notations, Gross and Freeman's (1997) model can be modified so that Theorems I, II, and III still hold, and Theorem IV becomes

min(F(A), F(B)) ≤ F(A ⊕ B) ≤ max(F(A), F(B))

Gross and Freeman stated that the fidelity of any simulated system is equal to the fidelity of its lowest-fidelity individual component. It can be argued that this might not be the case: adding a low-fidelity component to a high-fidelity simulation model will certainly affect the high-fidelity components, but it should also improve the low-fidelity component if the add-on interacts with it. Readers who are interested in this model should refer to Schricker et al. (2001) for more details.
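To make the weighted-sum formula and the modified combination bound concrete, the following minimal Python sketch computes Fs = Σ FiWi for a set of hypothetical component fidelities and checks Liu and Vincenzi's min–max bound for a combined model. The component names, fidelity scores, and weights are invented for illustration; neither source prescribes a particular implementation.

```python
# Minimal sketch of the weighted-sum fidelity formula Fs = sum(F_i * W_i)
# and the min-max bound on combined-model fidelity. Component names,
# fidelity scores, and weights are hypothetical.

components = {
    # name: (F_i, W_i) -- fidelity in [0, 1] and relative importance weight
    "visual":    (0.90, 0.40),
    "motion":    (0.60, 0.25),
    "equipment": (0.80, 0.35),
}

# Weights should sum to 1 so that Fs stays in [0, 1] (Theorem I).
assert abs(sum(w for _, w in components.values()) - 1.0) < 1e-9

# Overall system fidelity: Fs = sum_i F_i * W_i
f_s = sum(f * w for f, w in components.values())
print(f"Weighted system fidelity Fs = {f_s:.3f}")

# Liu and Vincenzi's modified Theorem IV: the fidelity of a combined model
# A (+) B must lie between min(F(A), F(B)) and max(F(A), F(B)).
def combination_bounds(f_a, f_b):
    return min(f_a, f_b), max(f_a, f_b)

lo, hi = combination_bounds(components["visual"][0], components["motion"][0])
print(f"F(A (+) B) must lie in [{lo:.2f}, {hi:.2f}]")

# Note: the weighted sum can exceed min(F_i), which is exactly why the
# original Theorem IV (F(A + B) = min) conflicted with this formula.
```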


With models like this, we have a more comprehensive mathematical framework in place to facilitate the assessment of fidelity. Several other attempts to measure fidelity mathematically currently exist. Interested readers should refer again to Schricker et al. (2001) for more details.

FIDELITY EVALUATION FRAMEWORKS

It can be argued that although mathematical modeling is beneficial to an understanding of the concept of fidelity, it has little practical implication for actual measurement because of the complexity and uncertainty involved. Researchers have therefore attempted to develop alternative ways to measure fidelity in the field. One approach is to use an evaluation framework (Roza et al., 2001; Schricker et al., 2001). Figure 3.1 illustrates a generic measurement framework modified from Schricker et al. (2001). Using this framework, all simulation task-critical objects can be identified, along with their associated behaviors and attributes for the referent. By comparing the level (percentage) of the corresponding objects present in a simulation model, the fidelity can be estimated quantitatively. This framework rests on the assumption of Pareto's law: roughly 20% of the elements contribute 80% of the training effects in simulation.

FIGURE 3.1  A conceptual illustration of measurement procedure. (Adapted from Schricker, B., Franceschini, R., and Johnson, T., "Fidelity evaluation framework," Proceedings of the 34th Annual Simulation Symposium, 2001.)

Roza (2000) summarized existing research on simulation fidelity, especially on fidelity characterization and quantification. Based on his study, a preliminary fidelity theory and a practical tool called the Fidelity Management Process Overlay Model (FiMO) were proposed to assess and quantify simulation fidelity. After investigating the distributed-simulation fidelity requirements for the U.S. Department of Defense's Defense Modeling and Simulation Office (DMSO) High-Level Architecture (HLA), Roza (2000) found that although HLA is well defined by the Federation Development and Execution Process (FEDEP), it focuses primarily on technological aspects and cannot answer many of the questions that arise regarding fidelity or fidelity quantification. A systematic and structured way is needed to efficiently characterize aspects of fidelity, and FiMO is one model employed for this purpose. The approach maps to the FEDEP framework and provides a process view for characterizing fidelity issues across the simulation development stages and activities. The basic framework consists of five major, iterative activities; for detailed information on this approach, readers can refer to Roza's (2000) paper. This framework is believed to be able to handle large amounts of fidelity data in a progressive manner.

Bell and Freeman (1995) developed a draft Fidelity Description Requirement (FDR) that they believed could be used to quantify simulation fidelity. An assessment process was proposed, and a taxonomy was developed in a hierarchical format:

Level 1: the simulation resource, a combination of hardware and software solutions (e.g., a Cessna 172 flight simulator)
Level 2: the fidelity domain (e.g., physical fidelity or visual fidelity)
Level 3: the capability level
Level 4: the implementation-specific instantiations of the capability
Level 5: the individual characteristics of an implementation
Level 6: the measurements for Level 5

Bell and Freeman (1995) also discussed the possibility of using fuzzy logic to help quantify fidelity as accurately as possible, taking into account the unknowns and assumptions associated with fidelity characteristics. This approach may be appropriate given the number of uncertainties involved.

Fidelity can also be measured indirectly, for example through the evaluation of human performance. Here it is assumed that the fidelity function is objective; that is, for one referent, two simulation models will have the same transfer-of-training effect if they have the same fidelity (note that the opposite might not be true). Because the ultimate goal of simulation is to transfer the skills gained in training to the real-world situation, and objective measurement of fidelity is far more difficult to obtain, the measurement of human task performance is a good metric for any application that mainly targets transfer of training in the real world (Mania et al., 2003). Although human performance assessment alone cannot provide a quantitative assessment of simulation fidelity, it can provide a measure, or indication, of the relative efficiency of different simulated models for the same referent. It is more intuitive and hands-on than many other methods, and it is widely accepted and used (Mania et al., 2003). Lehmer and Chung (1999) applied the Image Dynamic Measurement System (IDMS-2) to verify simulation fidelity, using the measured delay in the visual system response time to assess fidelity.
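As a rough illustration of the evaluation-framework logic described above, the following Python sketch estimates fidelity as the weighted share of task-critical referent attributes and behaviors that the simulation reproduces. The object names, attribute sets, and weights are hypothetical, and the scoring is deliberately simplified; the published framework defines the procedure conceptually rather than as code.

```python
# Hypothetical sketch of an evaluation-framework fidelity estimate:
# identify task-critical referent objects, list their attributes and
# behaviors, and score the fraction the simulation reproduces.

# Referent description: object -> set of task-critical attributes/behaviors.
referent = {
    "cockpit":  {"yoke_feel", "instrument_layout", "throttle_response"},
    "terrain":  {"runway_markings", "elevation_cues"},
    "weather":  {"crosswind_gusts", "visibility_changes"},
}

# What the simulation actually reproduces (hypothetical inspection results).
simulated = {
    "cockpit":  {"instrument_layout", "throttle_response"},
    "terrain":  {"runway_markings", "elevation_cues"},
    "weather":  {"crosswind_gusts"},
}

# Relative importance of each object for the training task (sums to 1),
# reflecting the Pareto assumption that a minority of elements carries
# most of the training effect.
weights = {"cockpit": 0.6, "terrain": 0.25, "weather": 0.15}

def coverage(obj):
    """Fraction of the referent's task-critical features the simulation has."""
    have = referent[obj] & simulated.get(obj, set())
    return len(have) / len(referent[obj])

estimate = sum(weights[obj] * coverage(obj) for obj in referent)
print(f"Framework fidelity estimate: {estimate:.2f}")  # ~0.72 here
```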

FIDELITY AND TRANSFER OF TRAINING

It is natural to assume that the higher the level of fidelity, the higher the degree of transfer of training that will occur. Rooted in the "identical elements" theory of Thorndike (1903), this notion is still strongly held by simulator designers and industry today. Thorndike argued that there would be transfer between the first task (simulation) and the second task (real world) if the first task contained specific component activities that were also held by the second task. A number of theoretical studies (Noble, 2002) support the notion that higher fidelity will produce higher degrees of training transfer.

The "Alessi hypothesis," in contrast, states that there is a certain point beyond which adding more fidelity does not increase transfer at the same rate as it does early in training (Alessi, 1988). Some have proposed a U-shaped curve relating fidelity to learning, whereas other theories propose a normal curve. According to Alessi (1988), however, experimental research has not provided sufficient evidence to support the high-fidelity notion that theory predicted; some results even indicate that lower fidelity has an advantage. Alessi (1988) offered two explanations for the failure to show a high-fidelity advantage: (1) high fidelity also means high complexity, which demands more cognitive skill and thus increases the trainee's workload, which in turn impedes learning; and (2) proven instructional techniques that improve initial learning do not depend on high-fidelity components, and such techniques tend to lower the overall fidelity of the simulation.

Alessi (1988) proposed that the relationship between learning and fidelity is nonlinear and also depends on other factors such as trainee experience. Different trainees have different learning curves, all nonlinear and all distinct. According to Alessi (1988), when the level of fidelity is increased, the corresponding change in transfer of training depends largely on the trainee's characteristics and ability to respond to that increase. For novices, initial learning is the primary focus; for experts, who are already well versed in the knowledge needed to perform the job or individual tasks, transfer of task-specific knowledge, skills, and abilities is essential. Actual assessment of fidelity remains extremely difficult to obtain. Alessi gave an in-depth fidelity analysis of four different types of simulation to further investigate this effect: (1) situational simulation, (2) procedural simulation, (3) process simulation, and (4) physical simulation. Four dimensions of fidelity were identified and defined for this analysis: (1) the underlying model, (2) presentation, (3) user actions, and (4) system feedback. It was found that for different trainees and different types of simulation, the fidelity requirements varied greatly. For example, as an instance of procedural simulation, flight simulators need high fidelity of presentation, user actions, and system feedback to produce significant increases in transfer-of-training efficiency.

With respect to the relationship between fidelity and transfer of training, other studies have also demonstrated that the law of diminishing returns holds. This parallels a hypothesis put forth by Roscoe in 1971: there will be diminishing returns in transfer of training as the amount of simulator training increases. The first hours of simulation training yield a high amount of positive transfer, and later hours yield progressively less (Roscoe, 1980). Combining these two theories (Roscoe, 1971; Alessi, 1988), we can conclude that adding more fidelity, especially in the later stages of training, produces minimal gain in transfer of training.

If this is the case, increasing fidelity may not always be necessary. Fidelity is expensive, and eliminating elements of fidelity that do not increase transfer of training will reduce the cost of production. The question is: which cues can we eliminate without reducing the amount of transfer? This issue remains vague and poorly understood, and answering it requires research that directly compares simulation training with traditional training conducted in the actual aircraft or real-world environment. The main obstacle researchers face is cost: it is extremely expensive to conduct this type of research because of the high operational costs associated with real-world systems such as commercial aircraft, combat aircraft, surface ships, submarines, and other military and civilian systems. Additional issues revolve around the possible disruption of normal training. It may be difficult to find flight schools, student pilots, or military personnel and facilities willing to participate, especially because training time is already scarce, necessary, time-consuming, and expensive.

SUMMARY

It is clear that simulation can provide great benefits regardless of the level of fidelity (D'Asta et al., 2019; Labrague et al., 2019; Wenlock et al., 2020; Willie et al., 2016). The standard design approach for simulators has been to incorporate the highest possible level of fidelity and hope for the best possible transfer-of-training outcome. This approach is the direct result of the belief (or assumption) that high levels of fidelity must equate to high levels of transfer of training, despite considerable evidence that this is not necessarily true. It is important to stay focused on the goal of simulation training: the transfer, or translation, of skills learned in one arena to another.

The ever-present debate in simulation centers on how much fidelity is necessary to achieve a desired degree of training transfer. In other words, how "real" does the simulation need to be for the trainee to properly execute the skills learned in simulation in the real world? This question can only be answered with "it depends." As discussed previously, it depends on many factors, including the individual trainees, their levels of skill, and the instructor. It also depends on the particular skills to be learned and transferred. Low-fidelity simulators maximize the initial learning rate of novice pilots and minimize cost, whereas costly, high-fidelity simulators predict the real-world in-flight performance of expert pilots (Kinkade & Wheaton, 1972; Fink & Shriver, 1978; Hays & Singer, 1989). This may be because novice pilots can become overwhelmed by high fidelity, whereas experts will not: initial pilot training focuses on becoming familiar with the controls and the layout of the instruments, whereas experts concentrate on more advanced operational aspects.

If economic constraints did not enter the picture, the question of the level of fidelity needed to obtain maximal transfer of training would not be an issue; simulation designers would simply include every possible attribute, pushing fidelity as high as technology would allow. Unfortunately, financial resources are limited. To get the most value from simulation, it is important to eliminate attributes that do not aid in the transfer of training; in other words, it has become imperative to get the most "bang for the buck." The goal is to give the trainee enough simulation fidelity to facilitate learning, without attaching costly and unnecessary simulation options and characteristics. There is no easy answer at this time. Researchers in human factors, psychology, computer science, engineering, and the many other fields involved in the simulation industry need to work closely together on systematic, multidisciplinary research to achieve maximal training benefit while controlling and minimizing cost.

REFERENCES

Advani, S.K., and Mulder, J.A., 1995, Achieving high-fidelity motion cues in flight simulation, AGARD FVP Symposium on Flight Simulation: Where are the Challenges, Braunschweig, Germany.
Akalin, A., and Sahin, S., 2020, The impact of high-fidelity simulation on knowledge, critical thinking, and clinical decision-making for the management of pre-eclampsia, International Journal of Gynecology and Obstetrics, 150(3), 354–360. https://doi.org/10.1002/ijgo.13243
Alconero-Camarero, A.R., Sarabia-Cobo, C.M., Catalán-Piris, M.J., González-Gómez, S., and González-López, J.R., 2021, Nursing students' satisfaction: A comparison between medium- and high-fidelity simulation training, International Journal of Environmental Research and Public Health, 18(2), 804. https://doi.org/10.3390/ijerph18020804
Alessi, S.M., 1988, Fidelity in the design of instructional simulations, Journal of Computer-Based Instruction, 15(2), 40–47.
Allen, J.A., 1986, Maintenance training simulator fidelity and individual differences in transfer of training, Human Factors, 28(5), 497–509.
Andrews, D., Carroll, L., and Bell, H., 1995, The future of selective fidelity in training devices, Educational Technology, 35, 32–36.
Bell, P.M., and Freeman, R., 1995, Qualitative and quantitative indices for simulation systems in distributed interactive simulation, IEEE Proceedings of ISUMA-NAFIPS, 745–748.
Bradley, D.R., and Abelson, S.B., 1995, Desktop flight simulators: Simulation fidelity and pilot performance, Behavior Research Methods, Instruments, & Computers, 27(2), 152–159.
Bur, A.M., Gomez, E.D., Newman, J.G., Weinstein, G.S., O'Malley, B.W., Rassekh, C.H., and Kuchenbecker, K.J., 2017, Evaluation of high-fidelity simulation as a training tool in transoral robotic surgery, The Laryngoscope, 127(12), 2790–2795. https://doi.org/10.1002/lary.26733
Chamberland, C., Hodgetts, H.M., Kramer, C., Breton, E., Chiniara, G., and Tremblay, S., 2018, The critical nature of debriefing in high-fidelity simulation-based training for improving team communication in emergency resuscitation, Applied Cognitive Psychology, 32(6), 727–738. https://doi.org/10.1002/acp.3450
Connolly, T.J., Blackwell, B.B., and Lester, L.F., 1989, A simulator-based approach to training in aeronautical decision making, Aviation, Space, and Environmental Medicine, 60, 50–52.
D'Asta, F., Homsi, J., Sforzi, I., Wilson, D., and de Luca, M., 2019, "SIMBurns": A high-fidelity simulation program in emergency burn management developed through international collaboration, Burns, 45(1), 120–127.
Duncan, J.C., and Feterle, L.C., 2000, The use of personal computer-based aviation training devices to teach aircrew decision-making, teamwork, and resource management, Proceedings of IEEE 2000 National Aerospace and Electronics Conference, Dayton, OH, 421–426.
Ferguson, S.W., Clement, W.F., Hoh, R.H., and Cleveland, W.B., 1985, Assessment of simulation fidelity using measurements of piloting technique in flight: Part II, 41st Annual Forum of the American Helicopter Society, Ft. Worth, TX, 1–23.
Field, E.J., Armor, J.B., and Rossitto, K.F., 2002, Comparison of in-flight and ground based simulations of large aircraft flying qualities, AIAA 2002-4800, AIAA Atmospheric Flight Mechanics Conference and Exhibit, Monterey, CA.
Fink, C., and Shriver, E., 1978, Simulators for Maintenance Training: Some Issues, Problems and Areas for Future Research (Tech. Rep. No. AFHRL-TR-78-27), Lowry Air Force Base, CO: Air Force Human Resources Laboratory.
Fortin, M., 1989, Cost/performance trade-offs in visual simulation, Royal Aeronautical Society Conference on Flight Simulation: Assessing the Benefits and Economics, London, 19.1–19.15.
Garrison, P., 1985, Flying Without Wings: A Flight Simulation Manual, Blue Ridge Summit, PA: TAB Books.
Gross, D.C., and Freeman, R., 1997, Measuring fidelity differentials in HLA simulations, Fall 1997 Simulation Interoperability Workshop.
Gross, D.C., Pace, D., Harmoon, S., and Tucker, W., 1999, Why fidelity? Proceedings of the Spring 1999 Simulation Interoperability Workshop.
Haddeland, K., Slettebø, Å., Svensson, E., Carstens, P., and Fossum, M., 2019, Validity of a questionnaire developed to measure the impact of a high-fidelity simulation intervention: A feasibility study, Journal of Advanced Nursing, 75(11), 2673–2682. https://doi.org/10.1111/jan.14077
Hays, R.T., and Singer, M.J., 1989, Simulation Fidelity in Training System Design, New York: Springer.
Hays, R.T., Jacobs, J.W., Prince, C., and Salas, E., 1992, Flight simulator training effectiveness: A meta-analysis, Military Psychology, 4(2), 63–74.
Hontvedt, M., and Øvergård, K.I., 2020, Simulations at work – A framework for configuring simulation fidelity with training objectives, Computer Supported Cooperative Work, 29(1–2), 85–113. https://doi.org/10.1007/s10606-019-09367-8
Hughes, T., and Rolek, E., 2003, Fidelity and validity: Issues of human behavioral representation requirements development, Proceedings of the 2003 Winter Simulation Conference, New Orleans, LA.
Hutchins, E., and Klausen, T., 1996, Distributed cognition in an airline cockpit, in Cognition and Communication at Work, Engeström, Y., and Middleton, D., Eds., Cambridge University Press, 15–34.
Kaiser, M.K., and Schroeder, J.A., 2003, Flights of fancy: The art and science of flight simulation, in Principles and Practice of Aviation Psychology, Vidulich, M.A., and Tsang, P.S., Eds., Mahwah, NJ: Lawrence Erlbaum Associates, 435–471.
Kinkade, R., and Wheaton, G., 1972, Training device design, in Human Engineering Guide to Equipment Design, Van Cott, H., and Kinkade, R., Eds., Washington, DC: Department of Defense, 668–699.
Labrague, L.J., McEnroe-Petitte, D.M., Bowling, A.M., Nwafor, C.E., and Tsaras, K., 2019, High-fidelity simulation and nursing students' anxiety and self-confidence: A systematic review, Nursing Forum, 54(3), 358–368.
Lehmer, R.D., and Chung, W.W.Y., 1999, Image dynamic measurement system (IDMS-2) for flight simulation fidelity verification, American Institute of Aeronautics and Astronautics, 137–143.
Liu, D., and Vincenzi, D.A., 2004, Measuring simulation fidelity: A conceptual study, Proceedings of the 2nd Human Performance, Situation Awareness and Automation Conference (HPSAA II), Daytona Beach, FL, 160–165.
Mania, K., Troscianko, T., Hawkes, R., and Chalmers, A., 2003, Fidelity metrics for virtual environment simulations based on spatial memory awareness states, Presence: Teleoperators and Virtual Environments, 12(3), 296–310.
Nemire, K., Jacoby, R.H., and Ellis, S.R., 1994, Simulation fidelity of a virtual environment display, Human Factors, 36(1), 79–93.
Noble, C., 2002, The relationship between fidelity and learning in aviation training and assessment, Journal of Air Transport, 7(3), 34–54.
Parrish, R.V., McKissick, B.T., and Ashworth, B.R., 1983, Comparison of Simulator Fidelity Model Predictions with In-simulator Evaluation Data (Technical Paper 2106), Hampton, VA: NASA Langley Research Center.
Ray, P.A., 1996, Quality flight simulation cueing—Why? Proceedings of the AIAA Flight Simulation Technologies Conference, San Diego, CA, 138–147.
Reed, E.T., 1996, The aerial gunner and scanner simulator "affordable virtual reality training for aircrews," Training-Lowering the Cost, Maintaining Fidelity: Proceedings from the Royal Aeronautical Society, London, 18.1–18.15.
Rehmann, A.J., Mitman, R.D., and Reynolds, M.C., 1995, A Handbook of Flight Simulation Fidelity Requirements for Human Factors Research (DOT/FAA/CT-TN95/46), Wright-Patterson AFB, OH: Crew System Ergonomics Information Analysis Center.
Rinalducci, E., 1996, Characteristics of visual fidelity in the virtual environment, Presence, 5(3), 330–345.
Roscoe, S.N., 1971, Incremental transfer effectiveness, Human Factors, 13(6), 561–567. https://doi.org/10.1177/001872087101300607
Roscoe, S.N., 1980, Aviation Psychology, Ames, IA: Iowa State University Press.
Roza, M., 2000, Fidelity considerations for civil aviation distributed simulations, Proceedings of the AIAA Modeling and Simulation Technologies Conference and Exhibit, Denver, CO.
Roza, M., Voogd, J., and Jense, H., 2001, Defining, specifying and developing fidelity referents, Proceedings of the 2001 European Simulation Interoperability Workshop, London.
Schricker, B., Franceschini, R., and Johnson, T., 2001, Fidelity evaluation framework, Proceedings of the IEEE 34th Annual Simulation Symposium, Seattle, WA.
Scott, A., and Gartner, A., 2019, Low fidelity simulation in a high fidelity world, Postgraduate Medical Journal, 95(1130), 687–688. https://doi.org/10.1136/postgradmedj-2019-FPM.9
Thomas, T.G., 2004, From virtual to visual and back? AIAA Modeling and Simulation Technologies Conference and Exhibit, Providence, RI: AIAA Paper 2004-5146.
Thorndike, E.L., 1903, Educational Psychology, New York: Lemke & Buechner.
Wenlock, R.D., Arnold, A., Patel, H., and Kirtchuk, D., 2020, Low-fidelity simulation of medical emergency and cardiac arrest responses in a suspected COVID-19 patient–an interim report, Clinical Medicine, 20(4), e66.
Wickens, C.D., Lee, J.D., Liu, Y., and Becker, S.E.G., 2004, An Introduction to Human Factors Engineering, 2nd ed., Upper Saddle River, NJ: Pearson Prentice Hall.
Willie, C., Chen, F., Joyner, B.L., and Blasius, K., 2016, Using high-fidelity simulation for critical event training, Medical Education, 50(11), 1161–1162.
Yu, Z.-G., 1997, Inquiry into concepts of flight simulation fidelity, in First International Conference on Nonlinear Problems in Aviation and Aerospace Proceedings, Sivasundaram, S., Ed., USA, 679–685.
Zhang, B., 1993, How to consider simulation fidelity and validity for an engineering simulator, American Institute of Aeronautics and Astronautics, 298–305.

4

Transfer of Training

Dahai Liu, Jacqueline McSorley, Elizabeth Blickensderfer, Dennis A. Vincenzi, and Nikolas D. Macchiarella

CONTENTS

Introduction
Transfer of Training: Terms and Concepts
  Positive Transfer
  Negative Transfer
  Near Transfer
  Far Transfer
A Model of Factors Affecting the Transfer of Training
  Training Input Factors
  Training Outputs
  Conditions of Transfer
Dynamic Models of Training Transfer
Research Methods
Transfer of Training Performance Measurement
  Objective Measures
  Subjective Measures
Selecting Performance Measures
  Using Performance Measures to Indicate Transfer
Experimental Design
  Forward Transfer Study
  Backward Transfer Study
  Quasi-Experimental Study
Curve-Fitting Method
Summary
References

INTRODUCTION

Commercial aviation and the military have long reaped the rewards provided by the use of flight simulation to train pilots, and more recently, the healthcare industry has seen tremendous growth in the use of simulation training. Using simulators for training maximizes the use of operational systems for revenue-producing activities and minimizes their use for non-revenue-producing activities such as training.



In addition, it eliminates the expense of aircraft fuel and the maintenance costs associated with training, as well as the loss of revenue incurred while the actual system is in use for training purposes (Moroney & Moroney, 1998). In the healthcare domain, simulations allow medical professionals to practice treatments and protocols without putting patients at risk; healthcare simulations are often expensive, but the trade-off between economic cost and the cost of patient lives is undeniably important (Ker et al., 2010).

Simulation can also save training time. Time savings occur because trainers can place the trainee in the exact situation required to learn specific skills. For example, if a flight instructor is attempting to teach a student how to land in a crosswind, it is no longer necessary to perform multiple takeoffs to practice crosswind landings: the trainer can stop the simulation and place the student pilot on final approach repeatedly until the student becomes proficient at the targeted skill. Likewise, in the healthcare domain, simulation can give students or practitioners practice with a wide range of patient symptoms without the need for a live human actually presenting with a specific set of symptoms.

Also important, but more difficult to quantify, are the increases in safety associated with using a simulated environment. In flight training, mistakes made by a student pilot in flight simulation are regrettable but not life-threatening; the instructor and student pilot can simply stop, reset the simulator, and perform the flight task again. A clear example from in-flight training is unusual attitude recovery. In actual flight training, pilots are placed into unusual attitudes (e.g., extreme degrees of pitch, roll, and yaw) for training purposes (Federal Aviation Administration, 2006), and all certified pilots have faced stressful and life-threatening situations during training and recovered accordingly. When utilizing flight simulation, however, inexperienced pilots can encounter these dangerous situations without actually being in harm's way, increasing safety.

While training via simulation clearly offers numerous benefits, for simulation-based training to be successful, trainees must effectively apply the knowledge, skills, and abilities gained from simulator training to the corresponding real-world task. In other words, transfer of training (ToT) must occur. This chapter describes the main concepts and theories pertaining to transfer of training, with an emphasis on simulation-based training. We first review the definition of transfer of training and discuss the factors involved. Next, we discuss research methods relating to transfer of training. Finally, we propose research issues regarding transfer of training in simulation-based training environments.

TRANSFER OF TRAINING: TERMS AND CONCEPTS

The ultimate goal of training is the trainee being able to apply, or "transfer," what was learned in training to the actual real-world setting. Transfer of training refers to the extent to which knowledge, skills, and abilities learned in training programs are generalized or applied to real-world situations, and to the maintenance of these knowledge, skills, and abilities over time on the job (Blume et al., 2010). Transfer of training can be classified into two types: positive and negative (Chapanis, 1996).


Positive Transfer

Positive transfer occurs when an individual correctly applies knowledge, skills, and abilities learned in one environment (e.g., in simulation) to a different setting (in the case of aviation, real flight) (Burke, 1997). Positive transfer is the goal of any type of training, and in this chapter "transfer of training" denotes positive transfer unless otherwise indicated.

Negative Transfer

Negative transfer occurs when existing knowledge and skills (from previous experiences) impede proper performance in a different task or environment. For example, a skilled typist on a QWERTY keyboard would have difficulty using, or learning how to use, a non-QWERTY keyboard such as a Dvorak keyboard. Negative transfer arises for at least two related reasons: (1) system design changes and (2) a mismatch between a training system and the actual task.

First, system design changes (e.g., to controls or software menus) can create one type of negative transfer: habit interference. Specifically, the task performer has experience performing the task as it was set up in one manner and has developed a certain degree of automaticity. If a design change occurs, the task performer is likely to revert to performing the task according to the previous system (i.e., "habit interference"). Avoiding habit interference should be a major design goal (Chapanis, 1996).

Second, if the training system procedures do not match those in the transfer environment, negative transfer is likely to occur. Consider a pilot who is trained in a simulator to pull back on the yoke to lower the nose of the aircraft, while in the actual aircraft pulling back on the yoke raises the nose. In the actual aircraft, if the pilot pulled back intending to lower the nose, negative transfer would have occurred, and the end result would be a dangerous situation for the pilot, any passengers, and the aircraft. In other words, negative transfer occurs when the trainee reacts to the transfer stimulus with the response that was correct during training but is incorrect for the actual performance task: here, pulling back on the yoke thinking it will lower the nose because that was what was trained, when in actuality it raises the nose.

Near Transfer

Another element of training transfer lies in the type of transfer: near, far, or both (Noe, 2006). Near transfer utilizes training strategies that are identical to the situations the trainee will encounter on the job; equipment usage is one skill that benefits from near transfer (Noe, 2006). The theory of identical elements, fundamental to near transfer, dictates that the training should mirror the task in factors such as equipment and environment (Thorndike & Woodworth, 1901). This type of training promotes near transfer by structuring the training as closely to the task as possible (van der Locht et al., 2013).


Far Transfer

Far transfer refers to the application of training to a more generalized environment (Noe, 2006). It occurs when the training environment and equipment differ from what trainees will encounter in the real scenario. Far transfer is essential when the training cannot directly reflect on-the-job work, as with interpersonal skills. The stimulus generalization approach supports far transfer by emphasizing broad principles to train rather than focusing on specific procedures (Noe, 2006).

The cognitive theory of transfer proposes the use of both near and far transfer. It emphasizes specific, meaningful material as well as schemas that encourage effective storage of the general content, and it holds that providing trainees with application assignments can increase the chances of long-term recall (Noe, 2006). Appropriate conditions for applying the cognitive theory include nearly all types of training and environments.

A MODEL OF FACTORS AFFECTING THE TRANSFER OF TRAINING

For over 30 years, the Baldwin and Ford (1988) model of transfer of training has been well accepted in training research and practice. Because of its persistent impact, including the expanded models it has spawned (e.g., the Dynamic Model of Training Transfer; Blume et al., 2019), it is the focus of the current chapter. As shown in Figure 4.1, the model depicts transfer of training in terms of training input factors, training outputs, and conditions of transfer. This section reviews the Baldwin and Ford model with an emphasis on (1) research accomplished since the model's publication and (2) research performed in simulation and aviation studies.

FIGURE 4.1  A model of training transfer. (Adapted from Baldwin, T. and Ford, J., Personnel Psychol., 41, 63–105, 1988.)

Training Input Factors

Starting with the left side of Figure 4.1, the training input factors are training design, trainee characteristics, and the work environment. The model depicts each of these three input factors as having a direct influence on learning in the training environment. The model also connects trainee characteristics and work-environment characteristics directly with transfer performance; those factors are thought to exert a direct influence on performance in the transfer setting.

In terms of trainee characteristics, many research studies suggest that these characteristics affect training transfer efficiency (Smith-Jentsch, Salas, & Brannick, 2001). Numerous individual differences exist, including motivation, attitudes, and ability (e.g., cognitive and physical). In fact, Blume et al.'s (2010) meta-analysis found that cognitive ability was the single best predictor of training transfer. In an analysis specifically targeting flight simulation, Auffrey et al. (2001) argued that goal setting, planning, motivation, and attitudes were key factors in training effectiveness. In terms of motivation, a trainee must put forth effort to learn; a prerequisite for transfer is the trainee's motivation to successfully complete the training and to acquire the new skills and knowledge (Colquitt et al., 2000). In terms of ability, the trainee must have the raw ability to improve his or her skills. For example, a pilot who is working on shortening takeoff distance must analyze the situation and realize the need to adjust the flaps (depending on the type of aircraft) and increase speed sufficiently to gain enough lift; the trainee must possess the cognitive ability to understand that specific changes are necessary to accomplish the desired change in performance. These are just two examples of individual characteristics that relate to training effectiveness, and more research is needed regarding the impact of trainee characteristics on training effectiveness.

Second, the training design or method plays a role in transfer of training. While an entire field of research on instructional systems design exists, of particular relevance to transfer following simulation-based training are the principles of identical elements and the stimulus–response relationship. The first approach, "identical elements," dates back to the turn of the 20th century, when Thorndike (1903) put forth a notion that is still held by simulator designers today: there will be transfer between the first task (simulation) and the second task (real world) if the first task contains specific component activities that are also held by the second task. This approach is entirely dependent on the presence of shared identical elements (Thorndike, 1903). Although outside the realm of flight simulation, an illustration of this principle can be found in athletics. If an individual plays softball and then tries out for the baseball team, he or she will be better off than the person who has previously played only golf. All three sports share common elements (e.g., ball, striking stick, and grass playing field); however, softball shares far more elements with baseball than golf does (e.g., number of players, bases, scoring, umpires, uniforms, fences, and dugouts). The more elements that are shared between the two environments, the better the transfer; softball is therefore a better form of simulation than golf for the training of baseball skills. This approach parallels the idea that simulators should duplicate the real-world situation to the greatest degree possible.

Another principle for training design can be seen in terms of "stimulus and response." Here the idea is to examine the extent to which similarities exist between the stimulus representation and response demands of the training and those of the transfer task (Osgood, 1949). This perspective does not demand a duplication of elements; rather, transfer of training can be obtained using training tasks and devices that do not duplicate the real world exactly but do maintain the correct stimulus–response relationship. Consider once again the example of athletics: golf transfers quite well to tennis, as the golf swing (low to high) and the tennis stroke (low to high) are similar. Even though the sports themselves are quite different (golf clubs and holes versus racquets), the stimulus–response sequence for each of the sports is quite similar.

In simulation-based training, two major categories relating to training design are the fidelity, or authenticity, of the simulation (Kaiser, 2003; Hays & Singer, 1989) and the use of a proven scenario-based training model (Salas et al., 2006). First, simulation fidelity should be considered from the perspective of how well the simulation includes appropriate stimulus–response relationships that create a high degree of cognitive fidelity in performing the simulated task. For example, cognitive fidelity in flight simulation addresses the extent to which the simulation engages the pilot in the types of cognitive activities encountered in real flight (Kaiser, 2003). Second, research has shown that training effectiveness is higher when simulation-based training follows a scenario-based model, which links training requirements, simulated events, performance measures, and feedback (Blickensderfer et al., 2012).

The third input factor in the model is work-environment characteristics. This refers to the overall organizational support for the learner, such as supervision, sponsorship, and subsequent reward for the training and skill development. It also refers to the skill being trained (i.e., task difficulty): some tasks are simple and require little effort to transfer, whereas others are difficult and involve skills that are hard to transfer and maintain (Blaiwes et al., 1973; Simon & Roscoe, 1984; Hays et al., 1992; Lathan et al., 2002). Simpler tasks or skills will be easier for a trainee to master and later transfer to the performance environment. Other work has focused on the organizational characteristics involved in training transfer. In a review regarding flight simulation, Auffrey et al. (2001) reported that characteristics of the work environment were important factors in training transfer. Additionally, Awoniyi et al. (2002) investigated the effect of various work-environment factors on training transfer, discussing the importance of person–environment fit: transfer will depend on the degree of fit between a worker and the particular work environment. In other words, two workers may attend training and acquire equivalent knowledge, skills, and abilities, yet depending on the degree of person–environment fit in the organization, one worker may show a significantly higher degree of training transfer than the other. Five dimensions were studied: supervisory encouragement, resources, freedom, workload pressure, and creativity support. Results indicated that person–environment fit has a positive relationship with transfer of training and can be a moderate predictor of it.

Training Outputs

The middle of the model shown in Figure 4.1, "Training Outputs," focuses on the actual outputs of training: in other words, the amount of learning that occurred during training and the amount retained after the training program was completed. The training outputs depend on the three inputs described above, and in turn, the amount learned during training directly influences ultimate training transfer.

Conditions of Transfer

Finally, the right side of the model, "Conditions of Transfer," refers to the post-training environment, where the learner is back in the actual work setting. Conditions of transfer include the real-world conditions surrounding the use, generalization, and maintenance of the knowledge, skills, and abilities learned in the training program; the degree to which the learner used those knowledge, skills, and abilities in the transfer setting; and the length of time the learner retained them. Blume et al.'s (2010) meta-analysis found moderate effect sizes for both transfer climate and support.

DYNAMIC MODELS OF TRAINING TRANSFER

Although most transfer of training research focuses on the immediate results of training, more than one author has explained that transfer does not occur all at once but instead unfolds in a series of stages through which trainees pass (Foxon, 1993; Blume et al., 2019). According to Foxon (1993) (see Figure 4.2), the degree of transfer increases progressively while the chance of transfer failure decreases. Although this process model was proposed for organizational training programs, it can likely be generalized to any kind of training, including simulation training. Additionally, Blume and colleagues (2019) developed an organizational training model that illustrates a cycle of transfer that builds on itself (see Figure 4.3); this dynamic model shows the impact on both trainee behaviors and performance each time the transfer process repeats.

FIGURE 4.2  A model of the training transfer process. (Adapted from Foxon, M., Aust. J. Educ. Technol., 9(2), 130–143, 1993.)

In summary, transfer of training is the combined result of input factors (characteristics of the trainee, training design, and work environment), the amount learned in training, and the conditions surrounding the transfer setting. Some of these factors are better understood than others, and simulation is only one subfactor in this complex problem: even with the best simulation available, if the other variables are not in place, training will not be effective (it will not result in positive transfer). Overall, transfer of training remains a complex issue with numerous variables involved, and additional research is needed to further understand the differential impacts and interaction effects of those variables. We now turn to a discussion of the methodologies underlying transfer of training studies.

FIGURE 4.3  A model of the dynamic training transfer process. (Adapted from Blume, Ford, Surface, & Olenick, Human Resource Management Review, 29(2), 270–283, 2019.)

RESEARCH METHODS

How do researchers study transfer of training? Alternatively, how do designers validate their simulator as a training tool? It sounds straightforward enough: simply compare performance on the job before training and after training. Unfortunately, it is not that simple, and because of the complex nature of the problem, many unknowns still exist. Indeed, transfer of training research has often been inconclusive due to differences in concepts and definitions, differences in theoretical orientations, and methodological flaws. Perhaps the greatest problem is that transfer of training is difficult to measure, in terms of accurately determining whether transfer has occurred and how much has transferred. In other words, how transfer of training is conceptualized, and when and how it is measured, really do matter (Blume et al., 2010).

Two major issues are involved in assessing training transfer: performance measurement and research design. A full review of these topics is beyond the scope of this chapter, and several other chapters in this volume describe and discuss performance measurement. Since measuring training transfer is inherently linked to fundamental attributes of performance measurement for training effectiveness, however, we offer some key terminology and issues here.

TRANSFER OF TRAINING PERFORMANCE MEASUREMENT

Human performance (e.g., job performance) can be assessed in many ways. The most commonly used methods are objective performance measures and subjective judgments.

Objective Measures

Objective measures are counts of various behaviors (e.g., number of errors) or of the results of job behaviors (e.g., total passengers transported). Thus, objective measures are data that objectively reflect the trainee's performance level. An obvious limitation of objective measurements is that they cannot provide insight into cognition-related aspects of performance, such as workload, situational awareness, motivation, and attitudes. These variables are vital to transfer of training assessment because they are among the factors that affect transfer of training.

Subjective Measures

Another approach is to use subjective measures: ratings given by an expert, collected for example via surveys and questionnaires. Numerous examples of subjective performance measures can be found in the training literature.

SELECTING PERFORMANCE MEASURES

The selection of performance measures depends largely on the task. Indeed, it is rare for performance measures to be used across different tasks and domains without at least some modification; the training requirements are key (Hammerton, 1966; Bell et al., 2017). If the training purpose is to teach solely physical skills, objective measurement can be used as the primary tool. However, if cognitive training is involved, subjective performance assessment tools need to be applied to capture the cognitive skill learning and transfer. The notion of multiple measures also exists: Kraiger et al. (1993) advocated separate measures for cognitive, behavioral, and attitudinal factors, and the Kirkpatrick four-level model of training evaluation continues to be pervasive in the training and simulation literature (Kirkpatrick & Kirkpatrick, 2016). Combining the notion of multiple measures with the dynamic view of training transfer (e.g., Blume et al., 2019) suggests a need for multiple measures spread over multiple points in time.

Using Performance Measures to Indicate Transfer

Finally, as shown in Tables 4.1 and 4.2, Hammerton (1966), Roscoe and Williges (1980), and Taylor et al. (1999) present a number of different calculations that can be performed on the performance data. These include percent transfer (the saving of time or trials in an aircraft achieved by using a flight simulator); the transfer effectiveness ratio (a measure of the efficiency of the simulation); first shot performance (how much training is retained on first transference to the real situation); and training retained (how much training is retained on the first post-transfer trial from the simulator compared with that gained from the real world). In addition to performance measures, experimental design is another crucial issue in determining the degree of transfer of training that occurred; it is discussed after the tables below.

TABLE 4.1
Trials/Time to Transfer, or Trials to Criterion (TTC), Is the Number of Training Trials that a Subject Attempted for a Given Task before Reaching the Criterion Level of Proficiency on that Task, Given the Following Variables

Yc: the control group's number of TTC for training conducted in the aircraft
Yx: the experimental group's number of TTC for training conducted in the aircraft
X: the experimental group's number of TTC for training conducted in the simulator
F: the mean performance on the first simulator training trial
L: the mean performance on the last simulator training trial
T: the mean performance on the first post-transfer trial
C: the mean first-trial performance in real-situation training
S: the stable performance in real-situation training

TABLE 4.2
Transfer of Training Formulas

Percent transfer = (Yc − Yx) / Yc × 100
  Measures the saving of time or trials in aircraft training to criterion that can be achieved by use of a ground simulator.
Transfer effectiveness ratio (TER) = (Yc − Yx) / X
  Measures the efficiency of the simulation by calculating the saving of aircraft trials (or time) per unit of simulator training.
First shot performance: fsp = (F − T) / (F − L) × 100
  Measures how much training will be retained on first transference to the real situation.
Training retained: tr = (C − T) / (C − S) × 100
  Measures how much training is retained on the first post-transfer trial from the simulator compared with that gained from the real world.

Source: Taylor, H.L. et al., 1997; Taylor, H.L., Lintern, G., Hulin, C.L., Talleur, D.A., Emanuel, T.W., Jr., and Phillips, Int. J. Aviat. Psychol., 9(4), 319–335, 1999; Hammerton, M., Proceedings of the IEE Conference, 113(11), 1881–1884, 1966.
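As a quick numerical illustration of the Table 4.2 formulas, the short Python sketch below computes each index from hypothetical trials-to-criterion and performance data; the numbers are invented for the example and do not come from any study cited here.

```python
# Hypothetical data illustrating the Table 4.2 transfer formulas.
# All values are invented for the example.

Yc = 20.0   # control group: aircraft trials to criterion (no simulator)
Yx = 12.0   # experimental group: aircraft trials to criterion after simulator
X = 10.0    # experimental group: simulator trials to criterion

# Percent transfer: saving of aircraft trials attributable to the simulator.
percent_transfer = (Yc - Yx) / Yc * 100
print(f"Percent transfer: {percent_transfer:.1f}%")        # 40.0%

# Transfer effectiveness ratio: aircraft trials saved per simulator trial.
ter = (Yc - Yx) / X
print(f"Transfer effectiveness ratio: {ter:.2f}")          # 0.80

# First shot performance and training retained use mean performance scores.
F, L, T = 40.0, 90.0, 80.0   # first/last simulator trial, first transfer trial
C, S = 50.0, 95.0            # first trial and stable performance, real training

fsp = (F - T) / (F - L) * 100
tr = (C - T) / (C - S) * 100
print(f"First shot performance: {fsp:.1f}%")               # 80.0%
print(f"Training retained: {tr:.1f}%")                     # 66.7%
```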


EXPERIMENTAL DESIGN

Numerous training evaluation studies exist in the simulation-based training literature, some dating back to the 1940s (Valverde, 1973). Generally speaking, three types of study are used: the forward transfer study (i.e., a predictive validation study), the backward transfer study (i.e., a concurrent validation study), and the quasi-experimental study (an approach similar to a construct validation study).

Forward Transfer Study

In the case of aviation, the classic experiment is to compare two matched groups of pilot trainees: one group (the control group) is trained using only the actual aircraft, and the other (the experimental group) receives its training via a simulator. The transfer environment is the actual aircraft. Pilot performance after training (e.g., TTC and flight technical performance) is captured, measured, and compared. This type of study is referred to as a forward transfer study (Kaempf & Blackwell, 1990; Dohme, 1992; Darken & Banker, 1998) because it follows the direction of transfer: first the simulation, then the transfer environment. In some situations, this type of study can be expensive, time-consuming, or impossible to complete. For rare and dangerous events, such as engine failure, turbulence, or severe weather conditions, a forward transfer study is nearly impossible in a practical sense.

Backward Transfer Study To overcome the technical difficulties inherent in forward transfer studies, researchers use the Backward Transfer Study method. In a backward transfer study (i.e., concurrent validation study), current proficient aviators perform tasks on the job and performance measures are taken. Next, the same aviators perform the tasks in the simulation. Their performance in the simulation is compared with their performance on the job. The logic of the backward transfer study method lies in the following assumption: “If the aircraft proficient aviators cannot perform the flying tasks successfully in the simulator, the poor performance is attributed to deficiencies in the simulators” (Kaempf & Blackwell, 1990). Possible simulator deficiencies include the following: cues that may be different from the actual environment, controls that may be different from the actual environment, and the fact that flight simulators may require skills that are not required to fly aircraft (Kaempf & Blackwell, 1990). Backward transfer studies use real pilots’ performance in simulators only to predict forward transfer effectiveness for a particular simulator. If a low degree of backward transfer occurs, it implies that there are deficiencies in that simulator. Unfortunately, in the case of a high degree of backward transfer, it is not necessarily an indication of a high degree of forward transfer. It may simply mean that the pilots in the study were exemplary at getting the simulator to perform how they desired. Table 4.3 illustrates the formulas involved in backward transfer studies. Results from


Kaempf and Blackwell (1990) indicate that inexpensive backward transfer studies may be employed to predict forward transfer on flight tasks.

Quasi-Experimental Study

Lintern et al. (1997) and Stewart et al. (2002) completed quasi-experimental studies (i.e., similar to construct validation studies). In this method, transfer of training is compared between one configuration of a simulation and another configuration of the same device. Quasi-experimental studies are intended to investigate basic knowledge about transfer of training principles and theories. Potential benefits include considerable savings in experimental cost and time, provided the quasi-transfer methodology can be validated against true transfer of training.

CURVE-FITTING METHOD

Another method used to assess transfer is the curve-fitting technique. One reason to use curve fitting is that the traditional transfer of training formulas do not take the amount of prior training or experience into account; the only data collected tend to be from immediately after one simulation exercise. Another shortcoming of the traditional formulas is that they provide only a “crude” estimate of transfer in the form of one global value. Additionally, the traditional estimates of transfer of training are not statistical tests (Damos, 1991). In contrast, the curve-fitting technique provides a more comprehensive measure of training transfer. In brief, the curve-fitting technique attempts to generate a learning curve for a particular task: researchers find the best-fitting equation for the data. In this way, a more exact picture of the skill acquisition process is portrayed, and the researcher examines measures of learning over time rather than a single measure.

TABLE 4.3
Index of Backward Transfer Formula (B)

$$B = \frac{1}{N}\sum_{i=1}^{N}\frac{A_i}{S_i}$$

where i = subject; N = total number of subjects; A = the mean of the subject’s OPR scores for the last two trials in the aircraft; and S = the subject’s OPR score during the second simulator check ride. B less than 1 indicates that performance in the simulator was substantially below that in the aircraft.
Source: Kaempf, G.L. and Blackwell, N.J., Transfer-of-training study of emergency touchdown maneuvers in the AH-1 flight and weapons simulator (Research Report 1561), 1990.
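As a minimal computational sketch of the Table 4.3 formula, the following computes B from paired aircraft and simulator OPR scores; the scores shown are illustrative, not from the original study.

# A minimal sketch of the backward transfer index B from Table 4.3,
# assuming paired OPR scores per subject: A_i (aircraft, mean of last
# two trials) and S_i (second simulator check ride).
def backward_transfer_index(aircraft_scores, simulator_scores):
    """Mean ratio of aircraft to simulator OPR scores across subjects."""
    ratios = [a / s for a, s in zip(aircraft_scores, simulator_scores)]
    return ratios and sum(ratios) / len(ratios)

# Example: three subjects; interpret the result per the Table 4.3 note.
print(backward_transfer_index([85, 90, 78], [80, 92, 75]))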


Curve fitting is claimed to be a major improvement for transfer-of-training estimates (Damos, 1991). Using this method, the curve equation provides estimates of the initial level of performance, the rate of improvement, and the asymptotic level of performance for a particular group (e.g., type of training). Damos (1991) found that the curve-fitting method provided a much more detailed analysis of the data, including the following:

1. More insight into specific training effectiveness for differing training methods (i.e., calculating the inflection point and asymptote of the curves)
2. Transfer of training estimates without a control group
3. Statistical tests on curve parameters (to help assess differences between training interventions)

The typical steps involved in curve fitting are as follows:

1. Visually inspecting the data to “guess” an equation form, for example, a general exponential equation (Damos, 1991), dv = c exp(gx) + h, where dv is the dependent variable; c, g, and h are the parameters to be fit; and x is the trial-block number
2. Assessing goodness of fit by calculating the correlation between predicted and observed values
3. Performing a statistical test for each parameter

More curve-fitting techniques can be found in Spears (1985). Few curve-fitting studies have emerged over the years, possibly due to the time-consuming, expensive, and complex nature of this technique (Farmer et al., 2017).
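As an illustration of these steps, the following is a minimal sketch using the general exponential form above; the performance data, starting values, and use of SciPy are illustrative assumptions, not part of Damos’s (1991) procedure.

# A minimal sketch of fitting dv = c*exp(g*x) + h to learning data.
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(x, c, g, h):
    """General exponential learning curve: dv = c * exp(g * x) + h."""
    return c * np.exp(g * x) + h

blocks = np.arange(1, 11)  # trial-block numbers 1..10 (illustrative)
scores = np.array([40, 55, 64, 71, 76, 79, 82, 83, 84, 85], dtype=float)

# Step 1: guessed equation form and starting values
params, cov = curve_fit(learning_curve, blocks, scores, p0=(-50.0, -0.3, 90.0))
c, g, h = params

# Step 2: goodness of fit via correlation of predicted and observed values
predicted = learning_curve(blocks, *params)
r = np.corrcoef(predicted, scores)[0, 1]

# Step 3: parameter standard errors support per-parameter statistical tests
se = np.sqrt(np.diag(cov))
print(f"c = {c:.2f}, g = {g:.3f}, h = {h:.2f} (asymptote), r = {r:.3f}")
print("parameter standard errors:", se)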

SUMMARY

After reviewing the literature, a few points are clear. First, with 30 years of research behind it, the Baldwin and Ford model continues to provide guidance on the factors involved in transfer of training. Despite the presence of this overarching model, more research is needed before an exact understanding of the relative importance of the different variables is achieved. Training transfer is closely linked to measurement, and empirical studies with multiple measures of learning, transfer, and performance are needed to build our knowledge of training effectiveness in general (Bell et al., 2017). It is our hope that future empirical research will address the mechanisms of the human learning process and training transfer within different contexts and, in turn, will advance the understanding of the transfer of training.

REFERENCES

Auffrey, A.L., Mirabella, A., & Siebold, G.L., 2001, Transfer of training revisited (Report ARI-RN-2001-10), Alexandria, VA: Army Research Institute for the Behavioral and Social Sciences.


Awoniyi, E.A., Griego, O.V., & Morgan, G.A., 2002, Person-environment fit and transfer of training, International Journal of Training and Development, 6(1), 25–35.
Baldwin, T., & Ford, J., 1988, Transfer of training: A review and direction for future research, Personnel Psychology, 41, 63–105.
Bell, B.S., Tannenbaum, S.I., Ford, J.K., Noe, R.A., & Kraiger, K., 2017, 100 years of training and development research: What we know and where we should go, Journal of Applied Psychology, 102(3), 305–323.
Blaiwes, A.S., Puig, J.A., & Regan, J.J., 1973, Transfer of training and the measurement of training effectiveness, Human Factors, 15(6), 523–533.
Blickensderfer, B., Strally, S., & Doherty, S., 2012, The effects of scenario-based training on pilots’ use of a whole plane parachute, International Journal of Aviation Psychology, 22(2), 184–202.
Blume, B.D., Ford, J.K., Baldwin, T.T., & Huang, J.L., 2010, Transfer of training: A meta-analytic review, Journal of Management, 36(4), 1065–1105.
Blume, B.D., Ford, J.K., Surface, E.A., & Olenick, J., 2019, A dynamic model of training transfer, Human Resource Management Review, 29(2), 270–283.
Burke, L.A., 1997, Improving positive transfer: A test of relapse prevention training on transfer outcomes, Human Resource Development Quarterly, 8(2), 115–128.
Chapanis, A., 1996, Human factors in systems engineering, New York: John Wiley & Sons.
Colquitt, J.A., LePine, J.A., & Noe, R.A., 2000, Toward an integrated theory of training motivation: A meta-analytic path analysis of 20 years of research, Journal of Applied Psychology, 85(5), 678–707.
Damos, D.L., 1991, Examining transfer of training using curve fitting: A second look, The International Journal of Aviation Psychology, 1(1), 73–85.
Darken, R.P., & Banker, W.P., 1998, Navigating in natural environments: A virtual environment training transfer study, Proceedings of VRAIS, 12–19.
Dohme, J., 1992, Transfer of training and simulator qualification or myth and folklore in helicopter simulation (N93-30687), NASA/FAA Helicopter Simulator Workshop Proceedings, 115–121.
Farmer, E., van Rooij, J., Riemersma, J., & Jorna, P., 2017, Handbook of simulator-based training, Routledge.
Federal Aviation Administration, 2006, Advisory circular. Retrieved from: https://www.faa.gov/regulations_policies/advisory_circulars/index.cfm/go/document.information/documentID/1030235
Foxon, M., 1993, A process approach to the transfer of training. Part 1: The impact of motivation and supervisor support on transfer maintenance, Australasian Journal of Educational Technology, 9(2), 130–143.
Hammerton, M., 1966, Factors affecting the use of simulators for training, Proceedings of the IEE Conference, 113(11), 1881–1884.
Hays, R.T., Jacobs, J.W., Prince, C., & Salas, E., 1992, Flight simulator training effectiveness: A meta-analysis, Military Psychology, 4(2), 63–74.
Hays, R.T., & Singer, M.J., 1989, Simulation fidelity in training system design, New York: Springer.
Kaempf, G.L., & Blackwell, N.J., 1990, Transfer-of-training study of emergency touchdown maneuvers in the AH-1 flight and weapons simulator (Research Report 1561), Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Kaiser, M.K., 2003, Flights of fancy: The art and science of flight simulation, in Principles and practice of aviation psychology, Tsang, P.S. & Vidulich, M.A., Eds., Mahwah, NJ: Lawrence Erlbaum Associates, pp. 435–471.
Ker, J., Hogg, G., Maran, N., & Walsh, K., 2010, Cost effective simulation, in Cost effectiveness in medical education, Abingdon: Radcliffe, pp. 61–71.


Kirkpatrick, J.D., & Kirkpatrick, W.K., 2016, Kirkpatrick’s four levels of training evaluation, 1st edition, Association for Talent Development. ISBN-10: 1607280086
Kraiger, K., Ford, J.K., & Salas, E., 1993, Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation, Journal of Applied Psychology, 78, 311–328.
Lathan, C.E., Tracey, M.R., Sebrechts, M.M., Clawson, D.M., & Higgins, G.A., 2002, Using virtual environments as training simulators: Measuring transfer, in Handbook of virtual environments, Stanney, K.M., Ed., Mahwah, NJ: Lawrence Erlbaum Associates.
Lintern, G., Taylor, H.L., Koonce, J.M., Kaiser, R.H., & Morrison, G.A., 1997, Transfer and quasi-transfer effects of scene detail and visual augmentation in landing training, The International Journal of Aviation Psychology, 7(2), 149–169.
Moroney, W.F., & Moroney, B.W., 1998, Flight simulation, in Handbook of aviation human factors, Garland, D.J., Wise, J.A., & Hopkin, V.D., Eds., New York: Lawrence Erlbaum Associates.
Noe, R.A., 2006, Employee training and development, New York: McGraw-Hill/Irwin.
Osgood, C.E., 1949, The similarity paradox in human learning: A resolution, Psychological Review, 56, 132–143.
Roscoe, S.N., & Williges, B.H., 1980, Measurement of transfer of training, in Aviation psychology, Roscoe, S.N., Ed., Ames, IA: Iowa State University Press, pp. 182–193.
Salas, E., Priest, H.A., Wilson, K.A., & Burke, C.S., 2006, Scenario-based training: Improving military mission performance and adaptability, in Military life: The psychology of serving in peace and combat: Operational stress, Adler, A.B., Castro, C.A., & Britt, T.W., Eds., Praeger Security International, pp. 32–53.
Simon, C.W., & Roscoe, S.N., 1984, Application of a multifactor approach to transfer of training research, Human Factors, 26(5), 591–612.
Smith-Jentsch, K.A., Salas, E., & Brannick, M.T., 2001, To transfer or not to transfer? Investigating the combined effects of trainee characteristics, team leader support, and team climate, Journal of Applied Psychology, 86(2), 279–292.
Spears, W., 1985, Measuring of learning and transfer using curve fitting, Human Factors, 27, 251–266.
Spector, P.E., 2003, Industrial/organizational psychology: Research and practice, 3rd ed., New York: John Wiley & Sons.
Stewart, J.E. II, Dohme, J.A., & Nullmeyer, R.T., 2002, U.S. Army initial entry rotary-wing transfer of training research, The International Journal of Aviation Psychology, 12(4), 359–375.
Taylor, H.L., Lintern, G., Hulin, C.L., Talleur, D.A., Emanuel, T.W. Jr., & Phillips, S.I., 1999, Transfer of training effectiveness of a personal computer aviation training device, The International Journal of Aviation Psychology, 9(4), 319–335.
Thorndike, E.L., 1903, Educational psychology, New York: Lemcke & Buechner.
Thorndike, E.L., & Woodworth, R.S., 1901, The influence of improvement in one mental function upon the efficiency of other functions. II. The estimation of magnitudes, Psychological Review, 8(4), 384–395.
Valverde, H.H., 1973, A review of flight simulator transfer of training studies, Human Factors, 15(6), 510–523.
van der Locht, M., van Dam, K., & Chiaburu, D.S., 2013, Getting the most of management training: The role of identical elements for training transfer, Personnel Review, 42(4), 422–439. https://doi.org/10.1108/PR-05-2011-0072

5 Simulation-Based Training for Decision-Making: Providing a Guide to Develop Training Based on Decision-Making Theories

Richard J. Simonson, Kimberly N. Williams, Joseph R. Keebler, and Elizabeth H. Lazzara

CONTENTS

Theoretical Background of Decision-Making
Normative Decision Models
Decision Models: New Perspectives
Naturalistic Decision-Making (NDM) Framework and Recognition-Primed Decision (RPD) Model
Biases in Decision-Making
Confirmation
Over- and Under-Confidence
Framing
Probability Perception (Gambler’s Fallacy)
Sunk Costs
Decision Theory Applied to Training
Steps in Developing Simulations to Train Decision-Making
Conduct a Needs Assessment
Identify Learning Objectives
Set the Simulation Context
Establish KSAs
Create Events to Elicit KSAs
Establish an Assessment Plan
Conclusion
References

DOI: 10.1201/9781003401360-5


Decision-making is a core aspect of human work and performance. It comprises a set of processes that individuals engage in every day of their lives to make choices and evaluate reality, both consciously and subconsciously. It is an activity that can result in positive growth and beneficial outcomes, as well as negative life-changing events and detrimental outcomes for the decision-maker and those influenced by their decisions. Additionally, decisions can compound themselves into habits that may further shape individual and future judgments. Therefore, decision-making needs to be trained to ensure that individuals are competent and knowledgeable about the successes and failures associated with decision-making heuristics and biases. One tool to mimic real-world scenarios in a safe, controlled, and non-consequential environment is simulation-based training (SBT; i.e., a methodology of providing structured experiences in a replicated environment to facilitate the acquisition of knowledge, skills, and attitudes; Salas et al., 2009). SBT can allow individuals an opportunity to practice a variety of decision-making tasks to gain insights into their own inherent biases, heuristics, and decision-making pathways. SBT can also be leveraged to train specific decision-making competencies, but the effectiveness of SBT relies just as much on the design of the training and instructional strategy as it does on the efficacy of the equipment and tools used. Therefore, the purpose of this chapter is twofold: (1) to summarize decision-making theories and (2) to provide guidance for utilizing simulation-based training to enhance decision-making competencies. To accomplish these objectives, we lay out this chapter in the following order. First, we provide a brief summary of the theoretical conceptualization of how people form decisions. Next, we explain how to design simulation-based training targeting decision-making. Finally, we conclude with the challenges and opportunities for research related to simulation-based training in decision-making.

THEORETICAL BACKGROUND OF DECISION-MAKING

Before we begin exploring how to best design SBT for decision-making, we provide a theoretical understanding of decision-making. Specifically, we describe the normative models of decision-making, expertise-based decision-making models, and common biases and heuristics associated with decision-making.

Normative Decision Models

Early models of decision-making were based on economic theories that took a normative or rational approach to decision theory. Two of these early theories are expected value and expected utility theory. The primary tenet of these normative theories is that, when humans are given a multitude of choices, we logically weigh those options against each other as a product of their relative values (or utilities) and the corresponding probabilities of occurrence (given inputs and possible outcomes), and ultimately choose the path with the best total expected value. Normative theories would predict this type of rational decision-making for all decisions, regardless of whether the subject of the uncertainty is losses or gains. For example, imagine an individual is presented with two scenarios:


In the first scenario, the individual could choose either (A) a guaranteed gain of $200 or (B) a 50% chance of gaining $500 and a 50% chance of gaining nothing at all. In the second scenario, they could choose either (A) a guaranteed loss of $200 or (B) a 50% chance of losing $500 and a 50% chance of losing nothing at all. In each scenario’s option B, the total expected value (Eq. 5.1) of the choice is $250 (a gain in the first, a loss in the second).

$$E(X) = \sum X \cdot P(X) \quad (5.1)$$

Expected value theory would have predicted that in the first scenario an individual would choose option B, because its expected value ($250) is higher than that of the guaranteed gain ($200). In the second scenario, it would predict choosing option A (the guaranteed loss of $200), because the guaranteed loss is smaller than the expected value of option B (an expected loss of $250, a worse total outcome). The example above demonstrates that normative theories do not align with how individuals typically make decisions. Humans are sensitive to risk, so framing choices in terms of losses or gains changes the way they decide. This is captured in the seminal work of Daniel Kahneman and Amos Tversky (1979), who modeled a more accurate picture of how and why humans actually make decisions with the groundbreaking Prospect Theory. This theory explains how humans systematically deviate from normative theories in their decision-making. One aspect of this is that individuals attribute different weights to equivalent gains versus losses because of their differential utility. In general, humans are risk-averse when presented with guaranteed versus uncertain gains and risk-seeking when faced with choices between guaranteed and uncertain losses. So, in the example above, most individuals would choose option A in the first scenario (a risk-averse, “take the money and run” approach) and option B in the second scenario (risk-seeking, by risking a potentially large loss rather than accepting a guaranteed one). Additional extensions of Prospect Theory offer further insight into how humans deviate from normative theories and actually make decisions, primarily through their contributions to the heuristics and biases realm of the decision-making literature, which will be discussed later in the chapter.
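A minimal sketch of Eq. 5.1 applied to the two scenarios above follows; the code structure is illustrative.

# Computing E(X) = sum of value * probability for each option.
def expected_value(outcomes):
    """outcomes: list of (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# Scenario 1 (gains): A = sure $200; B = 50% of $500, 50% of $0
print(expected_value([(200, 1.0)]))              # 200.0
print(expected_value([(500, 0.5), (0, 0.5)]))    # 250.0

# Scenario 2 (losses): A = sure -$200; B = 50% of -$500, 50% of $0
print(expected_value([(-200, 1.0)]))             # -200.0
print(expected_value([(-500, 0.5), (0, 0.5)]))   # -250.0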

Decision Models: New Perspectives

Since Kahneman and Tversky (1979) highlighted the fallacies of normative models of decision-making, the field has evolved its research approaches in an attempt


to accurately model human decision-making behaviors. As established, normative decision theories stipulate that individuals take a rational approach to all decisions, calculating the relative values of outcomes and the probabilities associated with n options, regardless of the number and type of options available. Being so methodical requires substantial time and cognitive resources, and dedicating them to calculate all possible outcomes of all decisions is impractical. Because it is not feasible to make rational comparisons for every option of every decision, individuals must have different strategies to make decisions more efficiently. In the updated decision literature, two seemingly contradictory observations give rise to two different realms of research: (1) experts make remarkably fast and accurate decisions based on “intuitions” even when all of the evidence is unavailable, and (2) individuals use shortcuts in their decision-making that lead them to make systematic errors, and these errors change as a function of expertise. The next sections discuss the intuition-based naturalistic decision-making (NDM) approach and the related recognition-primed decision (RPD) model, followed by evidence of systematic biases that occur in decision-making.

Naturalistic Decision-Making (NDM) Framework and Recognition-Primed Decision (RPD) Model

The NDM approach is an entire field of study that focuses on how individuals gain expertise and how this expertise (typically described as intuition, or pattern recognition) influences decision-making. NDM focuses more heavily on decision-making as it occurs in realistic scenarios rather than laboratory environments. In fact, NDM first arose from observations of chess players and attempted to describe how expert chess players are able to make decisions quickly and accurately even in the inherently complex and uncertain context in which a chess game unfolds (Kahneman & Klein, 2009). More specifically, it attempts to elucidate what cues experts (i.e., chess masters) rely on to form recognizable patterns that aid their decision-making (Klein, 2015). One of the primary models that has arisen from NDM is the RPD model. Klein et al. (1986) presented the RPD model to characterize how experts (i.e., firefighters) form rapid decisions under time pressure. RPD demonstrates how experience allows experts to narrow their decision options to only one or two that are likely to be effective; this refinement allows the expert to come to a single decision quickly and efficiently, rather than waste time identifying, comparing, and contrasting many possible options at once. RPD postulates that this process occurs in four steps. The first step is recognition: the expert assesses the scene in front of them, and this assessment activates prior memories and experiences of similar scenarios they have encountered (either through training or direct experience; i.e., recognition priming) and draws forward a pattern associated with those scenarios. In the second step, experts evaluate this pattern relative to the situation at hand (known as “progressive deepening”; Kahneman & Klein, 2009, after deGroot, 1978) and, if the pattern is accurate, decide on the action (or set of actions) that will be


effective based on those associated with the pattern model they had formed in prior experience. In essence, experts are using heuristics (i.e., mental shortcuts) built from their experiences with the domain, identifying patterns of cues that signal a course of action (decision) that will result in the desired outcome. In the third step, the expert can challenge their own decision at various stages of the process to determine if it is still the best course of action based on the unfolding of events. If at any point the original pattern is no longer applicable to the scenario, the expert enters the fourth step, where they reevaluate, find an alternative pattern that fits the cues present, and proceed to mentally test and act on it accordingly. This entire process can repeat as many times as needed until the outcome is decided. This method of decision-making is highly beneficial due to its efficiency. Therefore, one major focus of decision training is to aid inexperienced individuals in learning cues that signal the patterns that dictate decision outcomes, so they can effectively mimic the decision skills of experts (e.g., Klein, 2015).

Biases in Decision-Making

Although the pattern recognition inherent in the RPD model can lead to effective and efficient decision-making, some researchers also see it as vulnerable to the influence of biases, which can ultimately lead to incorrect decision outcomes. The following sections discuss various biases that are known to affect decision-making abilities: confirmation, over- and under-confidence, framing, probability perception, and sunk costs.

Confirmation

Confirmation bias describes individuals’ tendency to seek only information or cues that confirm already-held beliefs, ideas, or hypotheses (Fischer et al., 2008). A common representation of this bias is found in politics, where individuals tend to seek out evidence that supports their party rather than discredits it. Confirmation bias occurs because it is difficult for individuals to interpret contradictory information (and they experience cognitive dissonance as a result), so the bias simplifies the decision-making process by ignoring or down-weighting contradictory information. Although confirmation bias has historically been seen as negative, it is increasingly viewed as potentially beneficial for making fast and accurate decisions in individuals who develop a high degree of metacognition (Rollwage & Fleming, 2021). Metacognition allows individuals to recognize when an initial conclusion may be incorrect and to seek out disconfirming evidence. Metacognition also allows individuals to gain the benefits of the easier processing associated with confirmatory evidence for most decisions, while still permitting them to correct the effect of the bias when they realize their initial conclusion was incorrect.

Over- and Under-Confidence

Over- or under-confidence occurs when an individual fails to appropriately calibrate their perception of their own skill level to their actual abilities. These misperceptions can negatively influence decision-making. An example of this can be found in how


individuals’ financial knowledge and their confidence in the accuracy of this knowledge correlate with their investment behaviors. Pikulina et al. (2017) found that both types of inaccurate confidence relative to skill level affected individuals’ decision-making on financial investments, with very overconfident individuals making excessive investments and moderately under-confident individuals underinvesting. However, they also found that moderately overconfident individuals made the most accurate investments. These findings suggest that a slight degree of overconfidence may be beneficial when making decisions under conditions of uncertainty. Use of feedback, counter-thinking, and interventions that modify affect have been suggested as methods to calibrate confidence judgments, although each of these methods has limited empirical support (Koellinger & Treffers, 2015).

Framing

Framing effects were first identified by Kahneman and Tversky (1979) in the context of Prospect Theory (discussed earlier in this chapter) and occur when the way potential outcomes are presented influences the decision. As a greatly simplified example, when physicians are presented with two treatments, one listed with a 98% survival rate and the other with a 2% mortality rate, physicians may be more likely to choose the option listed with the 98% survival rate (see Fridman et al., 2021 for a recent summary and extension of research exploring framing effects in the medical domain). This choice arises because individuals tend to demonstrate risk-averse behavior when presented with a decision in a gain frame; meanwhile, individuals exhibit risk-seeking behavior when information is presented in a loss frame (e.g., Kahneman & Tversky, 1979). Empirical evidence for the framing effect is mixed. A meta-analysis supported its robustness (Steiger & Kühberger, 2018), but only partial support exists from statistical and mathematical modeling of the effect (e.g., Fan, 2017). Recent evidence, though, shows the impact of the framing effect on decisions may be reduced through explicit training in statistics and mathematics (Borracci et al., 2020).

Probability Perception (Gambler’s Fallacy)

Humans sometimes demonstrate biases when predicting sequential outcomes based on chance probabilities. A classic example is the coin-toss problem from Kahneman and Tversky’s 1972 paper: an individual is presented with the following sequences of heads/tails and asked to identify which is the most likely sequence given fair coin tosses: HHHHTH, HTHTTH, HHHTTT. In this paradigm, individuals will typically select HTHTTH, even though all of these sequences are equally likely, since each coin toss is independent of every other coin toss. This selection occurs because the “random” (more alternating) sequence is more representative of what individuals expect fair (i.e., random) coin tosses to produce, though that perception does not match how probability actually works (e.g., Tversky & Kahneman, 1974). An extension of probability perception is the gambler’s fallacy. The gambler’s fallacy occurs when individuals see patterns in random previous events and mistakenly see these patterns as influencing the probabilities of future events. For example, if given


a gambling scenario where an H is a “loss” and T is a “win,” after a series of H results (HHHHHH), a chronic gambler is likely to incorrectly think that the next toss is more likely to be a T. The rationale for this choice is that the gambler believes that the equivalent probability of occurrence of either H or T should equate to actual equivalent representation in reality; thus, they believe they are “due for a win” to even out the string of losing Hs. Recent research indicates that training that engages analytical thinking specific to these types of scenarios may reduce this effect and related gambling behaviors (Armstrong et al., 2020).

Sunk Costs

Once individuals invest something (e.g., time, effort, or money) into an outcome, they can fall victim to the sunk cost fallacy. A sunk cost is an investment that cannot be recovered, and the sunk cost fallacy occurs when an individual (consciously or subconsciously) lets these sunk costs influence their future investment decisions. As an example, a small business owner decides to buy an expensive piece of machinery that will allow them to make items they think will sell in their shop. After several months, they fail to sell any of the products they have made with the machine thus far. The business owner knows how much money they spent on the machine and wants to put that money back into the products, so they invest more money to buy different materials to try to make the same products more appealing. This cycle may then repeat. Until the business owner can remove the previously incurred sunk costs (the machine, materials, etc.) from their decision to invest further, they are succumbing to the sunk cost fallacy. Sunk cost has been posited as a loss-oriented framing situation under Prospect Theory; that is, the frame of the sunk cost generates the risky investment behaviors associated with loss-framing (Thaler, 1980). This decision strategy is detrimental because it can lead to overinvestment in unprofitable projects and a high waste of resources (Roth et al., 2015 provide an overview of applied domains and potential varying effects). Activities that encourage introspection and suppression of future thinking toward projects have been seen as potentially successful in preventing sunk cost bias from influencing decisions (e.g., Strough et al., 2016).

Decision Theory Applied to Training

Thus far, we have presented several concepts of actual decision-making strategies with a focus on two approaches: (1) intuitive decision-making developed through expertise and (2) systematic biases that individuals may employ as decision-making strategies. The debate is ongoing as to which of these two approaches should be capitalized on most to train more effective decision-making skills (e.g., Anderson et al., 2019). Both approaches have merit; that is, experts can make good decisions based on intuition, and all individuals can experience systematic biases in decision-making. Proponents of each approach have come to agree that the primary determination lies in the predictability of the environment in which decisions take place, as well as how well the decision-maker has been able to learn the boundaries of that environment and the impact of their decisions (see Kahneman & Klein, 2009). In other words, which approach decision-makers rely on is based on how reliably cues present in the


environment can predict the outcomes of decisions and how well individuals have been exposed to these cues and outcomes to learn their associations correctly. Based on Kahneman and Klein’s interpretations (described above), we might expect that in a more reliable environment, a higher degree of practice-based training focused on building experience may lead to effective intuition-based decision-making (see also Klein, 2015); meanwhile, in a low-reliability environment, the risk of bias negatively impacting decisions is high, so training should focus more explicitly on reducing the effects of bias. Regardless, training decision-making abilities provides trainees an opportunity to build their abilities to identify risks and biases, as well as to recognize and utilize heuristics, to develop an expert-based framework for various decision-making tasks. Thus, the following sections describe how to create a decision training paradigm in a simulated environment.

STEPS IN DEVELOPING SIMULATIONS TO TRAIN DECISION-MAKING

Similar to other cognitive processes and skills, effective decision-making is not simply rooted in intuition; rather, it requires a deliberate training curriculum. Due to the potential for disastrous outcomes associated with poor decision-making, decision-making is ripe for simulation-based training. Simulation-based training, when executed properly, provides the ability to replicate real-world tasks and environments while remaining within safe parameters, void of actual catastrophes (Hochmitz & Yuviler-Gavish, 2011). Thus, the following portion of this chapter provides a guide to applying decision-making training in a simulation-based training context. We specifically focus on the following: (1) determining what training is required, (2) identifying the learning objectives, (3) selecting the context of the simulation, (4) establishing the knowledge, skills, and attitudes (KSAs), (5) developing events to elicit KSAs, and (6) creating an assessment plan.

Conduct a Needs Assessment

The first step in any training endeavor is a training needs assessment. Training needs assessments can assist in identifying the challenges that individuals face in completing various tasks as well as the competencies associated with those tasks. A training needs assessment is typically associated with three objectives before initiating a training initiative: (1) identifying what competencies need to be trained, (2) establishing whether the needs can be met by training, and (3) proposing a training solution (Brown, 2002). Although a needs assessment is resource-intensive, a training curriculum that is not grounded in one may lead to ineffective training and even negative organizational effects (Cekada, 2011). Additionally, because decision-making competencies are often dependent on the task and its associated cognitive processes, skipping the needs assessment step can result in decision-making training that does not target the appropriate competencies.


Identify Learning Objectives

Once a needs assessment has been completed, one can begin to define the learning objectives. Learning objectives are the specifications of what the training is intended to accomplish. Consequently, learning objectives are the foundation of the curriculum and should be clear, concise, and measurable. Therefore, when crafting learning objectives, one needs to consider what specific metric will measure each learning objective as well as how to determine whether the objective has been reached (e.g., at what point is the trainee considered successful?). If learning objectives do not adhere to these parameters, it becomes difficult to determine whether the training is accomplishing its desired goals. Within the context of decision-making, a learning objective might be for trainees to significantly improve their knowledge of the biases that could disrupt their decision-making in time-sensitive tasks.

Set the Simulation Context

When the learning objectives have been identified, the next step is to determine the simulation context. The simulation context provides a boundary specifying what should and should not be trained in the curriculum (Rosen et al., 2008). Establishing the simulation context assists in fostering cognitive fidelity. Cognitive fidelity refers to the extent to which the simulation requires the trainee to engage in the thought processes involved in the real-world task (Hochmitz & Yuviler-Gavish, 2011). Because decision-making is arguably a cognitive process, it is particularly important that the context fosters cognitive fidelity. Cognitive fidelity is essential for trained KSAs to transfer to the real-world task (Hochmitz & Yuviler-Gavish, 2011); therefore, not all contexts are created equal. Rather, contexts should be carefully scrutinized such that the decision-making competencies targeted within the training can be elicited accordingly. For simulation-based training, setting contexts that can elicit specific competencies is crucial. Archer et al. (2006) provide an example of a decision-making storyline that utilizes decision-based nodes that dynamically change the story as the trainee moves through the scenario. Their specific context is that of soldiers planning a maneuver to a location who may end up in an ambush (failure), a minefield (failure), or at a bridge (success).

Establish KSAs

Following the determination of the simulation context, one should identify the necessary, specific knowledge, skills, and attitudes (KSAs). Described as the key components that should be elicited during training and, more importantly, during task execution, KSAs serve as a mechanism to provide information on what the individual knows (knowledge), what the individual can do (skill), and what the individual feels (attitude) (Bloom, 1956). Targeting KSAs during training increases the likelihood that individuals will exhibit the KSAs in the real-world task (Bloom, 1956).


Due to the complexity of decision-making as described in the various cognitive models (i.e., RPD and NDM), training every KSA within the entire realm of decision-making is impractical from a pedagogical perspective. Key KSAs can be identified through various methods, such as an ethnographic study, cognitive task analyses of various work, or a review of commonly used decision-making competencies within the decision-making literature base (see Stanton et al., 2013). For example, a multitude of decision-making KSAs have been researched and utilized within the training and decision-making research domain, including:

1. Resistance to framing: the degree to which value assessments are affected by confounding factors
2. Recognizing social norms: the ability to gauge social norms
3. Under- and overconfidence: the level of understanding of one’s own knowledge
4. Applying decision rules: the ability to apply elimination-by-aspects, satisficing, lexicographic, and equal-weights rules
5. Consistency in risk perception: the knowledge of, and ability to adhere to, probability rules
6. Resistance to sunk costs: the degree to which one can ignore prior decisions regarding investments (emotional, financial, etc.)
7. Path independence: the ability to make consistent decisions regarding multi-stage events

(cf. Bruine de Bruin et al., 2007; Parker & Fischhoff, 2005; Parker et al., 2015; Finucane & Gullion, 2010).

Create Events to Elicit KSAs

Since the objective of SBT is to foster the trainee’s ability to exhibit specific KSAs, one should rely on embedded trigger events. A trigger event serves as a cue to prompt the trainee to demonstrate the targeted KSAs (Rosen et al., 2008). Trigger events should correspond to the learning objectives of the training, with each learning objective associated with at least one trigger event. Trigger events are arguably the building blocks of the simulation, as they serve as a mechanism to consistently generate specific, observable KSAs. Each trigger event should be associated with at least one expected response in which the trainee exhibits the desired knowledge, skill, or attitude. Knowledge-based responses are typically used to elicit some mental store of information; these responses usually follow a verbal or written trigger requesting information from the trainee. Skill responses, on the other hand, necessitate a psychomotor reaction (i.e., physical movement) to a trigger, typically captured via behavioral observation. An example of a decision-making simulation scenario, in the context of a soldier planning a route through a combat zone, is presented below. Each trigger (Tr) is tied to a response (R) in which the trainee is given an opportunity to exhibit a knowledge, skill, or attitude that can be observed and measured by the trainer. Following the example from Archer et al. (2006), soldiers (trainees) are provided with a scenario in which they are asked to plan a route from a starting area to a recon bridge. During the training scenario they are asked to plan a route (Tr) and choose a route (Tr), which eventually leads to a minefield (Tr) and an ambush (Tr). Every time they encounter one of these triggers, they are asked to plan a new route (R) based on the information garnered along the way.
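As a minimal sketch of how trigger events and expected responses might be represented when scripting such a scenario, the structure below loosely follows the route-planning example; all names and fields are illustrative assumptions, not from Archer et al. (2006).

# A minimal sketch pairing trigger events with expected responses
# and the learning objectives they measure. All values illustrative.
from dataclasses import dataclass

@dataclass
class TriggerEvent:
    cue: str                  # what the scenario presents to the trainee (Tr)
    expected_response: str    # the KSA the trainee should exhibit (R)
    learning_objective: str   # the objective this event assesses

scenario = [
    TriggerEvent("plan initial route", "produce route plan", "route planning"),
    TriggerEvent("minefield encountered", "re-plan route", "adaptive decision-making"),
    TriggerEvent("ambush encountered", "re-plan route", "adaptive decision-making"),
]

for event in scenario:
    print(f"{event.cue} -> expect: {event.expected_response}")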


Establish an Assessment Plan

Conducting the needs assessment, determining the learning objectives, selecting the simulation context, establishing KSAs, and finally creating the events and responses are all the steps one needs to develop a simulation-based training scenario. However, no training can be effective without assessment. Assessment makes it possible to determine whether the training intervention was effective by providing evidence of demonstrable improvement in the KSAs derived from the learning objectives. The first step in deciding on an assessment is to determine how to approach measuring training efficacy. This decision rests on three components.

One component is the type of competency being measured. Assessing knowledge, skills, and attitudes requires different assessment types. For example, knowledge and attitudinal assessments often rely on questionnaires; meanwhile, a psychomotor-based behavior (i.e., skill) cannot be measured and evaluated using such a questionnaire. Thus, determining when to utilize questionnaires, observational checklists, or even interview-based assessments should be carefully considered and should correspond to the objectives and KSAs of the training. To elaborate further, knowledge-based competencies rely on the recall or recognition of information, typically assessed through questionnaires. Although the format may vary, knowledge-based assessments often consist of multiple-choice and free-response style questions. Regardless of the exact format, knowledge-based assessments offer the ability to quantitatively detect any changes in knowledge resulting from the training intervention. Skill-based competencies require that some psychomotor task be observed to determine if a desired behavior was performed. There are a variety of formats of behavioral tools. One option is a checklist. Checklists are a dichotomous rating, wherein observers indicate the presence or absence of a behavior. Checklists are the least cognitively demanding on raters; however, dichotomous ratings are not very diagnostic, as they provide no insight into the quality of the exhibited behavior. In other words, checklists indicate whether a behavior was present, but they do not indicate if the behavior was exhibited superbly or poorly. Another option is a frequency count. Frequency counts require more attentiveness, are more cognitively demanding, and can be prone to more reliability issues compared to dichotomous ratings. Frequency counts, though, are particularly useful when the same behavior is expected to be demonstrated repeatedly, as they offer information about how often a trainee exhibited a behavior. A third option is behaviorally anchored rating scales (BARS). BARS typically have numeric ratings that are accompanied by observable behaviors. BARS can be challenging to develop, as each numeric score requires a well-defined behavior determined a priori. Regardless of the format, each behavioral assessment option has its respective advantages and disadvantages. Consequently, the option selected should be based on the specific competency being assessed as well as the available resources and practical constraints. Finally, attitude-based competencies are often measured via the application of a survey. Surveys are collections of perception-based questions


that trainees can answer to express subjective beliefs. To ensure that trainees have a meaningful and appropriate range of responses to subjective questions regarding their beliefs, Likert-type anchor responses are used. These Likert-type responses are common in surveys, and a multitude of anchors have been devised (Vagias, 2006) to assist in capturing the true feelings of the trainees. For example, Grol et al.’s (1990) attitudinal questionnaire assesses risk perception, and Siebert and Kunz (2016) evaluate proactive attitudes in decision-making.

Another component is whether psychometrically validated tools are available. An important note on competencies and their associated measurable KSAs concerns the psychometric reliability and validity of the metrics utilized. Psychometrics consists of a set of processes to ensure that a metric measures constructs accurately and consistently (Shrout & Lane, 2012). Relying on metrics that lack reliability and validity may result in spurious assessments that do not provide any insight into the effectiveness of the training. One validated decision-making tool is the Adult Decision-Making Competence (ADMC) inventory, which assesses an adult’s decision-making competency based on six factors (Bruine de Bruin et al., 2007).

A third component relates to the administration of the metric(s). One approach is referred to as single measurement, and a second approach is multiple measurement. A single measurement approach relies on one method of measurement (knowledge, skill, or attitude) to assess a competency or set of competencies. This approach suits trainings with resource and time constraints, where quick or simple assessments are needed. As an example of a single measurement in decision-making, risk perception can be measured by whether a trainee exits a bet. While this method does not require extensive time or resources, it is generally less informative in determining the efficacy of the training compared to the multiple measurement approach. The multiple measurement approach, as the name suggests, uses multiple assessment types. This approach is far more effective at assessing proficiency, as each learning objective and corresponding KSA is measured in multiple instances. An extension of this approach, commonly considered a gold standard of assessment due to its ability to measure a variety of competency-based validities, is the multi-trait multi-method (MTMM) approach (Eid & Diener, 2006). MTMM takes a construct approach and attempts to correlate knowledge-, skill-, and attitude-based assessments with one another when targeting a specific set of competencies. Extending the above example to MTMM, we might first assess trainees’ knowledge of risk perception, then test them in a behavioral context, and then assess their attitude toward withdrawing from or staying in the gamble. Once compiled, these assessments would be built into a model and tested to determine whether they are quantifiably associated with one another. Theoretically, if a trainee has a strong perception of risk, then all three measurements should be highly correlated with one another. As this example shows, though, this method is substantially more time- and resource-intensive compared to the single measurement approach. Investing resources into assessment, though, is a worthwhile endeavor, as assessment is foundational for effective training.
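As a minimal sketch of the MTMM logic in this example, the following correlates illustrative knowledge, skill, and attitude scores for the same competency (risk perception); real MTMM analyses involve multiple traits and formal model testing, and all numbers here are invented for illustration.

# Correlating three measurement methods for one competency.
import numpy as np

knowledge = np.array([70, 82, 65, 90, 75], dtype=float)  # quiz scores
skill     = np.array([3, 4, 2, 5, 3], dtype=float)       # BARS ratings
attitude  = np.array([2.5, 4.0, 2.0, 4.5, 3.0])          # Likert means

# Rows are variables; high off-diagonal correlations suggest the three
# methods converge on the same underlying construct.
matrix = np.corrcoef(np.vstack([knowledge, skill, attitude]))
print(np.round(matrix, 2))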


CONCLUSION

The purpose of this chapter was twofold. The first purpose was to provide a summary of decision-making theory. Our review highlighted milestones in decision-making research and described the cognitive processes behind decisions. More specifically, Prospect Theory illustrated the concept that people do not make calculated decisions based on expected values in a strict probabilistic sense; rather, people are risk-averse or risk-seeking depending on the situation. Meanwhile, the NDM and RPD models provided a background into how people gain expertise in decision-making based on pattern recognition and experience. NDM and RPD also elucidate the idea that the process behind building this experience is susceptible to biases that influence decisions. Although these biases are inherent within cognitive processing, effective training can mitigate some of the potential problems. The second purpose of this chapter was to offer guidance on how to utilize simulation-based training to enhance decision-making. To maximize the benefits of simulation-based training, though, curricula should be rooted in the following: (1) determining what training is required, (2) identifying the learning objectives, (3) determining the context of the simulation, (4) establishing the knowledge, skills, and attitudes (KSAs), (5) developing events to elicit KSAs, and (6) creating an assessment plan. Adhering to the science of training, SBT can be one way to effectively train decision-making competencies and alleviate the biases that affect them, all within the context of a safe environment.

REFERENCES

Anderson, N. E., Slark, J., & Gott, M. (2019). Unlocking intuition and expertise: Using interpretative phenomenological analysis to explore clinical decision making. Journal of Research in Nursing, 24(1–2), 88–101. https://doi.org/10.1177/1744987118809528
Archer, R., Brockett, A. T., McDermott, P. L., Warwick, W., & Christ, R. E. (2006). A simulation-based tool to train rapid decision-making skills for the digital battlefield. Micro Analysis and Design, Boulder, CO.
Armstrong, T., Rockloff, M., Browne, M., & Blaszczynski, A. (2020). Training gamblers to re-think their gambling choices: How contextual analytical thinking may be useful in promoting safer gambling. Journal of Behavioral Addictions, 9(3), 766–784. Retrieved April 21, 2021, from https://akjournals.com/view/journals/2006/9/3/article-p766.xml
Bloom, B. S. (1956). Taxonomy of educational objectives: The classification of educational goals. Cognitive domain. Longman.
Borracci, R. A., Arribalzaga, E. B., & Thierer, J. (2020). Training in statistical analysis reduces the framing effect among medical students and residents in Argentina. Journal of Educational Evaluation for Health Professions, 17(25). https://doi.org/10.3352/jeehp.2020.17.25
Brown, J. (2002). Training needs assessment: A must for developing an effective training program. Public Personnel Management, 31(4), 569–578.
Bruine de Bruin, W., Parker, A. M., & Fischhoff, B. (2007). Individual differences in adult decision-making competence. Journal of Personality and Social Psychology, 92(5), 938.


Cekada, T. L. (2011). Need training? Conducting an effective needs assessment. Professional Safety, 56(12), 28.
deGroot, A. D. (1978). Thought and choice in chess. The Hague: Mouton. (Original work published 1946.)
Eid, M., & Diener, E. (Eds.). (2006). Handbook of multimethod measurement in psychology. American Psychological Association. https://doi.org/10.1037/11383-000
Fan, W. (2017). Education and decision-making: An experimental study on the framing effect in China. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.00744
Finucane, M. L., & Gullion, C. M. (2010). Developing a tool for measuring the decision-making competence of older adults. Psychology and Aging, 25(2), 271.
Fischer, P., Jonas, E., Frey, D., & Kastenmüller, A. (2008). Selective exposure and decision framing: The impact of gain and loss framing on confirmatory information search after decisions. Journal of Experimental Social Psychology, 44, 312–320.
Fridman, I., Fagerlin, A., Scherr, K. A., Scherer, L. D., Huffstetler, H., & Ubel, P. A. (2021). Gain–loss framing and patients’ decisions: A linguistic examination of information framing in physician–patient conversations. Journal of Behavioral Medicine, 44, 38–52. https://doi.org/10.1007/s10865-020-00171-0
Grol, R., Whitfield, M., Maeseneer, J. De, & Mokkink, H. (1990). Attitudes to risk-taking in medical decision-making among British, Dutch and Belgian General-Practitioners. British Journal of General Practice, 40(333), 134–136.
Hochmitz, I., & Yuviler-Gavish, N. (2011). Physical fidelity versus cognitive fidelity training in procedural skills acquisition. Human Factors, 53(5), 489–501.
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515–526. https://doi.org/10.1037/a0016755
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Klein, G. (2015). A naturalistic decision making perspective on studying intuitive decision making. Journal of Applied Research in Memory and Cognition, 4, 164–168.
Klein, G. A., Calderwood, R., & Clinton-Cirocco, A. (1986). Rapid decision making on the fire ground. Proceedings of the Human Factors Society Annual Meeting, 30(6), 576–580. https://doi.org/10.1177/154193128603000616
Koellinger, P., & Treffers, T. (2015). Joy leads to overconfidence, and a simple countermeasure. PLoS One, 10(12). https://doi.org/10.1371/journal.pone.0143263
Parker, A. M., Bruine de Bruin, W., & Fischhoff, B. (2015). Negative decision outcomes are more common among people with lower decision-making competence: An item-level analysis of the decision outcome inventory (DOI). Frontiers in Psychology, 6, 363.
Parker, A. M., & Fischhoff, B. (2005). Decision-making competence: External validation through an individual-differences approach. Journal of Behavioral Decision Making, 18(1), 1–27.
Pikulina, E., Renneboog, L., & Tobler, P. N. (2017). Overconfidence and investment: An experimental approach. Journal of Corporate Finance, 43, 175–192.
Rollwage, M., & Fleming, S. M. (2021). Confirmation bias is adaptive when coupled with efficient metacognition. Philosophical Transactions of the Royal Society B, 376(1822), 20200131.
Rosen, M. A., Salas, E., Silvestri, S., Wu, T. S., & Lazzara, E. H. (2008). A measurement tool for simulation-based training in emergency medicine: The simulation module for assessment of resident targeted event responses (SMARTER) approach. Simulation in Healthcare, 3(3), 170–179.
Roth, S., Robbert, T., & Straus, L. (2015). On the sunk-cost effect in economic decision-making: A meta-analytic review. Business Research, 8, 99–138. https://doi.org/10.1007/s40685-014-0014-8

Simulation-Based Training for Decision-Making

139

Salas, E., Rosen, M. A., Weaver, S. J., Held, J. D., & Weissmuller, J. J. (2009). Guidelines for performance measurement in simulation-based training. Ergonomics in Design: The Quarterly of Human Factors Applications, 17(4), 12–18. Shrout, P. E., & Lane, S. P. (2012). Psychometrics. In M. R. Mehl & T. S. Conner (Eds.), Handbook of research methods for studying daily life (pp. 302–320). The Guilford Press. Siebert, J., & Kunz, R. (2016). Developing and validating the multidimensional proactive decision-making scale. European Journal of Operational Research, 249(3), 864–877. Stanton, N., Salmon, P. M., & Rafferty, L. A. (2013). Human factors methods: A practical guide for engineering and design. Ashgate Publishing, Ltd. Steiger, A., & Kühberger, A. (2018). A meta-analytic re-appraisal of the framing effect. Zeitschrift für Psychologie, 226(1), 45–55. https://doi​.org​/10​.1027​/2151​-2604​/a000321 Strough, J., Bruine de Bruin, W., Parker, A. M., Karns, T., Lemaster, P., Pichayayothin, N., Delaney, R., & Stoiko, R. (2016). What were they thinking? Reducing sunk-cost bias in a life-span sample. Psychology and Aging, 31(7), 724–736. https://doi​.org​/10​.1037​/ pag0000130 Thaler, R. (1980). Toward a positive theory of consumer choice. Journal of Economic Behavior & Organization, 1(1), 39–60. Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131. Vagias, W. M. (2006). Likert-type scale response anchors. Clemson International Institute for Tourism & Research Development, Department of Parks, Recreation and Tourism Management. Clemson University.

6

Almost Like the Real Thing – The Hidden Limits in Flight Simulation and Training

Shem Malmquist, Deborah Sater Carstens, and Nicklas Dahlstrom

CONTENTS
Introduction
Introduction to Simulator Motion
How Do Humans Perceive Motion Drive?
Let Them Eat Humble Pie—Or Not?
Same, Same But Different
Fiddling with Fidelity While Missing the Story
A Perfect Tool or a Tool That Can Be Perfected?
Summary
Acknowledgment
References

INTRODUCTION Probably the most visible aspect of modern pilot training is the full-motion flight simulator. Far removed from the early Link trainers, whose motion was coupled only to the pilot’s controls and which felt, when “flown,” like trying to balance on the head of a pin, today’s simulators are a marvel of technology. With these impressive machines costing in the neighborhood of $10 million, it is not surprising that they are often showcased. These devices provide six axes of motion via hydraulic or electric actuators and full visual displays for the pilots, and they usually contain the actual flight controls and instruments manufactured for the real aircraft. Computer systems supply the simulator with an artificial but very detailed “world,” and the instruments, systems, motion, and visual and sound systems react to that environment as they would if the environment were real. But how real are they really, and what problems might be present that are not described in the academic literature or the guidance material published by their manufacturers?

As with any modern endeavor, many researchers and flight training departments rely on the work of others. Previous work in the field serves as the foundation upon which new research is conducted and upon which training programs are designed. Most of the literature describes research and methods that can then form these foundations. This chapter is a bit different. Rather than describe research or methods, it will delve into aspects that the reader may not have considered. The intention is to enable the reader to cast a critical eye on the published literature. Does the research address internal and external validity? Are the instruments used to measure and analyze the training valid and reliable? Are the methods employed in the research appropriate to ensure internal validity? What limitations and delimitations might have been present? Were these reported, or even known to the researchers? Tuncman (2020) found that threats to the internal validity of research were not reported, and possibly not addressed, in 94% of the literature he reviewed (p. 120). In addition, quasi-transfer-of-training studies, the lack of peer review for findings in accident reports, and variations and limitations in simulator motion-drive algorithms and research methods can combine into a perfect storm, yielding research or a training program that appears robust on the surface but may not provide the intended outcome. As is described in Chapter 4, the goal of training is to transfer what is learned in training to the real world. This depends on the ability of the training environment to be generalized to the real world and the general population. In this chapter, we review the limitations that researchers and those working in training development should consider, in the hope that this knowledge can improve both the quality of the research and the quality of the training.

INTRODUCTION TO SIMULATOR MOTION Although flight simulators are used for various purposes, from evaluating initial aircraft handling qualities to developing pilot procedures, flight deck design, airspace management, and accident investigation, this chapter addresses their use in pilot training. There has been debate in research circles about whether or not simulator motion is beneficial in a training context (Bürki-Cohen et al., 1998; Bürki-Cohen & Go, 2005; De Winter et al., 2012). The goal for a simulator in a training environment is to provide cognitive fidelity, which is the extent to which the simulator requires the pilot to engage in the “same sort of cognitive activities … as the actual modern flight deck” (Kaiser & Schroeder, 2003, p. 440). To ensure that simulator fidelity is beneficial for pilot training, it needs to be valid, meeting the goal of accurately modeling the real aircraft to the extent necessary to reliably accomplish the goals for transfer of training. To identify the level of fidelity necessary for training, simulators are evaluated with either rating methods or quasi-transfer methods. “In a quasi-transfer experiment, transfer of training is tested in the simulator with motion as a stand-in for the airplane” (Bürki-Cohen & Go, 2005, p. 6). What are the limitations of these approaches?

Much of the research on flight simulator motion has used quasi-transfer exercises to test the validity of simulator motion. In such a study, transfer of training is tested using the same simulator throughout the research, not in an aircraft. This is because testing in an aircraft presents many challenges, some obvious, such as cost, safety for certain maneuvers, and the control of extraneous variables. The ability, in research, to control aspects such as daylight or darkness and other factors makes simulation ideal for ensuring reliability and replicability of studies. Unfortunately, these same conveniences carry threats to validity that need to be addressed: the increased control over the experiment comes at a cost to external validity (Ary et al., 2010). A simulation is, in the end, a simulation. It is necessarily a model of reality. As George E. P. Box stated, “Since all models are wrong, the scientist must be alert to what is importantly wrong” (Box, 1976, p. 792). Understanding what is “importantly wrong” with simulation is crucial for any organization conducting training for pilots using these high-technology devices. Fidelity requirements for flight simulation used in aircraft certification are higher still, although they are not codified. The simulators used for training must meet regulatory standards. In the United States, these requirements are delineated in Title 14 CFR Part 60 of the Federal Aviation Administration (FAA) regulations. These regulations require that the key simulator equipment is able to duplicate the feel and appearance of the actual aircraft (FAA, 1992; Rehmann et al., 1995). To accomplish this, advanced modern simulators often use actual aircraft components in the cockpit, including the same instruments and controls as the real airplane. The simulators also model the visual information and add motion to increase the fidelity of the experience for pilots (Kaiser & Schroeder, 2003). The main point of this is to emulate the control and feedback that pilots experience in the real world. As described in systems engineering, pilots (the controllers in the system) use control inputs to change the state of the aircraft and feedback to know what the current state is and what corrections they might need to make (Checkland, 1971). Even if the instruments and controls are the exact components used in a real airplane, the simulation is still a model. As stated by Weisberg (2012), “A model is similar to its target…when it shares certain highly valued features … and when the target doesn’t have many significant features the model lacks” (pp. 144–145). The instruments and components are being “fed” information from an artificial world that is simpler than the real world (Kaiser & Schroeder, 2003). This affects the simulation in several important ways, both visual and motion-related. The visual systems are not as varied and detailed as the real world, so designers use several “cheats” to mitigate these differences, the first being to use “actual components of the environment, most notably the cockpit interior”; adding the indications and “force feedback to the controls complete the emulation” (p. 446). The second “cheat,” as described by Kaiser and Schroeder, is “to not even try” (p. 447). Simulators use computer-generated imagery, which, although it looks good, is nowhere close to reality. Still, it is “good enough” to create impressions of aircraft motion in many cases. Nonvisual senses, coming from the inner ear,
skin, and joint receptors, supplement the visual sense and are particularly useful for detecting aircraft aerodynamic, propulsive, and gear-specific forces. These specific forces are perceived immediately with motion, giving the pilot immediate feedback that enables a much quicker control response (Kaiser & Schroeder, 2003). In simple terms, to design how the simulator should move, the aircraft model is programmed with data provided by the aircraft manufacturer, from which one can infer how the real aircraft, or more specifically the cockpit of the aircraft, would move in the real world. All modern simulators are programmed to replicate flight by moving the simulator in various ways. The design of that motion is a motion-drive algorithm: motion cueing algorithms transform the aircraft-model data into simulator motion (Telban et al., 2000; Telban & Cardullo, 2005). Modern advanced flight simulators use hexapod actuators to create the sensations of flight for pilots in all six degrees of freedom (pitch, roll, yaw, heave, surge, and sway). The amount of motion that can be provided is limited by the physical length of the hexapod actuators. Combinations of the actuator motions are used to mimic flight sensations, and designers employ several strategies to maximize the value of the motion given these limitations. Within the human vestibular system, the otolith organs sense the three translational specific forces (i.e., heave, surge, sway). As the otoliths cannot differentiate between platform acceleration and tilt, the designer of the motion can take advantage of this to “trick” the human brain into believing that acceleration is occurring when it is not (Telban & Cardullo, 2005). One technique is to move the platform just “a percentage of the full motion” (Kaiser & Schroeder, 2003, p. 456). An additional technique is to move the platform at high frequencies (p. 456). High-frequency accelerations, unlike low-frequency accelerations, do not require considerable simulator platform motion. Simulator roll and pitch can instead be used to replicate low-frequency accelerations that would otherwise require considerable platform motion. As an example, consider a takeoff, which has significant surge acceleration. After applying full thrust in the simulator, the simulator cab moves forward to mimic the aircraft motion. However, the simulator will quickly reach the physical limits of its actuators as the real aircraft continues to accelerate down the runway. The “trick” now is to slowly pitch up the simulator cab. With this pitch-up motion, a pilot’s back pushes on the seat, just as would happen on a takeoff. This pitch-up motion has to be done at a slow rate, one below the pitch-rate threshold of the inner ear; otherwise, the pilot will believe the aircraft is pitching during the takeoff, which would be a false cue. This combination of fore-aft simulator motion and cab tilt is often a compromise. One consequence of these compromises is that it can adversely impact a pilot’s ability to recover from some very dynamic conditions, such as Dutch roll in the lateral-directional axes, or even to respond to certain upsets or failure modes (Kaiser & Schroeder, 2003). In fact, the assumption that “incorrect motion cues are worse than no motion cues,” especially for large-amplitude motion, has influenced the use of motion in centerline-thrust military simulators such that most do not have moving platforms (Allerton, 2009).
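
To make the washout-and-tilt compromise concrete, the following is a minimal single-axis sketch of the classical idea described above: a high-pass “onset” channel plus rate-limited tilt coordination. The general structure follows the motion-cueing literature (e.g., Telban & Cardullo, 2005), but all parameter values (time constant, threshold, update rate) are illustrative assumptions, not those of any certified device; a real implementation filters all six degrees of freedom and integrates to actuator positions.

    import math

    G = 9.81                             # gravity, m/s^2
    DT = 0.01                            # 100 Hz update rate (assumed)
    TAU_HP = 2.0                         # high-pass time constant, s (onset cue washes out)
    TILT_RATE_LIMIT = math.radians(3.0)  # keep rotation below a nominal perception threshold

    def washout_step(a_cmd, state):
        """One update of a first-order surge washout with tilt coordination.
        a_cmd: surge acceleration from the aircraft model (m/s^2)
        state: dict carrying filter memory between calls
        """
        # High-pass channel: reproduce the onset, wash out the sustained part.
        alpha = TAU_HP / (TAU_HP + DT)
        a_hp = alpha * (state["a_hp"] + a_cmd - state["a_prev"])
        state["a_hp"], state["a_prev"] = a_hp, a_cmd

        # The low-frequency remainder is reproduced by tilting the cab so that
        # gravity supplies a sustained "surge" specific force.
        a_lp = a_cmd - a_hp
        theta_target = math.asin(max(-1.0, min(1.0, a_lp / G)))

        # Rate-limit the tilt so the rotation stays below the perception threshold.
        max_step = TILT_RATE_LIMIT * DT
        state["theta"] += max(-max_step, min(max_step, theta_target - state["theta"]))

        # Translational acceleration command for the platform, plus current tilt.
        return a_hp, state["theta"]

    state = {"a_hp": 0.0, "a_prev": 0.0, "theta": 0.0}
    for _ in range(500):  # 5 s of a constant takeoff-like 2 m/s^2 surge command
        a_platform, tilt = washout_step(2.0, state)

Running this with a constant surge command shows the onset cue decaying within a few seconds while the cab slowly tilts toward the equivalent attitude; the trade-offs discussed in this section live in exactly these filter constants and limits.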

As the design of motion cueing algorithms requires trade-offs, choices need to be made about what to optimize. In some cases, this has resulted in motion that adversely affected training; Wu and Cardullo (1997) reported that, in response, motion is sometimes even turned off to eliminate the problem. The optimal motion algorithm depends on which facet of flight is being replicated (Zaal et al., 2015). Although a full discussion of the advantages and disadvantages of each type of motion algorithm is outside the scope of this chapter, the point here is that the differences should not be ignored. Motion algorithms are chosen based on a combination of compromises, which include assumptions about which aspects of flight are the most important, physical limitations of the flight simulator platform, costs, the availability of data from the manufacturer, and more (Nahon & Reid, 1990). Can using a less-than-optimal motion cueing algorithm affect the transfer of training? Can the choice of the model affect the outcome? Schroeder (2012) identified several challenges to increasing simulator fidelity to simulate aircraft upsets, and the difficulties pilots can face while attempting to recover. Proper simulation can make a difference. In one evaluation in which airline pilots were placed into a flight simulator with much more accurate (but still necessarily limited, due to the physical constraints of simulation) aircraft stall characteristics, “less than one-quarter” of those pilots performed the recovery correctly (Schroeder et al., 2014, p. 15). It was postulated that the cues in the stall with the updated algorithms, which were more closely aligned with the real aircraft, may have created a surprise effect that influenced the pilots’ performance (Schroeder et al., 2014). As previously described, quasi-transfer of training tests motion against itself. If the motion model used is itself not valid, the results can be skewed and then published in the research. Those who design training programs need to be aware of these issues. The choice of which model is used may be a limitation or a delimitation, or possibly both. Researchers should report these aspects; those using the research should look for them and, if they are not reported, consider the possible impact of these factors. Consider that the assumptions made about pilot behavior, and the choice of which model to use to balance the different needs, can directly affect flight safety. Accidents often result from flawed assumptions (Leveson, 2015), so the implications of these problems run much deeper than academic integrity.

HOW DO HUMANS PERCEIVE MOTION DRIVE? Motion is perceived by pilots through their vestibular, visual, and proprioceptive systems (Hosman & Advani, 2016). “Humans sense motion through the vestibular system located in the inner ear, which consists of semi-circular channels sensing angular motion and otoliths sensing translational motion” (Volkaner et al., 2016, p. 2). In addition to sensing motion, these organs are also responsible for humans’ ability to balance. The visual system can perceive images, depth, slope, and changes in visual feedback, which all contribute to motion perception in simulators. This includes optical flow, useful in predicting velocity, distance, and relative spatial range. As the proprioceptive system is responsible for body awareness – being able to sense where the body is positioned in space – the motion model is critical.
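
The otoliths’ inability to separate sustained acceleration from tilt, which underlies the tilt-coordination technique described earlier, can be made concrete with a small worked example (the numbers are illustrative; perception thresholds vary by axis, frequency, and individual). The otoliths sense specific force, the difference between inertial acceleration and gravity:

\[ f = a - g . \]

A stationary cab pitched up by an angle \(\theta\) therefore produces a sustained longitudinal specific force

\[ f_x = g \sin\theta \approx g\,\theta , \]

so mimicking a sustained surge of \(a_x = 2\ \mathrm{m/s^2}\) requires

\[ \theta = \arcsin\!\left(\frac{2}{9.81}\right) \approx 11.8^\circ . \]

If the tilt rate is held below a nominal \(3^\circ/\mathrm{s}\) rotational perception threshold, reaching that attitude takes roughly four seconds, which is why sustained acceleration cues in a simulator necessarily lag those of the real aircraft.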

There are many advantages and disadvantages to flight simulators. Myers et al. (2018) describe the advantages as providing a safe way to practice routine emergency procedures, reducing training costs, reducing the carbon footprint, and providing a research platform. There are also disadvantages. Although training costs are reduced, flight simulators are still a costly purchase, and they can be overregulated. Simulators can also leave pilots less motivated, as stress levels are lower when pilots know they are not in a real aircraft. Simulation can also induce adaptation and compensatory skills. Another disadvantage is that poor motion cueing can cause motion sickness, with symptoms such as “sweating, fatigue, dizziness, and vomiting” (Myers et al., 2018, p. 4). Research suggests that pilot tracking behavior and performance were improved when motion feedback was available (Hosman & Advani, 2016). However, Jones (2018) suggests that aspects of motion systems in flight simulation training devices may not necessarily be suited for pilot training because “the actual response of the motion platform to pilot control is not objectively considered” (pp. 488–489). This is because motion tuning is conducted only through a simulator assessment pilot and subject matter experts (SMEs), without manufacturer constraints. The outcome of the motion tuning process is therefore highly reliant on the evaluation pilot and the simulator software engineer, who assess the final configuration, with changes to the motion parameter settings arising from open-loop and closed-loop testing with SMEs and a motion filter expert (Hosman & Advani, 2016; Jones, 2018). This outcome is often less than ideal because the process depends on humans, with the attendant potential for variability and error. This has brought about research into motion criteria for determining motion fidelity. Myers et al. (2018) describe transfer of training as the process a pilot goes through in acquiring knowledge, abilities, or skills from simulator training; it describes how training in the simulated environment transfers into the real-world environment. Hosman and Advani (2016) placed particular emphasis on calibrating motion cueing systems so that transfer-of-training research generalizes beyond the specific simulator used in a given study. Volkaner et al. (2016) examined human perception and how to improve it in flight simulators. Myers et al. (2018), referencing Vaden and Hall (2005), suggest that pilot performance is improved by a motion simulator when performance in flight depends on motion; for example, pilots desired motion simulators for the task of controlling an unstable aircraft. A lack of motion made it difficult for pilots to successfully develop flight control strategies. Another way to improve the transfer of training is for instructors to explain to their flight students the differences between the simulator and the aircraft, so that pilots are mindful of them. Knowing where simulators lack realism is a necessary step toward improving future pilot performance. Fidelity is another challenge with regard to simulators. “Fidelity, the degree to which the simulator looks like the real aircraft and the similarity to which it acts like the real aircraft, is closely linked to training transfer” (Myers et al., 2018). Fidelity can be further broken down into three components: physical fidelity, cognitive fidelity, and functional fidelity. Physical fidelity comes down to the look and feel of the
simulator in comparison to a real aircraft and its associated flight deck. As a simulator cannot fully simulate real-world motion, physical fidelity is critical to ensure that the motion feel is realistic. Cognitive fidelity is how well the simulator induces in the pilot a cognitive state similar to that experienced in a real aircraft, including how realistic situational awareness, decision-making, anxiety, and stress are compared to flying a real aircraft. The last component, functional fidelity, is simply how well the simulator’s equipment behaves in comparison to the actual aircraft’s equipment.

LET THEM EAT HUMBLE PIE—OR NOT? A common self-imposed limitation in current training using high-fidelity simulators is the avoidance of consequences of trainee performance, i.e., letting the simulator “crash,” or come close to it, following trainee actions or inactions. This is rarely documented or stated policy but rather accepted practice in most pilot training environments across the industry. Overall, this can be seen as reflective of a more positive training culture than previous generations of pilots got to experience, i.e., one more focused on developing pilot competencies than on checking them. Still, in many other domains failure can be seen as an integral and productive way of learning, and most people have experienced this in learning writing, sports, music, etc. Specific examples of studies on this can be found for learning mathematics (Kapur, 2014) and other STEM topics (Simpson & Maltese, 2017), business studies (PR & Gupta, 2019), and art and design (Sawyer, 2018). A meta-study of failure and learning from failure as an instructional strategy (Darabi, Arrington, & Sayilir, 2018) found 187 publications, 62 of them relevant, but only 12 based on an experimental design. The limited number of experimental studies demonstrates a lack of research on the role of failure in learning, but these studies overall showed a moderately positive result for learning from failure. There is even a specific learning design named “Productive Failure” (Kapur, 2015). Concerns about how failure in the simulator may affect the self-confidence of a pilot are relevant but should be put in context. When this concern is expressed, it often relates to how pilots, following a severe failure in the simulator, react in a challenging in-flight situation. The potential risk is then that a lack of self-confidence may result in hesitation or a lack of decisive action to resolve the situation. Self-confidence is certainly a desired trait in pilot selection (ECA, 2013), and in challenging situations it can be supportive of taking action and of a successful outcome. However, it should be remembered that pilot overconfidence has contributed to accidents, arguably more often than lacking self-confidence, most recently linked to the accident of PIA 8303 (BBC, 2020). In the context of recent aviation history, Crew Resource Management (CRM) training was in part introduced and promoted to counter the self-confidence of captains who would not let first officers question their actions. The argument to favor confidence over consequence by not letting pilots fail severely in the simulator implies that pilots have fragile egos. In contrast, industry experience implies that overconfidence is the bigger problem.

Considerations about letting pilots fail severely in the simulator must take into account the context of pilot training. As mentioned previously, failure in simulator training can lead not only to re-training but, if repeated, to actions regarding performance and ultimately loss of employment and licence. It is no surprise that pilots and their trainers, who are normally colleagues in an airline, want to avoid severe failure. For the airline, the administration and re-training related to failure require resources and add cost. There is nothing inherently positive about the experience of a serious safety event or an accident in the simulator, and designing scenarios that reliably produced such outcomes would be counterproductive to effective learning and successful training outcomes. On the other hand, there is most likely nothing inherently positive about avoiding this experience when it is prompted by poor pilot performance. Provided that the framework and setup for the training are supportive of learning from errors and failure, seeing the consequences of shortcomings in performance can be a motivating learning experience. While avoiding severe failure makes sense in the current context of pilot training in the industry, it is possible to imagine a different context in which this limitation is removed and failure is simply seen as a consequence that can be used for learning – as in many other domains.

SAME, SAME BUT DIFFERENT With the accident of Air France 447 (BEA, 2012), the “startle effect” came into focus as a contributing factor in accidents. It was seen as contributing to “The excessive nature of the PF’s inputs” (p. 175), and “Poor management of the startle effect” was stated as having “generated a highly charged emotional factor for the two copilots” (p. 203). Also highlighted was “its potential to disrupt the ability of the crew to work as a team” (p. 188), and in conclusion the accident report stated that “The startle effect played a major role in the destabilisation of the flight path and in the two pilots understanding the situation” (p. 211). In response, the BEA (pp. 157–158) refers, in section “1.18.4.6 Work currently underway on simulator fidelity and training,” to a working group and one of its conclusions:

The design of the training must be such that it generates surprise and startle effect to teach the pilots how to react to these phenomena and how to work in stressful situations, in order to prepare the trainees for the actual operating environment.

Since around the time of AF 447, startle and the related concept of surprise have been identified as contributing factors in accidents, e.g., Colgan Air 3407 (NTSB, 2010), Air Asia 8501 (KNKT, 2014), Lion Air 610, and Ethiopian 302 (Committee on Transportation and Infrastructure, 2019), as well as being implied, at the time or in hindsight, for many more. Startle and surprise were not unknown before this: “automation surprise” had been a similar contributing factor in earlier accidents, especially related to the increase in flight deck automation from the 1980s onward (Dekker, 2009). It should be noted that startle and surprise are not the same, although they can overlap. As outlined by Landman (2019), startle is primarily related to the intensity of an input, whereas surprise is related to a mismatch
between what is expected to happen and what actually happens in a situation. Startle and surprise are, however, at times used without this distinction when describing pilots’ reactions to unusual or unexpected events (Figure 6.1). The paradox of our times is that as aviation continues to improve its safety record, startle and surprise will remain a challenge, and probably one of increasing importance to address. Referring to Figure 6.1 (Dahlstrom, 2019), experience has always been seen as leading to expertise for pilots. Accumulating flight hours was the way to gain experience and, given the exposure to different events to manage, this would lead to expertise. This assumption has been well aligned with the reality of the pilot profession for generations of pilots. However, if exposure to unusual and risky events decreases, it may take longer for experience to lead to expertise. We can also expect that events that happened relatively often for previous generations of pilots may these days be a source of startle and surprise, e.g., given increasingly reliable technology and automation. The paradox is then that the safer we get, the more we may have to train pilots to manage unexpected and unusual situations in order to avoid the negative consequences of startle and surprise. The problem here is not only that a high-fidelity simulator does not perfectly replicate some aspects of real flight, as is part of the focus of this chapter, but also that it may not reproduce the same effects with regard to startle and surprise. Dahlstrom and Nahlinder (2007) illustrated this by using heart rate to measure workload (see Figure 6.2). The simulator in this case was not high fidelity, but the difference in heart rate shows that in an aircraft peak mental workload occurs

FIGURE 6.1  Argument about pilot expertise, slide from presentation at conference (Dahlstrom, 2019).

FIGURE 6.2  Average heart rate for eight participants during the flight segment rejected takeoff. Solid lines are aircraft flight, dashed lines are simulator flight (Dahlstrom & Nahlinder, 2007).

when managing a rejected takeoff, rather than when it was called, as was the case in the simulator. Dahlstrom and Nahlinder (2007) found more differences of a similar kind, and while these differences can be interpreted and challenged, the fact that there is a difference in the reactions of pilots between the aircraft and the simulator remains. This is not surprising to any experienced pilot, and should not be to anyone else, as the reality of being in an aircraft represents risks that are not present in the simulator (although the threat of being checked in the simulator adds some risks which are not present in the aircraft). The difficulty of producing startle or surprise effects in a simulator does not only have to do with the different contexts or the challenge to reproduce in-flight conditions in the simulator. It also has to do with how the simulator is used for pilot training. Anyone who has worked with pilot training knows that as soon as new training has been implemented, the first trainees will tell their other pilot colleagues about the exercises used in training. Training departments have tried to get around this by keeping training exercises secret, decreasing explicit and detailed descriptions of the training, providing multiple alternative scenarios for trainers to choose from, etc. Even so, given that a pilot’s career depends on successful simulator training and checks, few will arrive at the training without some idea about what will play out in the session in the simulator. This means that it remains an industry challenge to realistically reproduce startle and surprise effects in training to prepare pilots for what
can happen in real flight. This is, and remains, one of the current limits of simulator training. There has been a trend in recent years for airlines, some using the framework of evidence-based training (EBT), to move toward training that does not penalize failure and allows for learning from errors. Still, professional pilot culture and industry culture are highly protective of performance standards and far from accepting of errors and failures in training. This is as it should be, but it may continue to make it difficult to use high-fidelity simulators to effectively train pilots in managing the effects of startle and surprise. Pilots need to be confident that they will be supported in learning from what goes wrong in startle and surprise scenarios – scenarios whose purpose is defeated by advance preparation. The limitation of simulators in this respect can only be overcome by using them differently for training, in a manner that allows the effects of startle and surprise to play out as fully as possible.

FIDDLING WITH FIDELITY WHILE MISSING THE STORY One hidden limit of current high-fidelity simulation stems from its success in convincingly replicating an aircraft’s real-life environment and performance. It is indeed very close to the “real thing,” and for most involved in flight simulation this can only be a “good thing.” However, this replication of reality via high-fidelity environments and performance can lead to a focus on ever more detailed realistic features (Jackson, 1993) and limit in-depth consideration of training content and methodology. Dahlstrom, Dekker, van Winsen, and Nyce (2009, p. 309) expressed this problem in this way:

The emphasis on photorealism in visual and task contexts may retard or limit the development of skill sets critical for creating safety in domains where not all combinations of technical and operational failure can be foreseen or formalized (and for which failure strategies then cannot be proceduralised and simulated). The assumption that photorealism can capture all possible naturalistic cues in addition to the skills necessary to act competently in these domains may be overly optimistic. Competencies the aviation community recognises as important and significant (e.g. communication, coordination, problem solving, management of unanticipated and escalating situations) are thought to emerge directly from context-fixed simulator training. It is assumed that photorealism can achieve these ends.

Caird (1996, p. 127) stated that “For decades, the naïve but persistent theory of fidelity has guided the fit of simulation systems to training.” This theory seems to still be influential in the aviation industry. What it is missing is that the most important aspect of a training scenario is the fidelity of “the story.” When one pilot in a group speaks up with words such as “This is what happened to me the other day,” something happens in the room. A low-level simulation has started, with all other pilots in the room transported into the situation, thinking about what they would have done if they were there. The same can happen when a pilot reads a report from an event or when it is presented in training. It is the fidelity of the story, in this case, its realism
and relevance, that creates the simulation in the minds of the pilots, even without any tool to support it. With the current focus on evidence-based training, operational events are often turned into training scenarios, and there is indeed a focus on making the best possible use of scenarios in training departments across the industry. Still, the impact on training of carefully choosing and constructing scenarios seems to be underestimated, especially regarding the time and effort it takes to fully develop a credible and effective training scenario. This may be due to regulatory requirements (focusing more on the level of a simulator than on content), limited training resources (time, manpower, etc.), limited knowledge about the use of scenarios (e.g., how to build an effective scenario from a human factors perspective), and, in general, limited knowledge about learning and training beyond traditional methods of pilot training. It may also be due to reliance on the idea that the high fidelity of the full flight simulator will be enough to make a scenario effective. The untapped potential of cleverly constructed scenarios has been demonstrated by the effectiveness of simple training tools and lower-fidelity levels of simulation throughout aviation history. More recently, this has been shown in research and in industry with the use of Mid-Fidelity Simulation (MFS) scenarios (Dahlstrom, 2020). These are scenarios in the style of “microworlds,” presented on a screen with simplified representations of environments, whether domain-specific (representing aircraft controls and displays) or representing other environments (an industrial plant, a ship, a spacecraft, etc.). One example is the use of a single-parameter simulation named “Cold Store,” which put pilots in a new problem-solving situation and allowed the collection of data that was then used to identify patterns of problem-solving style (Rosa et al., 2021). Another example: simply putting a moving aircraft symbol on a map, inserting a loss of cabin pressure, and then allowing a search for information related to a diversion decision simulates most of the cognitive process of such a decision (Dahlstrom, 2020). These applications are low on environmental fidelity but high on the fidelity of the story, making them cost-effective training tools. They show that deeper thinking about scenarios still offers untapped potential for improving training, but this requires going beyond reliance on simulator fidelity as a proxy for quality of training. The critical importance of advanced high-fidelity simulation for pilot training, and thus for aviation safety, cannot be overstated. Still, the reality today is that while current high-fidelity simulation remains the most fundamental and powerful training tool the industry has available, the continued quest for incremental increases in fidelity does not seem to represent the most productive focus for further improving training. It should be possible to agree that high-fidelity simulation is well designed, tried, and tested as a tool. At this point, we should start balancing how much time we spend on improving the tool with how much time we spend asking ourselves “how” we can use it better to achieve training outcomes that make operations safer and more efficient. The success of high-fidelity flight simulation should be a foundation for effective pilot training, not a limitation on its further development.

A PERFECT TOOL OR A TOOL THAT CAN BE PERFECTED? The limitations outlined in this chapter do not take away from the critical contribution of all levels of simulation to the improvement of aviation safety, especially the advanced high-fidelity simulation available today. However, even for an almost perfect tool, it is useful, and at times necessary, to be aware of its limitations so that it can be used optimally. There is potential for further perfecting simulation by using data from simulators and other technology that can augment data collection to gain an in-depth understanding of pilot performance. With Simulator Operational Quality Assurance (SOQA), data from simulator training sessions can be collected, analyzed, and used to provide feedback to the individual pilot as a trainee and to the whole organization. This has been shown with the promotion of the tool CAE Rise (CAE, 2021) and similar applications for collecting and presenting simulator data. According to EASA (2019, p. 45), SOQA is “still in its infancy, but it offers interesting prospects.” This is probably an understatement, as extensive data from simulator sessions would offer the opportunity not only to see what pilots did in training but also, by inference, to analyze why they took certain actions. Eye-tracking equipment has long been used in research to understand pilot performance, but new, high-precision, and nonintrusive solutions (with no need to wear any equipment) can, with simple means, be integrated into simulators. Recent research (Niehorster et al., 2020) and proof-of-concept trials at airlines (Behrend, 2020; Cameron & Nolan, 2018; Saleh & Myers, 2020) have demonstrated the potential this holds for pilot training. However, eye-tracking has not yet been integrated into a day-to-day operational training environment, where it holds the potential to improve the understanding and improvement of pilot performance. Finally, speech analysis could provide an additional source of data (Bresee, 2019). Taken together, data from the simulator, eye-tracking, and speech analysis could provide new insights on pilot performance as well as support trainer focus on competencies rather than on SOP compliance (which would be recorded via the combined data from interaction with the simulator, gaze data from eye-tracking, and recording of callouts). This could bring simulator training to the next level regarding the in-depth understanding of pilot performance and behavior and assist trainers’ efforts to support and develop their trainees. Although there are hidden limitations to simulators, the potential for further improving them as training tools is in plain view.
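
As a concrete illustration of the kind of analysis such integration enables, the sketch below computes per-area-of-interest (AOI) dwell times from a timestamped gaze log. The log format, AOI names, sample rate, and metric are hypothetical assumptions for illustration only; commercial systems such as CAE Rise use their own data formats and metrics.

    import csv
    from collections import defaultdict

    # Hypothetical log format, one gaze sample per row:
    #   time_s,aoi
    #   102.50,PFD
    #   102.52,PFD
    #   102.54,OUT_THE_WINDOW
    SAMPLE_PERIOD_S = 0.02  # assumed 50 Hz eye tracker

    def dwell_times(path):
        """Sum gaze time per area of interest from a sample log."""
        totals = defaultdict(float)
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                totals[row["aoi"]] += SAMPLE_PERIOD_S
        return dict(totals)

    def monitoring_share(totals, instrument_aois=("PFD", "ND", "EICAS")):
        """Fraction of gaze time on instrument AOIs - a crude proxy for
        the 'monitoring' competency discussed above."""
        total = sum(totals.values()) or 1.0
        return sum(totals.get(a, 0.0) for a in instrument_aois) / total

    # Example use: print(monitoring_share(dwell_times("session_gaze.csv")))

Even simple metrics of this kind, trended across sessions and paired with simulator state data, would let a trainer see not just that a deviation occurred but whether the crew’s scan ever reached the relevant indication.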

SUMMARY This chapter focused on the use of flight simulators for pilot training and specifically addressed the motion argument in the context of training. The chapter consisted of five sections, covering simulator motion, how humans perceive motion drive, the limitations of high-fidelity simulators in current training, startle and surprise as contributing factors in accidents, and how simulators, though useful, can still be improved. There are many advantages and disadvantages to flight
simulators. To ensure that simulator fidelity is beneficial for pilot training, it needs to be valid, meeting the goal of accurately modeling the real aircraft to the extent necessary to accomplish the goals for transfer of training, that is, how training in the simulated environment transfers into the real-world environment. The authors hope to have provided the reader with insight into the current limitations and existing gaps within high-fidelity training environments. It is our hope that these simulators can continue to be improved to enhance pilot performance with regard to aviation safety, especially when making safety-critical decisions. Optimizing pilots’ transfer of training in high-fidelity training environments is crucial and continues to be an essential field of research.

ACKNOWLEDGMENT The authors wish to acknowledge Jeffery Schroeder for his contribution in reviewing this chapter and providing input.

REFERENCES
Allerton, D. (Ed.) (2009). Principles of Flight Simulation. Chichester, UK: Wiley.
Ary, D., Jacobs, L. C., Sorensen, C., & Razavieh, A. (2010). Introduction to Research in Education, 8th edition. Canada: Wadsworth Cengage Learning.
BBC. (2020). Pakistan Plane Crash was “Human Error” – Initial Report. Retrieved 2021-04-17: https://www.bbc.com/news/world-asia-53162627
BEA – Bureau d’Enquêtes et d’Analyses pour la sécurité de l’aviation civile. (2012). Final Report on the Accident on 1st June 2009 to the Airbus A330-203 Registered F-GZCP Operated by Air France Flight AF 447 Rio de Janeiro. Retrieved 2021-04-11: https://www.bea.aero/docspa/2009/f-cp090601.en/pdf/f-cp090601.en.pdf
Behrend, J. (2020). The Influence of Role Assignment on Pilot Decision-Making: An Eye-Tracking Study. Presentation at the 73rd International Air Safety Summit, arranged by Flight Safety Foundation, 19–22 October 2020, online. Alexandria, VA: Flight Safety Foundation.
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.
Bresee, J. (2019). Beyond Freeze and Reset: Objectively Assessing Crew Interaction Without Interrupting Task Performance. Presentation at Asia Pacific Aviation Training Symposium (APATS), 3–4 September 2019, Singapore. Retrieved 2021-04-17: https://apats-event.com/wp-content/uploads/2019/09/Jerry-Bresee.pdf
Bürki-Cohen, J., Soja, N. N., & Longridge, T. (1998). Simulator Platform Motion – The Need Revisited. The International Journal of Aviation Psychology, 8(3), 293–317.
Bürki-Cohen, J., & Go, T. (2005). The Effect of Simulator Motion Cues on Initial Training of Airline Pilots. AIAA Modeling and Simulation Technologies Conference and Exhibit, 15–18 August 2005, San Francisco, CA. Retrieved 2021-03-01: https://doi.org/10.2514/6.2005-6109
CAE. (2021). CAE Rise™ Training System. Information on website of CAE. Retrieved 2021-04-17: https://www.cae.com/civil-aviation/aviation-software/cae-rise/
Caird, J. K. (1996). Persistent Issues in the Application of Virtual Environment Systems to Training. Proceedings of HICS’96: Third Annual Symposium on Human Interaction with Complex Systems. Los Alamitos, CA: IEEE Computer Society Press, 124–132.
Cameron, M., & Nolan, P. (2018). Eyes Are Never Quiet – Towards Eye Tracking as a Practical Training Tool. Presentation at the 71st International Air Safety Summit, arranged by Flight Safety Foundation. Retrieved 2021-04-17: https://flightsafety.org/wp-content/uploads/2018/11/Cameron-day-2-IASS-20181112-Eye-Tracking-Presentation.pdf
Checkland, P. (1971). Systems Thinking, Systems Practice. New York: Wiley.
Committee on Transportation and Infrastructure. (2019). Hearing Before the Subcommittee on Aviation of the Committee on Transportation and Infrastructure, House of Representatives, One Hundred Sixteenth Session. Retrieved 2021-04-16: https://www.govinfo.gov/content/pkg/CHRG-116hhrg37277/pdf/CHRG-116hhrg37277.pdf
Dahlstrom, N. (2019). Human Factors and CRM – Successes, Shortcomings and Solutions. Conference presentation at Flight Operational Forum, 1–3 April 2019, Oslo, Norway. Retrieved 2021-04-12: http://www.fof.aero/presentations/F/o/F/2/0/1/9/2019_FoF_0206_dahlstrom.pdf
Dahlstrom, N. (2020). The Future Training EcoSystem: A New Normal for Flight Crew Training. Presentation at the International Air Safety Summit, arranged by Flight Safety Foundation. Presentation available from author.
Dahlstrom, N., & Nahlinder, S. (2007). Mental Workload in Simulator and Aircraft During Basic Civil Aviation Training. International Journal of Aviation Psychology, 19(4), 309–325.
Dahlstrom, N., Dekker, S., van Winsen, R., & Nyce, J. (2009). Fidelity and Validity of Simulator Training. Theoretical Issues in Ergonomics Science, 10(4), 305–314.
Darabi, A., Arrington, T. L., & Sayilir, E. (2018). Learning from Failure: A Meta-Analysis of the Empirical Studies. Educational Technology, Research and Development, 66, 1101–1118.
Dekker, S. W. A. (2009). Flight Crew Human Factors Investigation Conducted for the Dutch Safety Board into the Accident of TK1951, Boeing 737-800 near Amsterdam Schiphol Airport, February 25, 2009. Retrieved 2021-04-13: https://www.onderzoeksraad.nl/nl/media/inline/2020/1/21/human_factors_report_s_dekker.pdf
De Winter, J. C., Dodou, D., & Mulder, M. (2012). Training Effectiveness of Whole Body Flight Simulator Motion: A Comprehensive Meta-Analysis. The International Journal of Aviation Psychology, 22(2), 164–183.
EASA. (2019). “Breaking the Silos” – Fully Integrating Flight Data Monitoring into the Safety Management System. Report from European Operators Flight Data Monitoring Forum, Working Group C. Retrieved 2021-04-17: https://www.easa.europa.eu/sites/default/files/dfu/BreakingTheSilos%20Issue.pdf
ECA – European Cockpit Association. (2013). Pilot Training Compass – “Back to the Future”. Retrieved 2021-04-17: https://www.eurocockpit.be/sites/default/files/eca_pilot_training_compass_back_to_the_future_13_0228.pdf
Federal Aviation Administration (FAA). (1992). Advisory Circular Number 120-45A. https://www.faa.gov/documentLibrary/media/Advisory_Circular/AC_120-45A.pdf
Hosman, R., & Advani, S. (2016). Design and Evaluation of the Objective Motion Cueing Test and Criterion. The Aeronautical Journal, 120(1227), 873–891. https://doi.org/10.1017/aer.2016.35
Jackson, P. (1993). Applications of Virtual Reality in Training Simulation. In K. Warwick, J. Gray, & D. Roberts (Eds.), Virtual Reality in Engineering (pp. 121–136). London: The Institution of Electrical Engineers.
Jones, M. (2018). Enhancing Motion Cueing Using an Optimisation Technique. The Aeronautical Journal, 122(1249), 487–518. https://doi.org/10.1017/aer.2017.141
Kaiser, M. K., & Schroeder, J. A. (2003). Flights of Fancy: The Art and Science of Flight Simulation. In M. A. Vidulich & P. S. Tsang (Eds.), Principles and Practice of Aviation Psychology (pp. 435–471). Mahwah, NJ: Lawrence Erlbaum Associates.
Kapur, M. (2014). Productive Failure in Learning Math. Cognitive Science – A Multidisciplinary Journal, 38(5), 1008–1022.
Kapur, M. (2015). Learning from Productive Failure. Learning: Research and Practice, 1(1), 51–65.
KNKT (Komite Nasional Keselamatan Transportasi) – National Transportation Safety Committee of Indonesia. (2014). Final Report KNKT.14.12.29.04, PT. Indonesia Air Asia Airbus A320-216, PK-AXC, Karimata Strait, Coordinate 3°37’19”S–109°42’41”E, Republic of Indonesia, 28 December 2014. Jakarta, Indonesia: KNKT. Retrieved 2021-01-06: https://www.bea.aero/uploads/tx_elydbrapports/Final_Report_PK-AXC-reduite.pdf
Landman, A. (2019). Managing Startle and Surprise in the Cockpit. Ph.D. thesis, Delft University. Delft, Netherlands: Delft University. Retrieved 2021-04-14: https://research.tudelft.nl/files/55707836/dissertation_startle.pdf
Leveson, N. (2015). A Systems Approach to Risk Management Through Leading Safety Indicators. Reliability Engineering & System Safety, 136, 17–34.
Myers III, P. L., Starr, A. W., & Mullins, K. (2018). Flight Simulator Fidelity, Training Transfer, and the Role of Instructors in Optimizing Learning. International Journal of Aviation, Aeronautics and Aerospace, 5(1), 6. https://doi.org/10.15394/ijaaa.2018.1203
National Transportation Safety Board (NTSB). (2010). Loss of Control on Approach, Colgan Air, Inc., Operating as Continental Connection Flight 3407, Bombardier DHC-8-400, N200WQ, Clarence Center, New York, 12 February 2009. NTSB/AAR-10/01. Washington, DC: NTSB. Retrieved 2021-04-16: https://www.ntsb.gov/investigations/AccidentReports/Reports/AAR1001.pdf
Nahon, M. A., & Reid, L. D. (1990). Simulator Motion-Drive Algorithms – A Designer’s Perspective. Journal of Guidance, Control, and Dynamics, 13(2), 356–362.
Niehorster, D. C., Hildebrandt, M., Smoker, A., Jarodzka, H., & Dahlstrom, N. (2020). Towards Eye Tracking as a Support Tool for Pilot Training and Assessment. In Eye-Tracking in Aviation: Proceedings of the 1st International Workshop (ETAVI 2020), 17–28. Retrieved 2021-04-17: https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/407625/ETAVI_paper_2_407625.pdf?sequence=10&isAllowed=y
PR, S., & Gupta, D. (2019). Introducing Failure as a Deliberate Instructional Strategy to Enhance Learning and Academic Outcomes. In 2019 IEEE Tenth International Conference on Technology for Education (T4E), Goa, India, 67–70.
Rehmann, A. J., Mitman, R. D., & Reynolds, M. C. (1995). A Handbook of Flight Simulation Fidelity Requirements for Human Factors Research. Crew System Ergonomics Information Analysis Center, Wright-Patterson AFB, OH.
Rosa, E., Dahlstrom, N., Knez, I., Ljung, R., Cameron, M., & Willander, J. (2021). Dynamic Decision-Making of Airline Pilots in Low-Fidelity Simulation. Theoretical Issues in Ergonomics Science, 22(1), 83–102.
Saleh, P., & Myers, R. (2020). How Eye Tracking Supported Tools Enhance “Monitoring” Training and Improve Flight Safety. Presentation at the 73rd International Air Safety Summit, arranged by Flight Safety Foundation, 19–22 October 2020, online. Alexandria, VA: Flight Safety Foundation.
Sawyer, R. K. (2018). Teaching and Learning How to Create in Schools of Art and Design. Journal of the Learning Sciences, 27(1), 137–181.
Schroeder, J. (2012, August). Research and Technology in Support of Upset Prevention and Recovery Training. AIAA Modeling and Simulation Technologies Conference, 4567.
Schroeder, J. A., Bürki-Cohen, J. S., Shikany, D., Gingras, D. R., & Desrochers, P. P. (2014). An Evaluation of Several Stall Models for Commercial Transport Training. AIAA Modeling and Simulation Technologies Conference, 1002.
Simpson, A., & Maltese, A. (2017). “Failure Is a Major Component of Learning Anything”: The Role of Failure in the Development of STEM Professionals. Journal of Science Education & Technology, 26, 223–237.
Telban, R. J., & Cardullo, F. M. (2005). Motion Cueing Algorithm Development: Human-Centered Linear and Nonlinear Approaches.
Telban, R. J., Wu, W., Cardullo, F. M., & Houck, J. A. (2000). Motion Cueing Algorithm Development: Initial Investigation and Redesign of the Algorithms.
Tuncman, I. (2020). Assessing the Methodological Quality of Aviation Research: A Content Analysis of Articles Published in Subscription and Open Access Aviation Journals 2014–2016 [Doctoral dissertation, Florida Institute of Technology]. Scholarship Repository. https://repository.lib.fit.edu/handle/11141/3217
Vaden, E. A., & Hall, S. (2005). The Effect of Simulator Platform Motion on Pilot Training Transfer: A Meta-Analysis. The International Journal of Aviation Psychology, 15(4), 375–393. https://doi.org/10.1207/s15327108ijap1504_5
Volkaner, B., Sozen, S. N., & Omurlu, V. E. (2016). Realization of a Desktop Flight Simulation System for Motion-Cueing Studies. International Journal of Advanced Robotic Systems, 13(3). https://doi.org/10.5772/63239
Weisberg, M. (2012). Simulation and Similarity: Using Models to Understand the World. Oxford: Oxford University Press.
Wu, W., & Cardullo, F. (1997). Is There an Optimum Motion Cueing Algorithm? Modeling and Simulation Technologies Conference, 3506.
Zaal, P. M., Schroeder, J. A., & Chung, W. W. (2015). Transfer of Training on the Vertical Motion Simulator. Journal of Aircraft, 52(6), 1971–1984.

7

Cybersickness in Immersive Training Environments

Kay M. Stanney, Claire L. Hughes, Peyton Bailey, Ernesto Ruiz, and Cali Fidopiastis

CONTENTS
Introduction
Cybersickness Background
Individual Susceptibility and Stimulus Intensity
Quantifying Immersive Stimulus Intensity
Usage Protocol
Conclusions
Acknowledgments
References

INTRODUCTION

The availability of low-cost, high-quality wearable and mobile displays has led to a surge in application areas for immersive eXtended Reality (XR) technologies such as virtual reality (VR), augmented reality (AR), and mixed reality (MR) outside of traditional gaming genres, especially in training and education (Wu, Yu, & Gu, 2020). Immersive training systems surround a trainee with a multisensory environment, providing contextualized and situated learning opportunities with tangible objects to be manipulated and venues to be traversed. These training experiences can take on different forms. For example, VR aims to fully occlude the real world with a computer-generated world that immerses, isolates, and engages users in a reality other than the real world. AR/MR, on the other hand, blend real and virtual worlds such that they coexist in the same space, allowing users to maintain awareness of the real world while being presented with augmentations from beyond reality (Stanney, Nye, Haddad, Padron, Hale, & Cohn, 2021). All of these forms of extended reality, to a greater or lesser extent, allow for replication of relevant environmental (e.g., battlefield context) and task (e.g., applying a tourniquet) cues that may be necessary for training transfer. This blending of contextualized information and virtual cues has been shown to lead to faster knowledge acquisition and retention, increased engagement and presence, and more immersive learning (Giamellaro, 2017; Lee, 2012).

VR is perhaps best at immersion, as it allows an individual to experience a myriad of computer-generated worlds from a first-person perspective. AR is perhaps best at conveying knowledge, as it allows information to be overlaid onto a real-world location to instruct, advise, and inform. MR is perhaps best suited for internalizing physical behaviors, as it affords users the ability to embody psychomotor behaviors by requiring trainees to enact requisite actions (e.g., apply a physical tourniquet) on physical entities (e.g., a real manikin) while constructing knowledge and skills in a real-world context. Combining AR and MR allows for the coupling of embodied psychomotor behavior with information overlays that can guide knowledge acquisition tasks, which may speed skill acquisition and increase retention (Kahol, Vankipuram, & Smith, 2009; Levine et al., 2018). If AR/MR are further coupled with VR, true extended reality is achieved, blending all three forms of technologically enabled reality together. Yet, even with modern XR headsets, cybersickness remains a potential barrier to the adoptability and utility of these state-of-the-art immersive technologies and their respective authoring tools (Caserman et al., 2021; Stanney, Fidopiastis, & Foster, 2020).

CYBERSICKNESS BACKGROUND

There are many theories related to cybersickness, which is a form of motion sickness associated with immersive environments (Keshavarz, Hecht, & Lawson, 2014; Stanney, Lawson et al., 2020). Sensory conflict theory, the most widely cited, suggests that cybersickness is due to mismatches between what the human sensory systems expect and what the sensory cues in immersive environments provide (Reason, 1970, 1978; Reason & Brand, 1975; Nooij et al., 2017). Related is the evolutionary (or poison) theory, which suggests that the human brain has evolved such that conflicts between real and apparent motion stimuli are processed in the same manner as a toxin-related malfunction in the central nervous system (CNS), and thus they initiate an emetic response as a defense mechanism (Money, 1970; Treisman, 1977). The ecological theory suggests that postural instability is a necessary precursor to motion sickness (Riccio & Stoffregen, 1991). A relatively recent theory in the area is the multisensory re-weighting theory, which suggests that sensory dynamics can be re-shaped in XR to support adaptation to sensory perturbations, with those individuals better able to down-weight non-veridical XR cues being less susceptible to cybersickness (Weech, Calderon, & Barnett-Cowan, 2020). Taken together, these theories suggest that altered sensory cues, whether they arise in the XR system (e.g., visual–vestibular mismatch, vergence–accommodation conflict) or in the human (e.g., postural instability, sensory re-weighting), drive cybersickness.

Other individual differences beyond postural stability and sensory re-weighting capacity have been associated with cybersickness susceptibility, with demographic factors and technological variables each accounting for about half the variance in cybersickness (Rebenitsch & Owen, 2021). Previous experience with virtual motion, field of view (FOV), interpupillary distance (IPD), field dependence, female hormonal cycle, state/trait anxiety, migraine susceptibility, ethnicity, aerobic fitness, and body mass index have all been associated with cybersickness (Stanney, Fidopiastis, & Foster, 2020). One of the factors most commonly linked to cybersickness susceptibility is sex, but the results have been mixed: some have reported that females have increased susceptibility to cybersickness (Allen et al., 2016; Moroz et al., 2019; Shafer et al., 2019); others have suggested that increased female susceptibility varies with experimental conditions (Munafo et al., 2017); while still others have found no sex-linked risk for cybersickness (Al Zayer et al., 2019; Bracq et al., 2019; Clifton & Palmisano, 2020; Curry et al., 2020; Melo et al., 2018; Wilson & Kinsela, 2017). Recent research suggests that it may be the design of head-worn displays (HWDs), specifically an inability to fit the anthropometrics of certain end users, that drives cybersickness, and not sex (Stanney, Fidopiastis, & Foster, 2020).

Cybersickness resulting from exposure to immersive environments comprises a host of associated problems, including vomiting (about 1% in VR; none is expected in AR/MR because of the availability of real-world rest frames, which resolve visual–vestibular mismatches), nausea, disorientation, and oculomotor problems, as well as sleepiness (i.e., sopite syndrome) and visual flashbacks (Stanney & Hughes, 2021; Stanney, Kennedy, & Hale, 2014). At least 5% of those exposed to VR will not be able to tolerate prolonged exposure (Lawson, 2014), but no such limitation is expected in AR/MR due to a lack of nausea-related symptoms. Thus, while VR is oftentimes self-limiting (i.e., users have to withdraw from exposure due to nausea and disorientation), AR is expected to be tolerable for very long duration exposure (Stanney & Hughes, 2021). Cybersickness effects can be quite pervasive. For example, ~80–95% of those exposed to VR and ~50–80% of those exposed to AR report some level of symptomatology postexposure, ranging from symptoms as minor as headache and visual fatigue to ones as severe as vomiting or intense vertigo (Stanney & Hughes, 2021; Stanney, Salvendy, et al., 1998). What is more troubling is that the problems do not cease immediately upon cessation of exposure; lingering adverse physiological aftereffects can compromise the safety of users for hours after exposure (Smyth et al., 2018; Stanney, Fidopiastis & Foster, 2020; Stanney & Kennedy, 1998). Table 7.1 provides a synopsis of the extent and severity of adverse effects associated with exposure to immersive training environments. Based on the studies summarized in Table 7.1, the most conservative prediction would be that a substantial proportion (>50%) of users, in general, would experience some level of adverse effects both during and after XR exposure, with prolonged aftereffects posing safety concerns.

Why does cybersickness matter? Immersive technologies such as AR, MR, and VR are entering the mainstream in industry and the military. The global XR market was estimated at ~$31 billion in 2021 and is anticipated to grow to ~$300 billion by 2024 (Alsop, 2021). In terms of adoption, ~35% of businesses report having adopted XR technology at a small scale, with ~13% having adopted at a broader scale (Vailshery, 2021). Adoption has, however, been considerably slower than for other interactive technologies (Doolani et al., 2020; see Figure 7.1). One has to wonder if people have been slow to embrace XR technology due, in part, to cybersickness.

TABLE 7.1
Problems Associated with Exposure to Immersive Training Systems

• 80–95% of individuals interacting with a VR system report some level of side effects, with 5–50% experiencing symptoms severe enough to end participation; approximately 50% of those dropouts occur in the first 20 min and nearly 75% by 30 min (Caserman et al., 2021; Cobb et al., 1999; DiZio & Lackner, 1997; Howarth & Finch, 1999; Regan & Price, 1994; Singer et al., 1998; Stanney, Kennedy, & Kingdon, 2002; Stanney, Lanham, Kennedy, & Breaux, 1999; Stanney, Kingdon, Graeber, & Kennedy, 2002; Wilson et al., 1995; Wilson et al., 1997).
• Unlike VR, AR/MR are typically not associated with dropouts, likely due to the low levels of nausea experienced (Stanney & Hughes, 2021).
• VR presents with higher levels of disorientation and nausea, with lesser oculomotor disturbances. With prolonged (>45 min) VR exposure, oculomotor problems may become more pronounced, while nausea and disorientation level off (Stanney, Kingdon, Graeber, & Kennedy, 2002).
• ~50–80% of AR/MR users experience adverse symptoms, mostly oculomotor disturbances such as visual discomfort and fatigue, eyestrain, double vision, headaches, and adaptation of the vestibulo-ocular reflex (VOR), with lesser disorientation and very little nausea (Stanney & Hughes, 2021).
• Fully immersive VR exposure can cause people to vomit (about 1%), and approximately three-quarters of those exposed tend to experience some level of nausea, disorientation, and oculomotor problems (Cobb, Nichols, Ramsey, & Wilson, 1999; DiZio & Lackner, 1997; Howarth & Finch, 1999; Regan & Price, 1994; Singer, Ehrlich, & Allen, 1998; Lawson, Graeber, Mead, & Muth, 2002; Stanney, Kingdon, Graeber, & Kennedy, 2002; Stanney, Lawson et al., 2020; Stanney, Salvendy et al., 1998; Wilson, Nichols, & Haldane, 1997; Wilson, Nichols, & Ramsey, 1995).
• AR/MR are not typically associated with vomiting (Stanney & Hughes, 2021).
• Females exposed to immersive systems have previously been expected to be more susceptible to motion sickness than males and to experience higher levels of oculomotor and disorientation symptoms as compared to males (Graeber, 2001; Stanney, Kingdon, Graeber, & Kennedy, 2002). However, recent studies have demonstrated that this may be more related to the fit of the technology than to sex (Stanney, Fidopiastis, & Foster, 2020). Specifically, when a user's inter-pupillary distance (IPD) cannot be properly aligned in the XR headset, this may lead to higher levels of cybersickness. Current VR headsets, in particular, accommodate the IPD of males better than that of females.
• Individuals susceptible to motion sickness can be expected to experience more than twice the level of adverse effects from XR exposure as compared to non-susceptible individuals (Stanney, Kingdon, Graeber, & Kennedy, 2002).
• Individuals exposed to XR systems can be expected to experience lowered arousal (e.g., drowsiness, fatigue; a.k.a. sopite syndrome) postexposure (Lawson et al., 2002; Stanney, Kingdon, Graeber, & Kennedy, 2002).
• Flashbacks (i.e., visual illusions of movement or false sensations of movement when not in the XR environment) can be expected to occur (Lawson et al., 2002; Stanney, Kingdon, Graeber, & Kennedy, 2002).
• Prolonged aftereffects may occur after XR exposure, with symptoms potentially lasting more than 24 h (Baltzley, Kennedy, Berbaum, Lilienthal, & Gower, 1989; Stanney & Hughes, 2021; Stanney & Kennedy, 1998; Stanney, Kingdon, & Kennedy, 2002; Stanney, Kingdon, Graeber, & Kennedy, 2002).
• Increased postural instability while in a seated position seems to precede cybersickness (Merhi, 2009; Riccio & Stoffregen, 1991).

FIGURE 7.1  Interactive technology adoption rates. (Adapted from Llamas, 2019.)

With adoption going mainstream, ignoring the adverse effects of cybersickness could become problematic, with the potential for:

• creating unequal opportunities for immersive accessibility among the moderately to highly motion sickness susceptible population, with those who can handle immersive exposure advancing due to better, more contextualized, embodied training, while those who are susceptible to cybersickness are left to train with increasingly outdated technology (Allen et al., 2016; Stanney, Kingdon, & Kennedy, 2002; Stanney, Fidopiastis, & Foster, 2020);
• decreased trainee acceptance and use of immersive training systems (Biocca, 1992; Sagnier et al., 2020);
• decreased human performance (An et al., 2018; Kolasinski, 1995; Lawson et al., 2002; Stanney, Kingdon, & Kennedy, 2002); and
• acquisition of improper behaviors (Kennedy, Hettinger, & Lilienthal, 1990).

XR designers and developers must mitigate the effects of cybersickness if these powerful immersive technologies are to achieve their full potential.

INDIVIDUAL SUSCEPTIBILITY AND STIMULUS INTENSITY

There is a substantial individual component to cybersickness. As Lackner (2014, p. 2495) noted:

the range of sensitivity in the general population varies about 10–1, and the adaptation constant also ranges from 10 to 1. By contrast, the decay time constant varies by 100–1. The import of these values is that susceptibility to motion sickness in the general population varies by about 10,000–1, a vast range.
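Taken at face value, the three ranges compound as independent, multiplicative sources of variation; this is a back-of-the-envelope reading of the quote rather than a formal model:

(sensitivity range) × (adaptation-constant range) × (decay-time-constant range) ≈ 10 × 10 × 100 = 10,000

which is where the roughly 10,000-to-1 spread in overall susceptibility comes from.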

The fundamental question that needs to be addressed is as follows: Can an understanding of human physiological responses to immersive training technology be developed and incorporated into design guidelines and usage protocols, rendering immersive systems safe and effective for all to use? Research conducted over the past several decades has made tremendous gains in this regard (Chinn & Smith, 1953; Crampton, 1990; DiZio & Lackner, 2002; DiZio & Lackner, 1997; Heutink et al., 2019; Howarth & Finch, 1999; Hughes et al., 2020; Kennedy & Fowlkes, 1992; Keshavarz et al., 2014; Lawson, 2014; McCauley & Sharkey, 1992; McNally & Stuart, 1942; Pot-Kolder et al., 2018; Reason, 1970, 1978; Reason & Brand, 1975; Sjoberg, 1929; Stanney, Fidopiastis, & Foster, 2020; Stanney, Salvendy et al., 1998; Tyler & Bard, 1949; Weech et al., 2019; Welch & Mohler, 2014; Wendt, 1968). These researchers set out to achieve a number of challenging objectives, including developing tools to measure the adverse effects of immersive training system exposure (Kennedy & Stanney, 1996; Stanney, Kennedy, Drexler, & Harm, 1999), examining the psychometrics of cybersickness (Hildebrandt et al., 2018; Hughes et al., 2020; Kennedy, Stanney, & Dunlap, 2000; Kingdon, Stanney, & Kennedy, 2001; Stanney & Kennedy, 1997a, 1997b; Stanney, Lanham et al., 1999; Stanney, Kingdon, Nahmens, & Kennedy, 2003), developing usage protocols (Stanney, Kennedy, & Hale, 2014) and screening tools (Kennedy, Lane, Stanney, Lanham, & Kingdon, 2001), investigating system-related issues that influence cybersickness (Lucas et al., 2020; Park & Lee, 2020; Stanney, Fidopiastis, & Foster, 2020; Stanney & Hash, 1998; Stanney, Kingdon, Graeber, & Kennedy, 2002; Stanney, Salvendy et al., 1998; Widdowson et al., 2021; Wilson, 2016), examining the efficacy of readaptation mechanisms for recalibrating those exposed to immersive systems (Champney et al., 2007; Smither, Mouloua, & Kennedy, 2003), as well as examining the influences of cybersickness on human performance (Stanney, Kingdon, & Kennedy, 2001; Kennedy, French, Ordy, & Clark, 2003), among other related pursuits. These studies have led to an understanding that the response to immersive exposure varies directly with the capacity of the individual exposed (e.g., susceptibility, experience), dose (i.e., stimulus intensity), exposure duration (Hughes et al., 2020; Kennedy, Stanney, & Dunlap, 2000), and individual fit of the technology (Stanney, Fidopiastis, & Foster, 2020). These findings suggest that effective usage protocols that address the screening of individuals, the strength of the immersive stimulus, usage instructions, and the design of the HWD can minimize problems associated with immersive technologies.

From the individual susceptibility perspective, age, prior experience, individual factors (e.g., unstable binocular vision; individual variations in inter-pupillary distance (IPD); susceptibility to photic seizures and migraines), drug/alcohol consumption, health status, and the ability to adapt to novel sensory environments are all thought to contribute to the extent of symptoms experienced (Kennedy, Dunlap, & Fowlkes, 1990; Kolasinski, 1995; McFarland, 1953; Mirabile, 1990; Reason & Brand, 1975; Stanney, Kennedy, & Kingdon, 2002; Stanney, Salvendy et al., 1998) (see Table 7.2).

TABLE 7.2
Factors Affecting Individual Capacity to Resist Adverse Effects of Immersive Exposure

• Age: Expect little motion sickness for those under age 2; expect the greatest susceptibility to motion sickness between the ages of 2 and 12; expect motion sickness to decline after 12, with those over 25 being about half as susceptible as they were at 18 years of age.
• Anthropometrics: Consider setting immersive stimulus intensity in proportion to body weight/stature; ensure that hardware accommodates a large range of IPDs.
• Individual susceptibility: Expect individuals to differ greatly in motion sickness susceptibility and use the Motion History Questionnaire (MHQ) (Kennedy & Graybiel, 1965; Kennedy, Lane, Grizzard, Stanney, Kingdon, & Lanham, 2001) or another instrument (cf. Golding, Rafiq, & Keshavarz, 2021) to gauge the susceptibility of the target trainee population.
• Motion sickness history: Individuals who have experienced an emetic response associated with carnival rides can be expected to experience more than twice the level of adverse effects from immersive exposure as compared to those who do not experience such emesis (Stanney, Kingdon, Graeber, & Kennedy, 2002) (the age and history heuristics are sketched in code after this table).
• HWD design: Expect IPD fit to be a primary driver of cybersickness, especially in VR (Stanney, Fidopiastis, & Foster, 2020). Some VR headsets on the market today are expected not to properly fit up to ~40% of females and ~18% of males, which is expected to drive high and persistent levels of cybersickness, particularly for females, who tend to have a larger IPD mismatch.
• Sensory plasticity: Cybersickness is expected to be less severe for those individuals who can rapidly re-weight (Oman, 1990) conflicting multisensory cues in XR environments, such as visual–vestibular mismatches (Gallagher & Ferrè, 2018).
• Physiological state: Individuals with higher pre-exposure drowsiness will be more likely to experience drowsiness following immersive exposure, and those exposed to VR for 60 min or longer can be expected to experience more than twice the level of drowsiness as compared to those exposed for a shorter duration (Lawson et al., 2002; Stanney, Kingdon, Graeber, & Kennedy, 2002). As drowsiness increases, one can expect a greater severity of flashbacks (Lawson et al., 2002; Stanney, Kingdon, Graeber, & Kennedy, 2002).
• Body mass index: BMI does not tend to be related to cybersickness symptoms; however, those with higher BMIs may be less prone to experience an emetic response (Stanney, Kingdon, Graeber, & Kennedy, 2002).
• Drug/alcohol consumption: Limit immersive exposure to those individuals who are free from drug or alcohol consumption.
• Rest: Encourage individuals to be well rested before commencing immersive exposure.
• Ailments: Discourage those with cold, flu, or other ailments (e.g., headache, diplopia, blurred vision, sore eyes, or eyestrain) from participating in immersive exposure; encourage those susceptible to photic seizures and migraines, as well as individuals with pre-existing binocular anomalies, to avoid exposure.
• Clinical trainee groups: Develop an informed sensitivity to the vulnerabilities of these trainee groups (e.g., unique psychological, cognitive, and functional characteristics). Encourage those displaying comorbid features of various psychotic, bipolar, paranoid, substance abuse, claustrophobic, or other disorders where reality testing and identity problems are evident to avoid exposure.
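To make two of Table 7.2's screening heuristics concrete, the following Python sketch encodes the age curve and the motion sickness history multiplier as a toy scoring function. The function names, the 0–1 scale, and the numeric anchor values for the age curve are illustrative assumptions; only the qualitative shape of the curve (little sickness under 2, peak from 2 to 12, decline thereafter, with those over 25 about half as susceptible as at 18) and the 2x history multiplier come from the table.

def age_susceptibility(age):
    # Relative susceptibility by age on a 0-1 scale; anchor values are
    # illustrative assumptions, only the curve's shape follows Table 7.2.
    if age < 2:
        return 0.1   # little motion sickness expected under age 2
    if age <= 12:
        return 1.0   # greatest susceptibility between ages 2 and 12
    # Decline after 12; per Table 7.2, those over 25 are about half as
    # susceptible as they were at 18 (assumed: 0.7 at 18 -> 0.35 at 25+).
    anchors = [(12, 1.0), (18, 0.7), (25, 0.35)]
    for (a0, s0), (a1, s1) in zip(anchors, anchors[1:]):
        if age <= a1:
            return s0 + (s1 - s0) * (age - a0) / (a1 - a0)
    return 0.35      # flat beyond age 25

def screening_score(age, emetic_history):
    # A prior emetic response (e.g., on carnival rides) predicts more than
    # twice the level of adverse effects (Table 7.2), modeled here as x2.
    score = age_susceptibility(age)
    if emetic_history:
        score *= 2.0
    return score

# Example: a 30-year-old with a history of carnival-ride emesis
print(screening_score(30, emetic_history=True))  # -> 0.7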

TABLE 7.3
System and Usage Factors Influencing Immersive Stimulus Intensity

• Exposure duration: Adverse effects associated with immersive exposure in VR are positively correlated with exposure duration (Kennedy, Stanney, & Dunlap, 2000; Saredakis et al., 2020). Lanham (2000) has shown that sickness increases linearly at a rate of 23% per 15 min (a simple extrapolation of this rate is sketched in code after this table). Dropouts in VR occur in as little as 15 min of exposure (Cobb et al., 1999; DiZio & Lackner, 1997; Duzmanska et al., 2018; Howarth & Finch, 1999; Hughes et al., 2020; Regan & Price, 1994; Singer et al., 1998; Stanney, Kingdon, & Kennedy, 2002; Stanney, Lanham et al., 1999; Stanney, Kingdon, Graeber, & Kennedy, 2002; Wilson et al., 1995; Wilson et al., 1997). However, AR exposure does not appear to follow this same pattern. Recent research suggests that those who undergo protracted, long-duration (2 h+) AR exposure may actually experience habituation, feeling more comfortable the longer they don the AR headset (Stanney & Hughes, 2021).
• Intersession intervals: VR studies have indicated that intersession intervals of two to five days are effective in mitigating adverse effects, while intervals less than or greater than two to five days are ineffective in reducing symptomatology (Kennedy, Lane, Berbaum, & Lilienthal, 1993; Watson, 1998). On the other hand, repeated, intermittent, brief AR exposures appear to drive sensitization, where the sensory conditions in the AR headset continue to elicit cybersickness (Stanney & Hughes, 2021).
• Movement control: As the amount of trainee movement control in terms of degrees of freedom (DoF) and head tracking within fully immersive environments increases, so too does the level of nausea experienced (Clifton & Palmisano, 2020; Grassini & Laumann, 2020; So & Lo, 1999; Stanney & Hash, 1998; Stanney, Kingdon, Graeber, & Kennedy, 2002). Complete trainee movement control (six DoF) can be expected to lead to 2.5 times more dropouts than streamlined control (three DoF). Further, movements in the rotational (i.e., roll, pitch, and yaw) axes may be more provocative than those in the translational (i.e., x, y, z) axes. Moreover, the type of movement initiation, whether initiated by a controller or by walking, matters, with walking potentially being advantageous when trying to minimize cybersickness (Saredakis et al., 2020). Artificial continuous locomotion techniques (e.g., joystick-based movement) are typically associated with higher levels of cybersickness than discrete locomotion techniques (e.g., teleportation, rotation snapping, translation snapping; Caserman et al., 2021; Farmani & Teather, 2020).
• Visual scene complexity: The rate of visual flow (i.e., visual scene complexity) may influence the incidence, and even more so the severity, of motion sickness experienced by an individual (Kennedy & Fowlkes, 1992; McCauley & Sharkey, 1992). Complex visual scenes may be more nauseogenic than simple scenes, with complex scenes possibly resulting in 1.5 times more emetic responses; however, scene complexity does not appear to affect dropout rates (Dichgans & Brandt, 1978; Kennedy, Berbaum, Dunlap, & Hettinger, 1996; Stanney, Kingdon, Graeber, & Kennedy, 2002). Immersive gaming content and 360° videos may be the most provocative (Saredakis et al., 2020). Such effects may be exacerbated by a large field-of-view (FOV) (Kennedy & Fowlkes, 1992), high spatial frequency content (Dichgans & Brandt, 1978), and visually simulated self-motion (i.e., vection) (Kennedy, Berbaum et al., 1996).
  • When large FOVs are used, determine whether they drive high levels of vection (i.e., perceived self-motion). If high levels of vection are found and they lead to high levels of sickness, then reduce the spatial frequency content of visual scenes.
• Sensory mismatch: Visual–vestibular mismatches are probably one of the most provocative drivers of cybersickness, being associated with both substantially higher severity (perhaps double the severity) and higher dropout levels (Caserman et al., 2021). Further, differences between virtual and physical head pose may be particularly provocative (Palmisano, Allison, & Kim, 2020).
  • Consider use of concordant motion (e.g., a motion base to reduce visual–vestibular conflicts; Kuiper et al., 2019), limiting or slowing forward speed and acceleration to reduce visual scene motion (Dennison & D'Zmura, 2017), rest frames (e.g., adding inertially stable visual motion cues, such as a fixed horizon; Cao, Grandi, & Kopper, 2021), teleportation (Caserman et al., 2021), rotation and translation snapping (Farmani & Teather, 2020), depth-of-field or peripheral blur (Carnegie & Rhee, 2015), and dynamic field-of-view (e.g., modifying field of view based on speed and angular velocity; Fernandes & Feiner, 2016) techniques to minimize sensory conflicts.
• Visual display: Visual display factors thought to influence how provocative an immersive environment is include system consistency (Uliano, Kennedy, & Lambert, 1986); lag (So & Griffin, 1995); update rate (So & Griffin, 1995); mismatched inter-pupillary distances (IPDs) (Mon-Williams, Rushton, & Wann, 1995; Stanney, Fidopiastis, & Foster, 2020); vergence–accommodation conflict (Zhan et al., 2020); restricted field-of-view (Fernandes & Feiner, 2016); unimodal and intersensorial distortions (both temporal and spatial) (Welch, 1978); and lighting conditions, with darker displays reported to be more conducive to cybersickness (Hoesch et al., 2018; Kobayashi et al., 2018). To minimize the effects of the visual display on cybersickness:
  • Ensure any system lags/latencies are stable; variable lags/latencies can be debilitating (Palmisano, Allison, & Kim, 2020; Stauffert, Niebling, & Latoschik, 2020).
  • Minimize display/phase lags (i.e., end-to-end tracking latency between head motion and the resulting update of the display; Kämäräinen et al., 2017).
  • Optimize frame rates, with a minimum frame rate of 60 frames per second and upwards of 90 frames per second recommended to minimize cybersickness (Freiwald, Katzakis, & Steinicke, 2018).
  • Provide adjustable IPD, with a range of ~50–77 mm recommended to capture >99% of people (Stanney, Fidopiastis, & Foster, 2020).
  • Provide multimodal feedback that minimizes sensory conflicts (i.e., provide visual, auditory, and haptic/kinesthetic feedback appropriate for the situation being simulated; Kemeny, Chardonnet, & Colombet, 2020).
  • When appropriate, keep the display as light as possible (i.e., minimize the use of dark scenarios; Hoesch et al., 2018; Kobayashi et al., 2018).
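As a rough pre-deployment checklist, the Python sketch below flags the system-side risk factors just listed: Lanham's ~23% symptom growth per 15 min of VR exposure, the 2.5x dropout penalty for six-DoF versus three-DoF movement control, and the display recommendations (stable latency, at least 60 and ideally 90 frames per second, and ~50–77 mm adjustable IPD). The function names and report format are our own, and treating these heuristics as independent flags is a simplification; only the thresholds come from Table 7.3.

def projected_vr_symptom_growth(minutes):
    # Lanham (2000): VR sickness grows roughly linearly, ~23% per 15 min.
    return 0.23 * minutes / 15.0

def stimulus_checklist(planned_minutes, movement_dof, frame_rate_fps,
                       latency_stable, ipd_min_mm, ipd_max_mm):
    # Flag Table 7.3 risk factors for a planned VR exposure.
    flags = [f"projected cumulative symptom growth: "
             f"{projected_vr_symptom_growth(planned_minutes):.0%}"]
    if movement_dof >= 6:
        flags.append("six-DoF movement control: expect ~2.5x the dropouts "
                     "of streamlined three-DoF control")
    if not latency_stable:
        flags.append("variable lag/latency: can be debilitating; stabilize first")
    if frame_rate_fps < 60:
        flags.append("frame rate below the 60 fps minimum")
    elif frame_rate_fps < 90:
        flags.append("frame rate below the ~90 fps recommendation")
    if ipd_min_mm > 50 or ipd_max_mm < 77:
        flags.append("IPD adjustment narrower than ~50-77 mm "
                     "(>99% population coverage)")
    return flags

# Example: 30 min session, full 6-DoF control, 72 fps, stable latency,
# and a headset whose IPD adjusts only from 58 to 70 mm
for flag in stimulus_checklist(30, 6, 72, True, 58, 70):
    print("-", flag)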

While considerable research into the exact causes is requisite and has been ongoing for decades in its various forms (seasickness, motion sickness, simulator sickness, space sickness, cybersickness) (Chinn & Smith, 1953; Crampton, 1990; DiZio & Lackner, 2002; DiZio & Lackner, 1997; Heutink et al., 2019; Howarth & Finch, 1999; Hughes et al., 2020; Kennedy & Fowlkes, 1992; Keshavarz et al., 2014; Lawson, 2014; McCauley & Sharkey, 1992; McNally & Stuart, 1942; Reason, 1970, 1978; Reason & Brand, 1975; Sjoberg, 1929; Stanney, Fidopiastis, & Foster, 2020; Stanney, Salvendy et al., 1998; Tyler & Bard, 1949; Weech et al., 2019; Welch & Mohler, 2014; Wendt, 1968), there is still a lack of understanding about the factors that drive immersive environment stimulus intensity such that this knowledge can be used to identify usage protocols that minimize adverse effects. In fact, current usage of immersive technology generally treats trainees as if they are immune to cybersickness or possess low motion sickness susceptibility and are capable of rapid acclimation to novel sensory environments. This is not always the case, as evidenced by the intensity and extent of side effects listed in Table 7.1 and the individual susceptibility factors summarized in Table 7.2. Beyond individual predisposition, one can see from Table 7.3 that there are a number of system and usage factors influencing stimulus intensity. By developing an understanding of these factors, means of acclimating trainees to an immersive experience could potentially be identified.

QUANTIFYING IMMERSIVE STIMULUS INTENSITY

The summary in Table 7.3 suggests that the intensity of an immersive stimulus could be reduced for VR by shortening exposure duration, maintaining an intersession interval of two to five days, reducing the degrees of freedom (DoF) of trainee movement control – particularly avoiding rotational movements – and simplifying visual scenes. While the movement control and scene complexity considerations are expected to hold for AR as well, usage protocol recommendations for minimizing stimulus intensity in AR diverge on duration: repeated, long-duration (>2 h) exposures may be the most advantageous. Table 7.3 further suggests that the exact conditioning strategy that is most effective may depend on individual susceptibility. If these tactics are coupled with conditioning approaches and with improvements to the design of HWDs that accommodate a larger segment of the population (i.e., wider adjustable IPD ranges, enlarged FOV, and visual display advancements), reductions in adverse effects and associated dropout rates should result.

There are several published case reports on effective interventions to alleviate moderate-to-severe cybersickness. Rine and colleagues (1999) reported on a ten-week intervention combining visuo-vestibular habituation and balance training that successfully rid a patient of strong visually induced motion sickness.

TABLE 7.4
Steps to Quantifying Immersive Stimulus Intensity

1. Get an initial estimate: Talk with target trainees (not developers) of the system and determine the level of adverse effects they experience.
2. Observe: Watch trainees during and after exposure and note comments and behaviors.
3. Try the system yourself: Particularly if you are susceptible to cybersickness or have an IPD that does not match the HWD, obtain a firsthand assessment of the adverse effects.
4. Measure dropout rate: If most people can stay in for an hour without symptoms, then the system is likely benign; if most people drop out within 10 min, then the system is probably in need of redesign (this heuristic is sketched in code after this table).
5. Monitor: Use simple rating scales to assess sickness (Kennedy, Lane, Berbaum, & Lilienthal, 1993) and visual, proprioceptive, and postural measures to assess aftereffects (Kennedy, Stanney, Compton, Drexler, & Jones, 1999).
6. Compare: Determine how the system under evaluation compares to other immersive systems.
7. Report: Summarize the severity of the problem, specify required interventions (e.g., warnings, instructions), and set expectations for use (e.g., target exposure duration, intersession intervals).
8. Expect dropouts: With a high-intensity immersive stimulus, dropout rates can be high.
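Step 4's dropout heuristic amounts to simple bookkeeping over logged sessions, as in the Python sketch below. The 50% cutoffs for "most people" and the function name are our assumptions; the 60 min and 10 min landmarks come from the table.

def classify_from_sessions(sessions):
    # sessions: list of (minutes_completed, dropped_out) tuples.
    n = len(sessions)
    dropout_rate = sum(1 for _, dropped in sessions if dropped) / n
    tolerated_hour = sum(1 for m, dropped in sessions
                         if not dropped and m >= 60) / n
    dropped_fast = sum(1 for m, dropped in sessions
                       if dropped and m <= 10) / n
    if tolerated_hour > 0.5:
        verdict = "likely benign"
    elif dropped_fast > 0.5:
        verdict = "probably in need of redesign"
    else:
        verdict = "intermediate: monitor with rating scales (step 5)"
    return dropout_rate, verdict

# Example: six logged sessions (minutes completed, dropped out?)
log = [(60, False), (60, False), (8, True), (60, False), (45, True), (60, False)]
print(classify_from_sessions(log))  # -> (0.333..., 'likely benign')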

TABLE 7.5
Immersive Training Systems Usage Protocol

1. Reviewing the information in Table 7.2, identify the individual capacity of target trainees to resist adverse effects of immersive training exposure.
2. Considering the factors in Table 7.3, design immersive training stimuli to minimize adverse effects.
3. Following the guidelines in Table 7.4, quantify the stimulus intensity of the immersive system.
4. Warnings: Provide warnings for those with severe susceptibility to motion sickness, photic seizures, and migraines, as well as those with preexisting binocular anomalies, cold, flu, or other ailments (see Table 7.2).
5. Educate trainees as to how the occurrence or intensity of adverse effects may be mitigated by limiting drug/alcohol consumption and ensuring ample rest (see Table 7.2).
6. Educate trainees on the potential risks of immersive training exposure. Inform trainees of the insidious effects they may experience during exposure, including nausea, malaise, disorientation, headache, dizziness, vertigo, eyestrain, drowsiness, fatigue, pallor, sweating, increased salivation, and vomiting.
7. Educate trainees as to the potential adverse aftereffects of immersive training exposure (see Table 7.1). Inform trainees that they may experience lowered arousal (e.g., drowsiness, fatigue), disturbed visual functioning, visual flashbacks, as well as unstable locomotor and postural control for prolonged periods following exposure. Relating these experiences to excessive alcohol consumption may prove instructional in understanding safety concerns with performing complex tasks (e.g., operating heavy machinery or driving an automobile) shortly after exposure.
8. Inform trainees that if they start to feel ill, they should terminate their immersive training exposure, because extended exposure is known to exacerbate adverse effects (Kennedy, Stanney, & Dunlap, 2000).
9. Prepare trainees for their transition to the immersive training by informing them that there will be an adjustment period.
10. Adjust environmental conditions: Provide adequate air flow and comfortable thermal conditions (Kloskowski, Medeiros, & Schöning, 2019). Sweating often precedes an emetic response; thus, proper air flow can enhance trainee comfort. In addition, extraneous noise should be eliminated, as it can exacerbate ill effects.
11. Adjust equipment to minimize fatigue: Fatigue can exacerbate the adverse effects of immersive training exposure. To minimize fatigue, ensure all equipment is comfortable and properly adjusted for fit, including adjustment for IPD and performing any available visual calibrations to mitigate visual fatigue.
12. Gauge initial exposure duration: For strong VR training stimuli, limit initial exposures to a short duration (e.g., 10 min or less) and allow an intersession recovery period of two to five days (Table 7.3). For AR, consider protracted long-duration exposure to foster habituation.
13. Avoid provocative movements: For strong immersive training stimuli, warn trainees to avoid movements requiring high rates of linear or rotational acceleration and extraordinary maneuvers (e.g., flying backward) during initial interaction (McCauley & Sharkey, 1992).
14. Monitor trainees: Throughout immersive training exposure, an attendant should be available at all times to monitor trainees' behavior and ensure their well-being.
15. Look for red flags: Indicators of impending trouble include excessive sweating, verbal frustration, lack of movement within the environment for a significant amount of time, and less overall movement (e.g., restricting head movement). Trainees demonstrating any of these behaviors should be observed closely, as they may experience an emetic response. Extra care should be taken with these individuals postexposure. Note: It is beneficial to have a plastic bag or garbage can located near trainees in the event of an abrupt emetic response.
16. Termination: Set criteria for terminating exposure. Exposure should be terminated immediately if trainees verbally complain of symptoms and acknowledge they are no longer able to continue. Also, to avoid an emetic response, exposure should be terminated if telltale signs (i.e., sweating, increased salivation) are observed. Some individuals may be unsteady postexposure and may need assistance when initially standing up after exposure.
17. Debriefing: After exposure, the well-being of trainees should be assessed. Measurements of their hand–eye coordination and postural stability should be taken. Similar to field sobriety tests, these can include measures of balance (e.g., standing on one foot, walking an imaginary line, leaning backward with eyes closed), coordination (e.g., alternate hand clapping and finger-to-nose touch while the eyes are closed), and eye nystagmus (e.g., following a light pen with the eyes without moving the head). Do not allow individuals who fail these tests to conduct high-risk activities until they have recovered (e.g., have someone drive them home).
18. Releasing: Set criteria for releasing trainees. Specify the amount of time after exposure that trainees must remain on premises before driving or participating in other such high-risk activities. In our lab, a 2-to-1 ratio is used: postexposure, trainees must remain in the laboratory for twice the amount of exposure time to allow recovery (a minimal sketch of this rule follows the table).
19. Follow-up: Call trainees the next day, have them call, or have them complete online surveys to report any prolonged adverse effects.
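Step 18's release rule is simple enough to automate; below is a minimal Python sketch of the 2-to-1 recovery ratio described above. The function name and datetime-based interface are our choices; only the ratio comes from the protocol.

from datetime import datetime, timedelta

def earliest_release(exposure_start, exposure_end, recovery_ratio=2.0):
    # Table 7.5, step 18: trainees remain on premises for twice the
    # exposure time before driving or other high-risk activities.
    exposure = exposure_end - exposure_start
    return exposure_end + recovery_ratio * exposure

# Example: a 30 min session ending at 09:30 allows release no earlier than 10:30
start = datetime(2024, 1, 8, 9, 0)
end = start + timedelta(minutes=30)
print(earliest_release(start, end))  # -> 2024-01-08 10:30:00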

Another effective strategy for reducing motion sickness is autogenic feedback training, in which subjects are taught to control physiological responses to environmental stressors (Cowings & Toscano, 2000). Thus, while technological improvements and sound design principles that ensure the hardware developed reflects human physiological variation are essential, efforts to expand the use of immersive environments while minimizing unwanted side effects should also carefully consider usage protocols that foster habituation (Stanney & Hughes, 2021). Focusing on the factors in Table 7.3, system developers can identify the primary factors that are inducing adverse effects in their system and adjust accordingly. Further, the steps in Table 7.4 can be used to establish the stimulus strength of a particular immersive system.

USAGE PROTOCOL

Integrating the issues reviewed above, Table 7.5 provides a systematic usage protocol that can be used by system developers and system administrators to minimize risks to trainees exposed to immersive training systems.

CONCLUSIONS

To facilitate adoption and minimize the risks associated with exposure to immersive training systems, developers and system administrators should identify the capacity of end users to resist the adverse effects of exposure, quantify and minimize stimulus intensity, and follow a systematic usage protocol. This protocol should focus on warning, educating, and preparing trainees; setting appropriate environmental and equipment conditions; gauging initial exposure duration based on system type; monitoring trainees while looking for red flags; and setting criteria for terminating exposure, debriefing, and release. Adopting such a protocol can minimize the risk factors associated with immersive training system exposure, thereby enhancing the safety of trainees while limiting the liability of system developers and administrators.

ACKNOWLEDGMENTS

This chapter is dedicated to the late Robert S. Kennedy, a world-leading expert in motion sickness and many other things. His tireless work ethic, unrivaled intellect, and selfless mentorship were a gift to all those whose lives he touched. This material is based upon work supported in part by the Office of Naval Research (ONR) under grant No. N000149810642, the National Science Foundation (NSF) under grants No. DMI9561266 and IRI-9624968, the National Aeronautics and Space Administration (NASA) under grants No. NAS9-19482 and NAS9-19453, and the U.S. Army Medical Research & Development Command (USAMRDC) under the guidance of the Joint Program Committee – JPC-1 at Ft. Detrick, MD, under contract number MTEC-W81XWH1990005. The views, opinions, and/or findings contained in this research/presentation/publication are those of the authors/company and do not necessarily reflect the views of the ONR, NSF, NASA, or USAMRDC and should not be construed as an official DoD/NSF/NASA position, policy, or decision unless so designated by other documentation. No official endorsement should be made. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the U.S. government.

REFERENCES

Allen, B., Hanley, T., Rokers, B., & Green, C. S. (2016). Visual 3D motion acuity predicts discomfort in 3D stereoscopic environments. Entertainment Computing, 13, 1–9.
Alsop, T. (2021, Mar 22). Extended reality (XR): AR, VR, and MR – Statistics & facts. Statista. https://www.statista.com/topics/6072/extended-reality-xr/#dossierSummary
Al Zayer, M., Adhanom, I. B., MacNeilage, P., & Folmer, E. (2019, May). The effect of field-of-view restriction on sex bias in VR sickness and spatial navigation performance. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 354:1–354:12). Glasgow, Scotland, UK: ACM.
An, B., Matteo, F., Epstein, M., & Brown, D. E. (2018). Comparing the performance of an immersive virtual reality and traditional desktop cultural game. Proceedings of the 2nd International Conference on Computer-Human Interaction Research and Applications – Volume 1: CHIRA (pp. 54–61). https://doi.org/10.5220/0006922800540061
Baltzley, D. R., Kennedy, R. S., Berbaum, K. S., Lilienthal, M. G., & Gower, D. W. (1989). The time course of postflight simulator sickness symptoms. Aviation, Space, and Environmental Medicine, 60(11), 1043–1048.
Biocca, F. (1992). Will simulation sickness slow down the diffusion of virtual environment technology? Presence: Teleoperators and Virtual Environments, 1(3), 334–343.
Bracq, M. S., Michinov, E., Arnaldi, B., Caillaud, B., Gibaud, B., Gouranton, V., & Jannin, P. (2019). Learning procedural skills with a virtual reality simulator: An acceptability study. Nurse Education Today, 79, 153–160.
Cao, Z., Grandi, J., & Kopper, R. (2021). Granulated rest frames outperform field of view restrictors on visual search performance. Frontiers in Virtual Reality. https://doi.org/10.3389/frvir.2021.604889
Carnegie, K. C., & Rhee, T. (2015). Reducing visual discomfort with HMDs using dynamic depth of field. IEEE Computer Graphics and Applications, 35(5), 34–41.
Caserman, P., Garcia-Agundez, A., Gámez Zerban, A., et al. (2021). Cybersickness in current-generation virtual reality head-mounted displays: Systematic review and outlook. Virtual Reality. https://doi.org/10.1007/s10055-021-00513-6
Champney, R., Stanney, K. M., Hash, P., Malone, L., Kennedy, R. S., & Compton, D. (2007). Recovery from virtual environment exposure: Expected time-course of symptoms and potential readaptation mechanisms. Human Factors, 49(3), 491–506.
Chinn, H. I., & Smith, P. K. (1953). Motion sickness. Pharmacological Reviews, 7, 33–82.
Clifton, J., & Palmisano, S. (2020). Effects of steering locomotion and teleporting on cybersickness and presence in HMD-based virtual reality. Virtual Reality, 24(3), 453–468.
Cobb, S. V. G., Nichols, S., Ramsey, A. D., & Wilson, J. R. (1999). Virtual Reality-Induced Symptoms and Effects (VRISE). Presence: Teleoperators and Virtual Environments, 8(2), 169–186.
Cowings, P. S., & Toscano, W. B. (2000). Autogenic-feedback training exercise is superior to promethazine for control of motion sickness symptoms. Journal of Clinical Pharmacology, 40, 1154–1165.
Crampton, G. H. (Ed.). (1990). Motion and space sickness. Boca Raton, FL: CRC Press.
Curry, C., Li, R., Peterson, N., & Stoffregen, T. A. (2020). Cybersickness in virtual reality head-mounted displays: Examining the influence of sex differences and vehicle control. International Journal of Human–Computer Interaction, 36(12), 1161–1167. https://doi.org/10.1080/10447318.2020.1726108
Dennison, M. S., & D'Zmura, M. (2017). Cybersickness without the wobble: Experimental results speak against postural instability theory. Applied Ergonomics, 58, 215–223.
Dichgans, J., & Brandt, T. (1978). Visual-vestibular interaction: Effects on self-motion perception and postural control. In R. Held, H. W. Leibowitz, & H. L. Teuber (Eds.), Handbook of sensory physiology, Vol. VIII: Perception (pp. 756–804). Heidelberg: Springer-Verlag.
DiZio, P., & Lackner, J. R. (1997). Circumventing side effects of immersive virtual environments. In M. Smith, G. Salvendy, & R. Koubek (Eds.), Design of computing systems: Social and ergonomic considerations (pp. 893–896). Amsterdam, Netherlands: Elsevier Science Publishers. San Francisco, CA, August 24–29.
DiZio, P., & Lackner, J. R. (2002). Proprioceptive adaptation and aftereffects. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 791–806). Mahwah, NJ: Lawrence Erlbaum Associates.
Doolani, S., Wessels, C., Kanal, V., Sevastopoulos, C., Jaiswal, A., Nambiappan, H., & Makedon, F. (2020). A review of extended reality (XR) technologies for manufacturing training. Technologies, 8, 77. https://doi.org/10.3390/technologies8040077

Duzmanska, N., Strojny, P., & Strojny, A. (2018). Can simulator sickness be avoided? A review on temporal aspects of simulator sickness. Frontiers in Psychology, 9, 2132. https://doi.org/10.3389/fpsyg.2018.02132
Farmani, Y., & Teather, R. J. (2020). Evaluating discrete viewpoint control to reduce cybersickness in virtual reality. Virtual Reality, 24, 645–664. https://doi.org/10.1007/s10055-020-00425-x
Fernandes, A. S., & Feiner, S. K. (2016). Combating VR sickness through subtle dynamic field-of-view modification. 2016 IEEE Symposium on 3D User Interfaces (3DUI), Greenville, SC. https://doi.org/10.1109/3DUI.2016.7460053
Freiwald, J. P., Katzakis, N., & Steinicke, F. (2018). Camera time warp: Compensating latency in video see-through head-mounted-displays for reduced cybersickness effects. Proceedings of VRST '18, November 29–December 1, 2018, Tokyo, Japan.
Gallagher, M., & Ferrè, E. R. (2018). Cybersickness: A multisensory integration perspective. Multisensory Research, 31(7), 645–674. https://doi.org/10.1163/22134808-20181293
Giamellaro, M. (2017). Dewey's yardstick: Contextualization as a crosscutting measure of experience in education and learning. Sage Open, 7(1). https://doi.org/10.1177/2158244017700463
Golding, J. F., Rafiq, A., & Keshavarz, B. (2021). Predicting individual susceptibility to visually induced motion sickness by questionnaire. Frontiers in Virtual Reality, 2, 576871. https://doi.org/10.3389/frvir.2021.576871
Graeber, D. A. (2001). Use of incremental adaptation and habituation regimens for mitigating optokinetic side effects. Unpublished doctoral dissertation, University of Central Florida.
Grassini, S., & Laumann, K. (2020). Are modern head-mounted displays sexist? A systematic review on gender differences in HMD-mediated virtual reality. Frontiers in Psychology, 11, 1604. https://doi.org/10.3389/fpsyg.2020.01604
Heutink, J., Broekman, M., Brookhuis, K. A., Melis-Dankers, B. J., & Cordes, C. (2019). The effects of habituation and adding a rest-frame on experienced simulator sickness in an advanced mobility scooter driving simulator. Ergonomics, 62(1), 65–75.
Hildebrandt, J., Schmitz, P., Valdez, A. C., Kobbelt, L., & Ziefle, M. (2018, July). Get well soon! Human factors' influence on cybersickness after redirected walking exposure in virtual reality. International Conference on Virtual, Augmented and Mixed Reality (pp. 82–101). Cham: Springer.
Hoesch, A., Poeschl, S., Weidner, F., Walter, R., & Doering, N. (2018). The relationship between visual attention and simulator sickness: A driving simulation study. 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR) (pp. 1–2). https://doi.org/10.1109/VR.2018.8446240
Howarth, P. A., & Finch, M. (1999). The nauseogenicity of two methods of navigating within a virtual environment. Applied Ergonomics, 30, 39–45.
Hughes, C. L., Bailey, P. S., Ruiz, E., Fidopiastis, C. M., Taranta, N. R., & Stanney, K. M. (2020). The psychometrics of cybersickness in augmented reality. Frontiers in Virtual Reality: Virtual Reality & Human Behavior, 1, 602954. https://doi.org/10.3389/frvir.2020.602954
Kahol, K., Vankipuram, M., & Smith, M. L. (2009). Cognitive simulators for medical education and training. Journal of Biomedical Informatics, 42(4), 593–604. https://doi.org/10.1016/j.jbi.2009.02.008
Kämäräinen, T., Siekkinen, M., Ylä-Jääski, A., Zhang, W., & Hui, P. (2017). Dissecting the end-to-end latency of interactive mobile video applications. Proceedings of the 18th International Workshop on Mobile Computing Systems and Applications – HotMobile '17 (pp. 61–66). Sonoma, CA: ACM Press. https://doi.org/10.1145/3032970.3032985

Kemeny, A., Chardonnet, J.-R., & Colombet, F. (2020). Getting rid of cybersickness in virtual reality, augmented reality, and simulators. Switzerland: Springer International. https://doi.org/10.1007/978-3-030-59342-1
Kennedy, R. S., Berbaum, K. S., Dunlap, W. P., & Hettinger, L. J. (1996). Developing automated methods to quantify the visual stimulus for cybersickness. Proceedings of the Human Factors and Ergonomics Society 40th Annual Meeting (pp. 1126–1130). Santa Monica, CA: Human Factors & Ergonomics Society.
Kennedy, R. S., Dunlap, W. P., & Fowlkes, J. E. (1990). Prediction of motion sickness susceptibility: A taxonomy and evaluation of relative predictor potential. In G. H. Crampton (Ed.), Motion and space sickness (pp. 179–215). Boca Raton, FL: CRC Press.
Kennedy, R. S., & Fowlkes, J. E. (1992). Simulator sickness is polygenic and polysymptomatic: Implications for research. International Journal of Aviation Psychology, 2(1), 23–38.
Kennedy, R. S., French, J., Ordy, J. M., & Clark, J. (2003). Visually induced motion sickness, cognitive performance, saliva melatonin, and cortisol. Paper accepted for presentation at the Society for Neuroscience 33rd Annual Meeting, November 8–12, New Orleans, LA.
Kennedy, R. S., & Graybiel, A. (1965). The Dial test: A standardized procedure for the experimental production of canal sickness symptomatology in a rotating environment (Rep. No. 113, NSAM 930). Pensacola, FL: Naval School of Aerospace Medicine.
Kennedy, R. S., Hettinger, L. J., & Lilienthal, M. G. (1990). Simulator sickness. In G. H. Crampton (Ed.), Motion and space sickness (pp. 247–262). Boca Raton, FL: CRC Press.
Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness. International Journal of Aviation Psychology, 3(3), 203–220.
Kennedy, R. S., Lane, N. E., Grizzard, M. C., Stanney, K. M., Kingdon, K., & Lanham, S. (2001). Use of a motion history questionnaire to predict simulator sickness. Proceedings of the Sixth Driving Simulation Conference – DSC2001 (pp. 79–89). France: INRETS/Renault.
Kennedy, R. S., Lane, N. E., Stanney, K. M., Lanham, D. S., & Kingdon, K. (2001). Use of a motion experience questionnaire to predict simulator sickness. Usability evaluation and interface design: Cognitive engineering, intelligent agents and virtual reality (pp. 1061–1065). Mahwah, NJ: Lawrence Erlbaum Associates.
Kennedy, R. S., & Stanney, K. M. (1996). Postural instability induced by virtual reality exposure: Development of a certification protocol. International Journal of Human-Computer Interaction, 8(1), 25–47.
Kennedy, R. S., Stanney, K. M., Compton, D. E., Drexler, J. M., & Jones, M. B. (1999). Virtual environment adaptation assessment test battery (Phase II Final Report, Contract No. NAS9-97022). Houston, TX: NASA Lyndon B. Johnson Space Center.
Kennedy, R. S., Stanney, K. M., & Dunlap, W. P. (2000). Duration and exposure to virtual environments: Sickness curves during and across sessions. Presence: Teleoperators and Virtual Environments, 9(5), 463–472.
Keshavarz, B., Hecht, H., & Lawson, B. D. (2014). Visually-induced motion sickness: Causes, characteristics, and countermeasures. In K. S. Hale & K. M. Stanney (Eds.), Handbook of virtual environments: Design, implementation, and applications (2nd ed., pp. 647–698). New York, NY: CRC Press.
Kingdon, K., Stanney, K. M., & Kennedy, R. S. (2001). Extreme responses to virtual environment exposure. The 45th Annual Human Factors and Ergonomics Society Meeting (pp. 1906–1910). Minneapolis/St. Paul, MN, October 8–12, 2001.

Kloskowski, H., Medeiros, D., & Schöning, J. (2019). OORT: An air-flow based cooling system for long-term virtual reality sessions. Proceedings of VRST '19, November 12–15, 2019, Parramatta, NSW, Australia.
Kobayashi, N., Yamashita, H., Matsuura, A., & Ishikawa, M. (2018). Effects of illuminance environment on visual induced motion sickness. 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE) (pp. 429–430). https://doi.org/10.1109/GCCE.2018.8574778
Kolasinski, E. M. (1995). Simulator sickness in virtual environments (ARI Technical Report 1027). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Kuiper, O. X., Bos, J. E., Diels, C., & Cammaerts, K. (2019). Moving base driving simulators' potential for carsickness research. Applied Ergonomics, 81, 102889.
Lackner, J. R. (2014). Motion sickness: More than nausea and vomiting. Experimental Brain Research, 232(8), 2493–2510. https://doi.org/10.1007/s00221-014-4008-8
Lanham, S. (2000). The effects of motion on performance, presence, and sickness in a virtual environment. Master's thesis, University of Central Florida.
Lawson, B. D. (2014). Motion sickness symptomatology and origins. In K. S. Hale & K. M. Stanney (Eds.), Handbook of virtual environments: Design, implementation, and applications (2nd ed., pp. 531–600). New York, NY: CRC Press.
Lawson, B. D., Graeber, D. A., Mead, A. M., & Muth, E. R. (2002). Signs and symptoms of human syndromes associated with synthetic experiences. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 791–806). Mahwah, NJ: Lawrence Erlbaum Associates.
Lee, K. (2012). Augmented reality in education and training. TechTrends, 56(2), 13–21.
Levine, S., Goldin-Meadow, S., Carlson, M., & Hemani-Lopez, N. (2018). Mental transformation skill in young children: The role of concrete and abstract motor training. Cognitive Science, 42, 1207–1228. https://doi.org/10.1111/cogs.12603
Llamas, S. (2019). XR by the numbers: What the data tells us. Gaming & Entertainment Track at AWE USA 2019, Santa Clara, CA. https://www.slideshare.net/AugmentedWorldExpo/stephanie-llamas-superdata-xr-by-the-numbers-what-the-data-tells-us
Lucas, G., Kemeny, A., Paillot, D., & Colombet, F. (2020). A simulation sickness study on a driving simulator equipped with a vibration platform. Transportation Research Part F: Traffic Psychology and Behaviour, 68, 15–22.
McCauley, M. E., & Sharkey, T. J. (1992). Cybersickness: Perception of self-motion in virtual environments. Presence: Teleoperators and Virtual Environments, 1(3), 311–318.
McFarland, R. A. (1953). Human factors in air transportation: Occupational health & safety. New York: McGraw-Hill.
McNally, W. J., & Stuart, E. A. (1942). Physiology of the labyrinth reviewed in relation to seasickness and other forms of motion sickness. War Medicine, 2, 683–771.
Melo, M., Vasconcelos-Raposo, J., & Bessa, M. (2018). Presence and cybersickness in immersive content: Effects of content type, exposure time and sex. Computers & Graphics, 71, 159–165.
Merhi, O. A. (2009). Motion sickness, virtual reality and postural stability. Retrieved from the University of Minnesota Digital Conservancy: https://hdl.handle.net/11299/58646
Mirabile, C. S. (1990). Motion sickness susceptibility and behavior. In G. H. Crampton (Ed.), Motion and space sickness (pp. 391–410). Boca Raton, FL: CRC Press.
Money, K. E. (1970). Motion sickness. Physiological Reviews, 50(1), 1–39.
Mon-Williams, M., Rushton, S., & Wann, J. P. (1995). Binocular vision in stereoscopic virtual-reality systems. Society for Information Display International Symposium Digest of Technical Papers, 25, 361–363.

Moroz, M., Garzorz, I., Folmer, E., & MacNeilage, P. (2019). Sensitivity to visual speed modulation in head-mounted displays depends on fixation. Displays, 58, 12–19.
Munafo, J., Diedrick, M., & Stoffregen, T. A. (2017). The virtual reality head-mounted display Oculus Rift induces motion sickness and is sexist in its effects. Experimental Brain Research, 235, 889–901. https://doi.org/10.1007/s00221-016-4846-7
Nooij, S. A., Pretto, P., Oberfeld, D., Hecht, H., & Bülthoff, H. H. (2017). Vection is the main contributor to motion sickness induced by visual yaw rotation: Implications for conflict and eye movement theories. PLoS One, 12(4), e0175305.
Oman, C. M. (1990). Motion sickness: A synthesis and evaluation of the sensory conflict theory. Canadian Journal of Physiology and Pharmacology, 68(2), 294–303. https://doi.org/10.1139/y90-044
Palmisano, S., Allison, R. S., & Kim, J. (2020). Cybersickness in head-mounted displays is caused by differences in the user's virtual and physical head pose. Frontiers in Virtual Reality, 1, 587698. https://doi.org/10.3389/frvir.2020.587698
Park, S., & Lee, G. (2020). Full-immersion virtual reality: Adverse effects related to static balance. Neuroscience Letters, 733, 134974.
Pot-Kolder, R., Veling, W., Counotte, J., & Van Der Gaag, M. (2018). Anxiety partially mediates cybersickness symptoms in immersive virtual reality environments. Cyberpsychology, Behavior, and Social Networking, 21(3), 187–193.
Reason, J. T. (1970). Motion sickness: A special case of sensory rearrangement. Advancement in Science, 26, 386–393.
Reason, J. T. (1978). Motion sickness adaptation: A neural mismatch model. Journal of the Royal Society of Medicine, 71, 819–829.
Reason, J. T., & Brand, J. J. (1975). Motion sickness. New York: Academic Press.
Rebenitsch, L., & Owen, C. (2021). Estimating cybersickness from virtual reality applications. Virtual Reality, 25, 165–174. https://doi.org/10.1007/s10055-020-00446-6
Regan, E. C., & Price, K. R. (1994). The frequency of occurrence and severity of side-effects of immersion virtual reality. Aviation, Space, and Environmental Medicine, 65, 527–530.
Riccio, G. E., & Stoffregen, T. A. (1991). An ecological theory of motion sickness and postural instability. Ecological Psychology, 3(3), 195–240.
Rine, R. M., Schubert, M. C., & Balkany, T. J. (1999). Visual-vestibular habituation and balance training for motion sickness. Physical Therapy, 79(10), 949–957.
Sagnier, C., Loup-Escande, E., Lourdeaux, D., Thouvenin, I., & Valléry, G. (2020). User acceptance of virtual reality: An extended technology acceptance model. International Journal of Human–Computer Interaction, 36(11), 993–1007. https://doi.org/10.1080/10447318.2019.1708612
Saredakis, D., Szpak, A., Birckhead, B., Keage, H., Rizzo, A., & Loetscher, T. (2020). Factors associated with virtual reality sickness in head-mounted displays: A systematic review and meta-analysis. Frontiers in Human Neuroscience, 14, 96. https://doi.org/10.3389/fnhum.2020.00096
Shafer, D. M., Carbonara, C. P., & Korpi, M. F. (2019). Factors affecting enjoyment of virtual reality games: A comparison involving consumer-grade virtual reality technology. Games for Health Journal, 8(1), 15–23.
Singer, M. J., Ehrlich, J. A., & Allen, R. C. (1998). Virtual environment sickness: Adaptation to and recovery from a search task. Proceedings of the 42nd Annual Human Factors and Ergonomics Society Meeting (pp. 1506–1510). Chicago, IL, October 5–9.
Sjoberg, A. A. (1929). Experimental studies of the eliciting mechanism of sea sickness. Acta Oto-Laryngologica, 13, 343–347.

Cybersickness in Immersive Training Environments

177

Smither, J. A., Mouloua, M., & Kennedy, R. S. (2003). Reducing symptomatology of visually-induced motion sickness through perceptual training. Manuscript submitted for publication. Smyth, J., Jennings, P., Mouzakitis, A., & Birrell, S. (2018, November). Too sick to drive: How motion sickness severity impacts human performance. 2018 21st International Conference on Intelligent Transportation Systems (ITSC) (pp. 1787–1793). IEEE. So, R. H., & Griffin, M. J. (1995). Effects of lags on human operator transfer functions with head-coupled systems. Aviation, Space, and Environmental Medicine, 66, 550–556. So, R .H. Y., & Lo, W. T. (1999). Cybersickness: An experimental study to isolate the effects of rotational scene oscillations. Proceedings of the IEEE Virtual Reality Conference (pp. 237–241). Los Alamitos, CA: IEEE Computer Society. Stanney, K., Fidopiastis, C., & Foster, L. (2020). Virtual reality is sexist: But it does not have to be. Frontiers in Robotics & AI – Virtual Environments, 7, 4. https://doi​.org​/10​.3389​ /frobt​.2020​.00004 Stanney, K. M., & Hash, P. (1998). Locus of user-initiated control in virtual environments: Influences on cybersickness. Presence: Teleoperators and Virtual Environments, 7(5), 447–459. Stanney, K. M., & Hughes, C. (2021). Final report: Assessment of psychological and physiological effects of augmented reality: Development of the Dual-Adaptation Protocol for Augmented Reality (DAPAR). Orlando, FL: Design Interactive. Stanney, K. M., & Kennedy, R. S. (1997a). Cybersickness is not simulator sickness. Proceedings of the 41st Annual Human Factors and Ergonomics Society Meeting (pp. 1138–1142). Albuquerque, NM, September 22–26. Stanney, K. M., & Kennedy, R. S. (1997b). The psychometrics of cybersickness. Communications of the ACM, 40(8), 67–68. Stanney, K. M., & Kennedy, R. S. (1998). Aftereffects from virtual environment exposure: How long do they last? Proceedings of the 42nd Annual Human Factors and Ergonomics Society Meeting (pp. 1476–1480). Chicago, IL, October 5–9. Stanney, K. M., Kennedy, R. S., Drexler, J. M., & Harm, D. L. (1999). Motion sickness and proprioceptive aftereffects following virtual environment exposure. Applied Ergonomics, 30, 27–38. Stanney, K. M., Kennedy, R. S., & Hale, K. (2014). Virtual environments usage protocols. In K. S. Hale & K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (2nd edition, pp. 797–809). Boca Raton, FL: CRC Press. Stanney, K. M., Kingdon, K., Graeber, D., & Kennedy, R. S. (2002). Human performance in immersive virtual environments: Effects of duration, user control, and scene complexity. Human Performance, 15(4), 339–366. Stanney, K. M., Kingdon, K., & Kennedy, R. S. (2002). Dropouts and aftereffects: Examining general accessibility to virtual environment technology. The 46th Annual Human Factors and Ergonomics Society Meeting (pp. 2114–2118). Baltimore, MD, September 29–October 4, 2002. Stanney, K. M., Kingdon, K., & Kennedy, R. S. (2001). Human performance in virtual environments: Examining user control techniques. In M. J. Smith, G. Salvendy, D. Harris, & R. J. Koubek (Eds.), Usability evaluation and interface design: Cognitive engineering, intelligent agents and virtual reality (Vol. 1 of the Proceedings of HCI International 2001) (pp. 1051–1055). Mahwah, NJ: Lawrence Erlbaum. Stanney, K. M., Kingdon, K., Nahmens, I., & Kennedy, R. S. (2003). What to expect from immersive virtual environment exposure: Influences of gender, body mass index, and past experience. 
Human Factors, 45(3), 504–522.

178

Human Factors in Simulation and Training

Stanney, K. M., Lanham, S., Kennedy, R. S., & Breaux, R. B. (1999). Virtual environment exposure drop-out thresholds. The 43rd Annual Human Factors and Ergonomics Society Meeting (pp. 1223–1227). Houston, TX, September 27-October 1, 1999 Stanney, K. M., Lawson, B. D., Rokers, B., Dennison, M., Fidopiastis, C., Stoffregen, T., Weech, S., & Fulvio, J. M. (2020). Identifying causes of and solutions for cybersickness in immersive technology: Reformulation of a research and development agenda. International Journal of Human-Computer Interaction, 36(19), 1783–1803. Stanney, K. M., Nye, H., Haddad, S., Padron, C. K., Hale, K. S., & Cohn, J. V. (2021, in press). eXtended reality (XR) environments. In G. Salvendy & W. Karwowski (Eds.), Handbook of human factors and ergonomics (5th edition) (pp. 782–815). New York: John Wiley. Stanney, K. M., Salvendy, G., Deisigner, J., DiZio, P., Ellis, S., Ellison, E., Fogleman, G., Gallimore, J., Hettinger, L., Kennedy, R., Lackner, J., Lawson, B., Maida, J., Mead, A., Mon-Williams, M., Newman, D., Piantanida, T., Reeves, L., Riedel, O., Singer, M., Stoffregen, T., Wann, J., Welch, R., Wilson, J., & Witmer, B. (1998). Aftereffects and sense of presence in virtual environments: Formulation of a research and development agenda. Report sponsored by the Life Sciences Division at NASA Headquarters. International Journal of Human-Computer Interaction, 10(2), 135–187. Stauffert, J.-P., Niebling, F., & Latoschik, M. E. (2020) Latency and cybersickness: Impact, causes, and measures. A review. Frontiers in Virtual Reality, 1, 582204. https://doi​.org​ /10​.3389​/frvir​.2020​.582204 Treisman, M. (1977). Motion sickness: An evolutionary hypothesis. Science, 197(4302), 493– 495. https://doi​.org​/10​.1126​/science​.301659 Tyler, D. B., & Bard, P. (1949). Motion sickness. Physiological Review, 29, 311–369. Uliano, K. C., Kennedy, R. S., & Lambert, E. Y. (1986). Asynchronous visual delays and the development of simulator sickness. Proceedings of the Human Factors Society 30th Annual Meeting (pp. 422–426). Dayton, OH: Human Factors Society. Vailshery, L. S. (2021, Jan 22). Share of business executives adopting augmented or virtual reality technology worldwide as of December 2018, by stage. Statista. https:// www​.statista ​.com ​/statistics ​/1097137​/ar​-vr​-adoption​-levels​-among​-global​-business​ -executives/ Watson, G. S. (1998). The effectiveness of a simulator screening session to facilitate simulator sickness adaptation for high-intensity driving scenarios. Proceedings of the 1998 IMAGE Conference. Chandler, AZ: The IMAGE Society. Weech, S., Calderon, C. M., & Barnett-Cowan, M. (2020). Sensory down-weighting in visualpostural coupling is linked with lower cybersickness. Frontiers in Virtual Reality, 1, 10. https://doi​.org​/10​.3389​/frvir​.2020​.00010 Weech, S., Kenny, S., & Barnett-Cowan, M. (2019). Presence and cybersickness in virtual reality are negatively related: A review. Frontiers in Psychology, 10, 158. https://doi​ .org​/10​.3389​/fpsyg​.2019​.00158 Welch, R. B. (1978). Perceptual modification: Adapting to altered sensory environments. New York: Academic Press. Welch, R. B., & Mohler, B. J. (2014). Adapting to virtual environments. In K. S. Hale & K. M. Stanney (Eds.), Handbook of virtual environments: Design, implementation, and applications (2nd edition, pp. 627–646). New York, NY: CRC Press. Wendt, G. R. (1968). Experiences with research on motion sickness (NASA Special Publication No. SP-187). 
Pensacola, FL: Fourth Symposium on the Role of Vestibular Organs in Space Exploration. Widdowson, C., Becerra, I., Merrill, C., Wang, R. F., & LaValle, S. (2021). Assessing postural instability and cybersickness through linear and angular displacement. Human Factors, 63(2), 296–311. https://doi​.org​/10​.1177​/0018720819881254

Cybersickness in Immersive Training Environments

179

Wilson, M. L. (2016). The effect of varying latency in a head-mounted display on task performance and motion sickness. Clemson University. Retrieved from  http:// tigerprints​.clemson​.edu​/all​_dissertations​/1688/ Wilson, M. L., & Kinsela, A. J. (2017, September). Absence of gender differences in actual induced HMD motion sickness vs. pretrial susceptibility ratings. Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 61, No. 1, pp. 1313– 1316). Los Angeles, CA: SAGE Publications. Wilson, J. R., Nichols, S., & Haldane, C. (1997). Presence and side effects: Complementary or contradictory? In M. Smith, G. Salvendy, & R. Koubek (Eds.), Design of computing systems: Social and ergonomic considerations (pp. 889–892). Amsterdam, Netherlands: Elsevier Science Publishers, San Francisco, CA, August 24–29. Wilson, J. R., Nichols, S. C., & Ramsey, A. D. (1995). Virtual reality health and safety: Facts, speculation and myths. VR News, 4, 20–24. Wu, B., Yu, X., & Gu, X. (2020). Effectiveness of immersive virtual reality using head‐mounted displays on learning performance: A meta‐analysis. British Journal of Educational Technology, 51, 1991–2005. https://doi​.org​/10​.1111​/ bjet​.13023 Zhan, T., Xiong, J., Zou, J., & Wu, S. T. (2020). Multifocal displays: review and prospect. PhotoniX, 1, 1–31.

8

Distributed Debriefing for Simulation-Based Training

Cullen D. Jackson, Di Qi, Anna Johansson, Emily E. Wiese, William J. Salter, Emily M. Stelzer, Suvranu De, and Jared Freeman

CONTENTS

Introduction
Issues to Consider in Providing Distributed Debriefing for Simulation-Based Training
The Rest of This Chapter
Debriefing Functions and Methods
Functions of Debriefs
Methods of Debriefs
Challenges of Distributed Debriefs
Performance Diagnosis
Performance Recall, Comparison, and Extrapolation
Assessment and Display of Competence
Requirements for Distributed Debriefs
Communication
Collaboration
Automated Data Capture
Data Presentation
Data Selection
Replay Perspective
Expert Models of Performance
Flexible Delivery Style
Post-Exercise Review
Store Lessons Learned
Scalable
Ease of Use
Current Techniques for Debriefing Distributed Teams
State of the Art in Distributed Debriefing
Large-Scale Distributed Simulation Training Exercises
Small-Scale Distributed Simulation Training Exercises
Summary
Acknowledgments
References

DOI: 10.1201/9781003401360-8

INTRODUCTION

When we read the word "debrief," we often imagine a group of soldiers around a map discussing the battle just fought, pilots slicing their hands through the air describing a dogfight, or a medical team discussing a difficult case. In each of these instances, the situation the team is discussing could have been a training exercise or a real-world operation. For the purposes of this chapter, we focus on the former: debriefing for simulation-based training, and specifically on considerations for conducting debriefs when the team is working together in a distributed manner.

A "distributed" team is one that is geographically dispersed; teammates could be in different rooms of the same building or thousands of miles apart. Regardless of the manner in which they are separated, team members in these situations usually are connected only by the technology available (i.e., phones, chat rooms, simulation-based trainers) as part of the training event. It also may be the case that small teams are situated together and are working with other distributed teams as a multilevel system, or team-of-teams (Kozlowski & Klein 2000); in this case, teams would be interacting with other teams (or individuals within distributed teams) at a distance.

A "debrief" (also known as a debriefing or after-action review [AAR]) is a facilitated discussion of training performance whose basic goal is to enhance subsequent trainee performance, generally conducted soon after the training event (Sawyer et al. 2016); it is considered the most fundamental aspect of simulation-based training for effective learning (Issenberg et al. 2005). During a distributed training event, the learners (or teams of learners) are separated from each other, and the instructors may be together or distributed among the learners. In the former case, the instructors may have similar observation viewpoints of the learners, depending on whether they are observing the learners as a whole, in small groups, or one-to-one. When the instructors also are distributed, they will certainly have different perspectives on the event, and since they are not collocated, they (and the learners) will need some collaborative technology to interact during the debriefing; this could range from simple teleconferences to sophisticated computer-based approaches.

"Simulation-based training" is a methodology that allows learners to immerse themselves in realistic situations in order to gain experience in a manner that is physically and psychologically safer than real operations or live exercises (Lateef 2010). Immersion occurs through interactions with real or simulated versions of the systems, people, teams, environments, organizations, etc. that learners would encounter in actual operational situations while working through realistic scenarios (or missions). Since the focus of this chapter is distributed debriefing, we are interested in simulation-based training that involves interacting with systems that collect data on the events occurring during the exercise and on how trainees respond to them, since some form of technology is required to facilitate learner interactions.

Debriefs for distributed simulation-based training now are de rigueur largely because distributed simulation-based training is a more prevalent learning modality, a shift driven by several factors. First, real-world missions and events have become increasingly complex. In the military domain, missions often involve joint operations between multiple US forces (e.g., Air Force, Navy, Army), allied/coalition (e.g., NATO) forces, and sometimes nongovernmental organizations (NGOs) like the Red Cross. The heterogeneous participants in such missions generally are geographically distributed, and thus it is difficult and expensive to coordinate co-located training for these groups. In some cases, co-located training also fails to capture the actual distributed nature of the tasks being simulated. In part, this reflects broadening mission sets that include those not traditionally handled by the military (e.g., humanitarian aid). This broadening of missions and their joint/coalition emphasis also require a wider range of skills, including teamwork and coordination across organizational boundaries, and implies that one overall (often called "community" or "mass") debrief may not address all training requirements across participants, teams/groups, and organizations. In the healthcare domain, there are events (e.g., mass casualties) that are similarly complex in requiring coordination across geographic and/or organizational boundaries, and training events need to reflect this complexity as well as provide appropriate debriefing opportunities and content across collaborating organizations and agencies.

Second, pressures to reduce training costs are making it harder to support the expense of bringing people together for training. For example, operating room-based team training exercises have proven to be expensive and difficult to conduct, as they require a centralized facility and a team of expert clinicians to conduct the exercises, evaluate the participants, and debrief them verbally (Hippe et al. 2020). The situation is even more challenging when conducting healthcare training in resource-constrained countries overseas, where long travel distances and high costs make bringing medical instructors onsite for teaching difficult.

Third, rapid advances in enabling technologies make it relatively easy to use existing simulators and computing systems for distributed simulation. For computer-based simulation (e.g., simulated desktop systems, VR-based simulators), the prevalence of high-speed internet and inexpensive, cloud-based computation and storage solutions makes it easier and more cost-effective to bring trainees together virtually. Additionally, networks offering high reliability, bandwidth, and speed are dropping in price far more quickly than new simulation technologies are being introduced, making the basic technology infrastructure needed for distributed simulations more available. In particular, DIS (distributed interactive simulation) and HLA (high-level architecture)—the two communications standards used for almost all military simulations—can run over wide-area networks as long as those networks have adequate performance characteristics. This means that simulators that often had to be in adjacent rooms to interoperate effectively can now be separated by thousands of miles (a simplified sketch of this style of networked state sharing appears after the fourth factor below).
Fourth, restrictions on bringing people together for training (e.g., during the COVID-19 pandemic) necessitate innovations in transforming traditional in-person learning into distributed, simulation-based learning environments.
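To make the networking point above concrete, the following minimal Python sketch shows the broadcast pattern by which networked simulators share entity state over UDP so that every site sees, and can log, the same evolving scenario. It is illustrative only: real DIS traffic uses binary PDUs defined by IEEE 1278.1, not JSON, and the field names and port usage here are simplifying assumptions.

import json
import socket
import time

DIS_PORT = 3000  # a commonly used DIS port; actual deployments vary

def broadcast_entity_state(sock, entity_id, position, velocity):
    """Send one state update; receiving sites time-stamp and log it for replay."""
    pdu = {
        "entity_id": entity_id,   # hypothetical, simplified fields
        "timestamp": time.time(),
        "position": position,     # e.g., geocentric x, y, z in meters
        "velocity": velocity,     # meters per second
    }
    sock.sendto(json.dumps(pdu).encode("utf-8"), ("255.255.255.255", DIS_PORT))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
broadcast_entity_state(sock, "F18-01", (1.0e6, 2.0e6, 3.0e6), (250.0, 0.0, 0.0))

Because every site receives the same state stream, a debrief station at any location can reconstruct the shared scenario timeline without direct access to the other sites' simulators.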


Issues to Consider in Providing Distributed Debriefing for Simulation-Based Training

Because distributed simulation-based training involves multiple people in different locations, it necessarily will involve multiple simulators or simulated systems. Often, but not always, this also means that multiple different types of simulators or simulated systems (e.g., clinical systems, weapons platforms, and/or operational elements, depending on the training domain) will be used. Unless all of the learners have the same training needs, there also will be an explicit (or implicit) hierarchy of training objectives for the event. Some of these training objectives will target individual roles, some will span roles or groups of roles, and some will focus on overall outcomes (i.e., the mission as a whole). Designing and implementing scenarios for distributed simulation can be more complex for more specialized training, and the debriefs obviously have crucial dependencies on the scenario. However, the important topics of training objective development and scenario design are beyond the scope of this chapter.

Because of the differences in training objectives between learners, the debriefs for distributed simulation-based training events generally reflect (at least) two levels of the basic hierarchical structure of these objectives: separate debriefs for each role, element, or platform; and a community-wide, or mass, debrief that addresses coordination across roles and overall scenario objectives. Different pieces of information about performance and the scenario itself are required for each role debrief and for the community debrief, and different expertise or perspectives are generally required to deliver the debriefs, all of which creates several complexities.

First, it is desirable that an instructor (or a designated participant-instructor) be physically present for each separate role, element, or team in the distributed simulation. Because some important actions (or lack of actions) may not be recorded in sufficient detail by automated components of the distributed simulation for immediate post-exercise analysis, humans can provide a level of detail that can be quite valuable. For example, if participants communicate by voice and all voice communications are recorded, language processing software may be able to transcribe these communications in near real time, but the transcripts currently are not adequate to support rapid, pedagogically useful analysis. Furthermore, replaying all communications for post-exercise analysis requires too much time, and the audio must be synchronized with event replay during the review in order to be meaningfully linked with other actions.

Second, identifying points for debriefing across roles/elements to be addressed in the community debrief, and deciding how to characterize those issues, requires information fusion across the distributed simulation platforms. Those issues will involve interaction, coordination, and/or communication across platforms or roles/elements. The identification of such problems can be difficult, and the diagnosis and suggested remediation for problems will tend to be more so. This fusion must be addressed via procedures, technology, and preparation of the instructors.

Because of the hierarchical nature of the various learning objectives being trained during these events, and the need to collect and combine information across different roles and systems, more time and energy must be expended to prepare the debriefs; this is true for debriefing at the role/element, team/platform, and community levels. Instructor-observers must share their perspectives to gain insights into events and actions that they did not observe, identify performance issues that may span roles and levels of hierarchy, and synthesize appropriate feedback for these various learners and stakeholders. Thus, we believe that more formal and more extensive use of computer-supported debrief preparation tools is needed to conduct debriefs effectively for distributed simulation-based training than for traditional co-located training. This is true to a lesser, but still significant, extent even if a single distributed team, element, or platform is being trained, because the distributed instructors still must share insights and build up reasonably extensive shared situation awareness about the exercise.

Computer-supported methods of rapidly moving through and analyzing performance data can be of particular value in distributed training because more cognitive processing will tend to be required of distributed instructors (or of a single instructor observing multiple, distributed learners) than in co-located training. Importantly, while shared situation awareness may not be fully attainable, even for distributed training of a single element, it is needed to understand the complete scenario timeline and the ways in which trainee performance propagates across time and roles/elements. Therefore, well-designed, computer-supported debriefing tools should assist in sharing the debriefing workload across instructors, allowing each to address in detail what he or she has observed, thus facilitating a more global awareness of performance across training participants.

Moreover, the effectiveness of debriefing tools for distributed teams is highly dependent on the instructors' ability to capture, evaluate, and analyze extensive performance data during the training scenario. Therefore, it also is desirable to incorporate computerized algorithms and methods to automate the data collection, the classification and categorization of performance data, and the subsequent generation of summary information and visualization cues to facilitate post-event debriefing and reduce the cognitive loads for both instructors and trainees (Hanoun & Nahavandi 2018); a minimal sketch of such support appears at the end of this section. Ideally, these analysis and debriefing technologies also would facilitate collaboration between the distributed instructors, since they operate under considerable time and task pressure: debriefs typically take place within one hour of completing the training exercise, while experiences are fresh in participants' and instructors' minds. Task pressure comes from the fact that failing to address important aspects of the training and provide feedback to learners during the debrief can result in seriously impaired learning, which undermines the purpose of the training event.
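As one way to picture the automated capture-and-categorize support described above, the short Python sketch below tags observed events by role and category and separates out cross-role candidates for the community debrief. The field names and categories are illustrative assumptions, not a published schema or a specific fielded system.

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ObservedEvent:
    t: float        # seconds into the scenario
    role: str       # e.g., "strike lead", "ground control"
    category: str   # e.g., "communication", "coordination", "procedure"
    outcome: str    # "success" or "failure"
    note: str       # instructor-observer annotation

def summarize(events):
    """Group events per role and flag cross-role issues for the community debrief."""
    by_role = defaultdict(list)
    community = []
    for e in events:
        by_role[e.role].append(e)
        if e.category in ("communication", "coordination"):
            community.append(e)  # spans roles, so surface it in the mass debrief
    return {"by_role": dict(by_role), "community_debrief": community}

A summary like this gives each instructor a pre-sorted slice of the event while preserving the cross-role view that the community debrief requires.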

The Rest of This Chapter

In the following sections, we discuss the functions and methods for conducting debriefs in some detail. We then dissect the challenges and opportunities afforded by the increasing importance of distributed debriefs. Then, we address the requirements for distributed debriefs, followed by a discussion of current methods for conducting them. We conclude the chapter with a more speculative discussion of the future of distributed debriefs for simulation-based training.


DEBRIEFING FUNCTIONS AND METHODS

Debriefings and AARs fulfill diagnostic, instructional, and social functions. We define these functions and then turn to a brief review of the current techniques and technologies that attempt to support computer-mediated debriefs (i.e., debriefs that leverage computer systems for data collection, performance analysis and review, collaboration, and instruction).

Functions of Debriefs

The principal function of debriefs is instructional: they must convey the right lesson to the right people at the right time. In order to do this, debriefs must help learners and instructors diagnose—identify and characterize—specific episodes of performance during the training event. That is, the debriefs should facilitate the recall of periods of correct and incorrect performance in context. Once diagnosed, episodes of correct performance can be used instructionally to reinforce those behaviors and generalize them to similar (but not identical) future circumstances. Similarly, specific instances of incorrect performance can be used to discourage repetition of those behaviors in the future, cue recall of correct performance knowledge, and associate it with similar circumstances. These activities should be targeted at trainees who need to learn and who have the capacity and motivation to do so. A debriefing is not just a process for gathering performance data and delivering lessons; it also helps participants to discover those lessons themselves, and it helps both instructors and trainees discover performance failures (and successes) and diagnose their causes.

In addition to helping to diagnose performance and facilitate instruction for continued performance improvement, debriefs should help instructors to link performance with learning objectives in order to focus their instruction. For example, a medical team in simulation-based training to learn better teamwork skills may have to perform cardiopulmonary resuscitation (CPR) as part of the simulated case. During the debrief, the learners may diagnose that their chest compressions were too shallow over the course of the event and want to focus instruction on ways to improve. However, the training objective was to learn good teamwork skills, so the debrief should focus diagnosis and instruction on episodes of performance that link to that objective while also allowing for identifying and improving areas of critical performance that may be ancillary to the event's purpose (e.g., improving CPR skills).

Debriefs also serve a social function. While they are an opportunity for all involved to demonstrate and assess technical competence, and to learn how to discriminate good from bad performance and diagnose its cause, they also help develop social competence by providing practice in conveying critiques of oneself with candor and of co-participants or trainees with diplomacy. Unfortunately, in addition to facilitating positive social interactions, debriefs also may exacerbate the inherent constraints of prevailing social structures, namely the formation of social or "status" hierarchies. "Status" is the social ranking of an individual relative to others. It is the fundamental basis for social hierarchy because those higher in the hierarchy are seen as more competent and as legitimately owning their status, which in turn grants legitimacy to the resulting hierarchy (Galinsky et al. 2008). Research has demonstrated that people routinely use status characteristics, such as organizational role, time in service, ethnicity, race, profession, gender, and expertise (Bales et al. 1951; Berger et al. 1966; Berger et al. 1977), as the basis for generating performance expectations consistent with culturally salient stereotypes. From a debriefing perspective, these status characteristics could negatively influence the setting of performance objectives and the diagnosis of performance for discussion. In addition, they can drive interactions among learners and instructors that result in patterns of unequal contribution within the group (Silver et al. 1988; Silver et al. 2000), which inhibits the information-exchange process and limits the potential positive instructional impact of diversity within the group. More recent research provides evidence that status characteristics confer (dis)advantage even in virtual groups (Bélisle & Bodur 2010; Principe & Langlois 2013), which has implications for debriefings of distributed simulation-based training. The good news is that strategies exist that not only mitigate the deleterious effects of status but also enhance team learning outcomes and the team's ability to diagnose performance failures; these strategies are discussed later in the chapter.

In sum, debriefing supports diagnosis of performance, recall of performance in training, understanding of expert performance, generalization to future situations, and assessment and display of competence.

Methods of Debriefs

Techniques and technologies for computer-mediated debriefs and AARs have evolved to support these functions with varying levels of effectiveness. One method for supporting the instructional function of debriefs is replaying the events of the training scenario. For example, replaying video footage of medical teams during training has served as a valuable source for debriefing following team training (Sawyer et al. 2016). Replay of the events encountered by the learners during the training scenario also has a strong effect on learning: serial replay eases the recall of events and learner performance because it conforms to the serial structure of episodic memory (Tulving 2002). Replay can also reinforce memory for normative sequences of events (Schank 1982), and thus help learners generalize from scenario-specific episodes of performance to similar future situations. Serial replay helps trainees recognize sequential actions that aggregate into failure (or success) as a scenario evolves, and it therefore supports performance diagnosis, the skills underlying diagnosis, and, hopefully, prognosis. Finally, it should help to minimize hindsight bias in diagnosing performance, since learners and instructors can see the recorded behaviors in context (Fanning & Gaba 2007).

However, replay often is implemented in ways that provide an overall view of the scenario without the ability to drill down to individual roles or more granular information. For example, replay is often implemented as a set of icons moving over an overhead view of the training environment (e.g., vehicle icons moving over a tactical map). While this is a useful representation for cueing recall of the overall tactical state of forces and the tactical actions of units, it does not help trainees understand the situation or learning environment from their own perspective or that of other roles in the simulation. Replay systems rarely record and represent learners' displays, instruments, or viewpoints/perspectives, nor do they support recall of responses, because the systems often do not record and represent the trainee's use of the simulators or simulated systems. Rare exceptions, mainly in the field of aviation training, include an F-16 distributed debriefing system developed by the Air Force Research Laboratory (AFRL) that combines a central tactical view with instrument displays on each side; this is an example of implementing "overlap," a technique for maintaining visual momentum in a display (Bennett & Flach 2012). Similarly, the Dismounted Infantry Virtual After-Action Review System (DIVAARS, developed by the Army Research Institute and the University of Central Florida) provides multiple viewpoints of the simulation space during replay (Goldberg et al. 2003). Another system, developed to teach the operation of anesthesia machines, used an augmented reality (AR)-based debrief system to overlay virtual information on the real-world training environment and allowed playback of recorded training experiences through a user-controlled egocentric viewpoint (Quarles et al. 2008; Quarles et al. 2013). These systems help operators to extract information across various perspectives of the scenario while maintaining overall context (Woods' visual momentum; Woods 1984), which better facilitates detailed discussions about how learners perceived the environment and the actions they took in it.

While replay is useful in helping learners understand the course of an overall situation, the serial nature of many of its implementations makes it more difficult for learners to relate parts of the scenario to other parts. To help relate non-sequential scenario events, it may be useful to show multiple instances of a class of events and learners' behavioral responses, or even to relate events in one simulation to similar events in another. To do this, debriefing systems must allow for marking events so that learners and instructors can access them on demand (also called "random access") rather than sequentially. While some systems allow instructors to "bookmark" events for future review, in our experience they are rarely used. Debrief systems rarely help instructors identify and navigate from one instance of a class of events to the next because simulation systems are rarely instrumented with measurement systems that can categorize events and the human responses to them, as well as link them to the learning objectives of the simulation scenario. One such system (Salter et al. 2005) categorizes events, assesses performance in those events, displays those categories and assessments, and links each instance directly to its replay. This design gives trainers random access to multiple instances of a given class of events and thus supports trainees as they attempt to generalize from instances to the larger class. The previously described AR-based system for learning anesthesia machine operations also supports random access to scenario events and learner performance. This allows both instructors and trainees to navigate between different viewpoints to visualize key events in time or to better view information that may have been previously occluded (Quarles et al. 2008; Quarles et al. 2013), as well as to use the standard replay controls of traditional video-based debriefing platforms.
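The "random access" idea above amounts to a categorized index over the replay timeline. The Python sketch below shows one plausible shape for such an index; the API is an illustrative assumption rather than a description of any of the systems cited above.

import bisect

class ReplayIndex:
    """Bookmark categorized events so a debrief can jump between instances."""

    def __init__(self):
        self._by_category = {}  # category -> sorted list of replay times

    def bookmark(self, category, t):
        """Record that an event of `category` occurred at replay time `t` (seconds)."""
        bisect.insort(self._by_category.setdefault(category, []), t)

    def instances(self, category):
        """All replay times for one class of events, e.g., 'missed radio call'."""
        return list(self._by_category.get(category, []))

    def next_after(self, category, t):
        """Jump from the current replay time to the next instance, if any."""
        times = self._by_category.get(category, [])
        i = bisect.bisect_right(times, t)
        return times[i] if i < len(times) else None

With an index like this, an instructor can step through every instance of a class of events rather than scrubbing serially, which directly supports the generalization function described above.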


The diagnostic function of debriefs typically is supported by the debriefing techniques of the instructors, with some scaffolding from debriefing technologies. Debriefing techniques encourage instructors and trainees to identify and analyze strengths and shortcomings in performance. The Navy and the Air Force, for example, decompose debriefs for large simulated and live exercises into independent debriefs of small elements or packages, and then conduct overall debriefs, supported by technology, involving the entire training audience (community or mass debriefs). The element (or role) debriefs typically identify specific performance failures that support diagnosis in the subsequent community debrief. The Army has codified its diagnostic method in a set of questions that learners explore during the debrief: What was supposed to happen? What happened? What accounts for the difference? (Dixon 2000). It pairs these with a set of guidelines (e.g., "Call it like you see it," "No thin skins") that encourage participants to think critically and to be candid in their review of events. Similarly, the commercial aviation community espouses debrief methods that engage learners in identifying and diagnosing performance failures. Studies of these methods and their impact on diagnostic quality are rare. However, one analysis of debriefs in commercial aviation (Dismukes et al. 2000) found that instructors often failed to engage trainees in diagnostic (or any) discussions during the debrief, and instead dominated these sessions with monologues concerning their own observations.

When techniques fail in this way, instructors have few diagnostic technologies on which to fall back. In general, debriefing systems are incapable of generating diagnoses because they do not incorporate expert behavioral models against which to compare trainee performance. Nor do most systems record data concerning trainee performance or compute measures that summarize that performance, attribute effects to individual performers, or relate causes to effects. That said, in the healthcare domain, there are several new and innovative systems that attempt to collect, diagnose, and summarize learner performance to facilitate more robust debriefs. A mixed reality AAR system has been used in the training of medical procedures such as central venous access (CVA) (Lampotang et al. 2013), in which a physical human simulator is augmented with 3D virtual human anatomies, such as the lungs and veins, allowing the user to visualize the needle insertion procedure inside the human body. In addition to integrating an augmented display, the simulator tracks the operation of the instruments during the procedure using an embedded six-DoF (degree-of-freedom) magnetic sensor; this allows trainees to observe their own data and reflect on their own performance.

Computer-based multimedia (audio, video, text, and graphs) debriefing has been shown to be useful in teaching not only medical skills but also nontechnical skills, which encompass situation awareness and interpersonal skills rather than medical knowledge or technical procedures. The Interpersonal Scenario Visualizer (IPSVize) debriefing tool (Raij & Lok 2008) was developed to allow medical students to review their interactions with virtual human patients. These interactions are captured, logged, and processed by the simulator to produce spatial, temporal, and social visualizations that drive discussion during the debrief. In addition, the system evaluates students' actions and provides feedback to help them gain insights into methods for improving their interactions with real patients.


In a web-enabled, scenario-based training program for debriefing emergency medical teams (T-TRANE, developed by Aptima, Inc. and the University of Maryland Shock Trauma Center) (Xiao et al. 2007), video segments are used to demonstrate good and poor examples of teamwork skills. Learners use these exemplars to identify instances of ineffective teamwork and discuss how the shortcomings could be resolved. Unfortunately, video taken of simulation-based training is limited by the viewpoint of the camera, and fine details of learner actions, as well as other team members' behaviors, may be unobservable. These limitations hinder the ability of instructors and learners to fully appraise teamwork performance during debriefing. To counteract this constraint, a recent system collects multiple synchronized data streams to capture a multitude of intraoperative data, such as physiological parameters from both patients and healthcare professionals, and audiovisual data from an in-room wide-angle camera, laparoscope video, and wearable cameras (Goldenberg et al. 2017; OR Black Box). Additionally, automated data analysis enables the platform to generate a joint team performance report that can be used as a tool for structured postoperative multidisciplinary debriefing (van Dalen et al. 2021).

If debriefs are meant to help trainees learn expert alternatives to their incorrect performance, they must present those alternatives in some way. Few high-end simulators (e.g., flight simulators and driving simulators with physical cockpits) can generate examples of expert performance because they do not incorporate computational models of expert behavior. Thus, alternative behaviors are highlighted largely through discussion; that is, alternatives are said, not shown. While this may be sufficient in some circumstances, research has demonstrated that verbally describing expert solutions to complex problems (e.g., team planning and execution of military air missions) produces learning outcomes that are reliably inferior to describing and "playing" solutions using video review (Scherer et al. 2003) or visual animations (Shebilske et al. 2009). In the latter study, expert solutions were generated by an optimization model. In traditional intelligent tutoring systems, such solutions are generated by heuristic or rule-based models of expertise. Recently, eye-tracking technology has been used to track the eye movements of learners during simulation-based training. These data provide objective, detailed information with which to diagnose performance, and they also can be used to compare learners' gaze behaviors to expert-like gaze behaviors to better drive performance improvements. This use of eye-tracking data appears to be useful for improving patient safety practices, such as those relating to patient identification (Henneman et al. 2014).
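The multi-stream capture systems described above depend on merging heterogeneous, timestamped recordings onto a single debrief timeline. The Python sketch below shows a minimal version of that merge; the stream names and record shapes are assumptions for illustration, not the format of any cited platform.

import heapq

def merge_streams(streams):
    """Yield (t, stream_name, payload) in global time order.

    `streams` maps a stream name to a list of (timestamp, payload) pairs
    that are already sorted by time, as captured during the exercise.
    """
    tagged = [
        [(t, name, payload) for t, payload in records]
        for name, records in streams.items()
    ]
    yield from heapq.merge(*tagged)

timeline = list(merge_streams({
    "simulator": [(2.0, "compressions started"), (95.0, "defibrillator charged")],
    "physiology": [(1.0, "HR 110"), (60.0, "HR 96")],
    "audio": [(3.5, "team leader assigns roles")],
}))

Once merged, the timeline supports the kind of joint performance report described above, because every record, regardless of source, can be located relative to every other.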
Finally, debriefing techniques (but seldom the technologies) support the social processes by which participants assess the competence of their colleagues and assert their own. As stated previously, learners are differentiated in terms of their status characteristics (gender, race, ethnicity, education, professional rank, etc.). Because status characteristics confer social (dis)advantage pursuant to cultural stereotypes (Webster & Hysom 1998), learners can be thought of as "low-status" or "high-status" relative to other team members. Of note, what makes stereotypes so powerful is that both high-status and low-status team members believe them (e.g., not only do men believe that men are better in math and science, but many women also hold this belief). Because of these status characteristics, learners in the simulation environment will perceive one member to be especially qualified to perform a task, and other team members will defer to that "high-status" member. As such, high-status group members often enjoy more opportunities to contribute, have more influence, and have their contributions more positively evaluated. In contrast, low-status group members will defer to high-status members and limit their contributions (Berger & Cohen 1972; Berger et al. 1977). Subsequently, lower-status team members become less likely to offer other (sometimes critical) information to the group during the debrief because they perceive their information and insights as less valuable. This process, also known as the "burden of proof" process (Berger & Cohen 1972; Berger & Webster 2006), proceeds unless some event or new information interrupts it. Therefore, whether in-person or distributed, it is critical that debriefing techniques and technologies incorporate norms for equal participation (Cohen 1993), and that instructors are taught to be attuned to the structural features of the team and to help direct interactions accordingly.

The functions of debriefing—instructional, diagnostic, and social—are partially supported by debriefing systems, and debriefing techniques help instructors to fill the gaps, particularly in supporting the diagnostic and social functions. However, instructors often have difficulty applying good debrief technique, and the increasing need for distributed debriefing may make it even more difficult for them to use these methods, as we discuss below. This difficulty provides opportunities to design more of these functions into instructional and debriefing technologies.

CHALLENGES OF DISTRIBUTED DEBRIEFS

Recent advances in technology have allowed for the development of coordinated simulation tools, which can be used to simultaneously train groups of individuals who are dispersed across several geographical locations. As outlined above, these distributed training exercises can be conducted with less cost and risk than traditional live training events, allowing diverse groups of individuals to collaboratively train whole-mission exercises more frequently than was ever before possible (e.g., Dwyer, Fowlkes, Oser, & Salas 1996). While distributed training can produce more effective and routine training events, this approach can complicate debriefs from both technical and social standpoints. We discuss the potential challenges that distributed training can pose to debriefing within the framework of the five key functions of a debrief discussed previously, namely: diagnosis of performance, recall of performance in training, understanding of expert performance, generalization to future situations, and assessment and display of competence within the social setting of a debrief.

Performance Diagnosis

The effectiveness of a debrief hinges on the ability of the instructors to support learners in diagnosing the underlying causes of performance failure (or success), and in attributing those causes directly to individual or team behavior. As noted above, serial replay capabilities usually are provided as a global representation of mission performance, which trainees observe to recall and learn the general flow of the simulated scenario. Traditionally, such replays do not include views of specific displays or instruments, which can provide the needed context of individual constraints and reasoning in diagnosing performance. When the training environment is extended to include multiple, diverse, and distributed training groups, the challenges of using this training approach become increasingly apparent.

Because distributed simulation-based training substantially reduces the logistics and costs associated with live training events, more diverse trainees (or more trainee groups) can participate. For example, a distributed training event for the US Navy might include an E-2C Hawkeye aircraft (to provide surveillance coordination), a flight of F/A-18 Hornets (to suppress enemy air defenses and strike ground targets), as well as ground control and intelligence support. The participant diversity in this example, represented by two air platforms and supporting ground elements, permits complexity in the type of mission used and generates corresponding complexity in the data generated from the simulated mission and in the interdependencies of actions between elements required for satisfactory mission performance. With current procedures and technologies, instructors and learners often must diagnose successes and failures, and analyze the interdependencies of actions that produced those outcomes, largely on their own with only limited technological support. Because most current instructional technologies cannot capture the subtle details of actions and communications between remotely located elements, the diagnosis of performance likely will be hindered in these cases.

The debriefing process can be structured to encourage instructors and learners to collaboratively identify performance shortcomings and strengths, and instructors are an essential guide in this diagnostic process. However, in a distributed training process, participants and instructors likely will not all be collocated, thus preventing the instructors for each role/element from observing and integrating performance information in real time across the team. In the best case, performance measures may be collected automatically through the simulators or other simulation systems into a common repository, or remotely by instructors observing performance via global views of the overall scenario. When learners and instructors are distributed across locations, instructors also will face challenges in diagnosing performance, and ultimately may be less helpful in supporting performance diagnosis for trainees.

The distribution of training participants also may force asynchronous communications and interactions between trainees and training sites. From the standpoint of social structure and social processes, this constraint may further exacerbate the limited contributions of low-status members, and these missed opportunities for unique observations across learners and roles could have potentially life-threatening implications in real-world situations. Therefore, to support these diagnostic processes and leverage diagnosis to improve future performance, learners and instructors must be successful in recalling mission performance, comparing that performance to expert behavior, and extrapolating behaviors to future situations. Unfortunately, each of these key processes can be inhibited by the dispersion of trainees across geographical locations, as we note below.


Performance Recall, Comparison, and Extrapolation

Traditional replay techniques can provide great utility in supporting memory for the sequence of scenario events (Schank 1982) in traditional training exercises. Distributed training exercises can compromise this recall by increasing the load placed on instructor and trainee memory through two independent mechanisms. First, because distributed training exercises can involve richer interactions with heterogeneous training groups, the data and behaviors associated with these interactions increase in number and complexity. For example, communications in non-distributed training events may be limited to face-to-face voice communications, but in distributed simulation-based events, these same communications must occur over a network and will involve communications between different types of roles and simulation systems. Second, distributed training exercises rely on simulation systems that have evolved into sophisticated data collection tools, which can exponentially increase the amount of data that are collected (Jacobs, Cornelisse, & Schavemaker-Piva 2006) and may need to be recalled by instructors. Although not all of these data may be discussed in traditional debriefing processes, they may be included in distributed debriefs, which would increase memory demands and reduce the likelihood of recalling any truly granular performance data. This increase in data volume and complexity will require a different set of techniques and technologies, such as building computational synthesis and drill-down capabilities into the debriefing systems that support distributed training events (a minimal sketch of this idea appears at the end of this section).

The comparison of these collected data to expert alternatives for diagnostic purposes is further complicated by the complexity of the interactions between distributed learners. As the number of individuals involved in the training exercise increases, their interactions become less predictable and harder to optimize or simulate in a way that instructors and trainees can understand. In addition, the grouping of similar performance data becomes an essential component of understanding performance trends and extrapolating those trends to predict future behavior. Because the distributed training environment involves integrating data from multiple, heterogeneous streams, this training approach can reveal rich, informative patterns in trainee behavior. As in each of the critical debriefing processes discussed thus far, however, the sheer quantity of data and the heterogeneity of behaviors can complicate this process beyond what is encountered with traditional training events.
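The following Python sketch illustrates one plausible form of the "computational synthesis and drill-down" capability mentioned above: performance records are grouped and summarized for the debrief, while the raw, time-ordered records remain reachable beneath each summary. The record fields are illustrative assumptions.

from collections import defaultdict
from statistics import mean

def synthesize(records):
    """Summarize performance records by (role, category), keeping raw drill-down.

    `records` is a list of dicts such as
    {"role": "F/A-18 lead", "category": "communication", "score": 0.7, "t": 412.0}.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["role"], r["category"])].append(r)
    return {
        key: {
            "n": len(rs),
            "mean_score": mean(r["score"] for r in rs),
            "drill_down": sorted(rs, key=lambda r: r["t"]),  # raw records, by time
        }
        for key, rs in groups.items()
    }

An instructor preparing a debrief would scan the summaries first and expand the drill-down list only for episodes worth discussing, which is the workload reduction the text argues distributed debriefs will need.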

Assessment and Display of Competence

The final function that debriefs provide is supporting the social processes through which trainees can assess themselves and the team, provide feedback to their colleagues, and assert their own expertise. Whereas distributed debriefing processes can influence the functions described above by generating more complex and diverse performance data, the distribution of trainees and instructors during debriefing constrains the social mechanism of appraisal quite differently.


Distributed debriefing, and the tools used to support communication during these processes, can affect social appraisal and the display of competence in three key ways. First, information-sharing tools (e.g., collaborative desktops) and communication channels (e.g., video teleconference) can be useful for exchanging knowledge across disparate locations; however, these tools can constrain the type of information that is readily shared beyond the ways that the hierarchical structure of the group already does. These tools are especially useful for exchanging text-based or pre-generated spatial content, but they are not highly effective at facilitating the types of interactions that occur when learners are collocated at a whiteboard, visualizing and discussing scenario events and diagnosing performance. Second, the microphones and sound quality associated with teleconferences and video teleconferences can inhibit fluid discussion and instigate misunderstanding between dispersed participants, even with the most advanced systems. These communication technologies also are poor at transmitting any radio communications that occurred during the scenario and are being replayed during the debrief. Finally, these communication devices cannot capture the nonverbal communications (e.g., gestures, eye gaze, facial expressions) that can be used to effectively assess and display competence. While traditional, in-person training processes can rely on nonverbal information exchange, such as eye contact, distributed training environments strip this form of communication from the essential social interactions that occur between individual trainees, as well as between trainees and their instructors.

REQUIREMENTS FOR DISTRIBUTED DEBRIEFS

In the preceding section, we identified several challenges that distributed debriefings might, and frequently do, encounter. However, standard procedures for conducting effective distributed debriefings with collaborative technologies have not yet been defined. While a range of processes could be used to prepare and deliver debriefs in the distributed environment, the utility of these approaches depends on both the context in which they are being used and the design of the training technology that supports the instructor.

It is tempting to discuss the technical and procedural requirements for distributed debriefs separately. However, by their very nature, distributed debriefs combine technology and process in ways that are difficult, if not impossible, to separate. To ensure that the technologies and the processes interact seamlessly, it is important to address the design of these pieces simultaneously and collaboratively. We discuss distributed debriefing requirements below, relating them to the previously defined debrief functions: diagnosis of performance, recall of performance in training, understanding of expert performance, generalization to future situations, and assessment and display of competence. Importantly, while these requirements might be met by a single technology, it is likely that an integrated set of tools and institutional processes will be needed to conduct effective distributed debriefs for simulation-based training.


Communication
Distributed training events require constant communication of information and data throughout the events, and the subsequent distributed debriefs also must support communication between the various locations involved, during both the preparation and delivery phases of the debrief. At a minimum, voice communication must be supported. Ideally, video conferencing will also be supported across all sites to allow for nonverbal communication and rapport development between participants. This is no small technological feat across multiple locations. While current technologies can support this requirement, numerous technical issues remain, and their frequency naturally increases with more distributed sites. Aside from technical issues, each distributed simulation event must choose and follow some basic guidelines for using these communication technologies, beginning with how communication is initiated (i.e., who calls whom) and including turn-taking, reducing extraneous noise, and how electronic information will be shared. Communication is a critical requirement of any distributed debrief, and all the previously defined functions require it.

Collaboration
Closely tied to communication is collaboration. The distributed debrief must allow sites to coordinate on the content that should be debriefed, the strategy for debriefing, and the actual delivery of the debrief. All supporting information related to the execution of the simulation exercise—performance data, feeds from simulators or simulated systems, video feeds to support replay and perspective-taking—must be shared across sites. Collaboration technologies that allow all participants to view the same information simultaneously will reduce confusion and facilitate the creation of common ground between all instructors and learners. Optimally, the collaboration technology also will allow each site to take control of the information so they can interact with it and illuminate, from their perspective, any performance results they deem appropriate for the rest of the participants. Here again, rules on effective use of the collaboration technology are imperative, as confusion in turn-taking can quickly lead to a chaotic debriefing in both the preparation and delivery phases. In addition, creating norms for equal participation across sites and roles will help alleviate negative influences of the status hierarchy (Cohen 1993), which will further support the collaborative environment by facilitating more equal interactions and contributions. As with communication, without collaboration mechanisms, none of the other debriefing functions can be fulfilled in a distributed fashion.

Automated Data Capture
Distributed simulation-based events generally run on a very tight schedule, and instructors are not given much time to develop their debriefs post-event (e.g., 20–30 minutes is fairly standard for a large event). Any distributed debriefing technology
must facilitate rapid development, part of which is accessing available performance data and simulator feeds. Allowing instructors to view performance data specific to their element and common across all participants supports diagnosis, recall, understanding, generalization, and, ultimately, overall performance assessment. This technology should accept and process performance data and simulator feeds automatically or semi-automatically (in the case of any observer-based measures used during the event) with little direction on the part of the instructor.

Data Presentation
Any performance data and measurements collected during the exercise have two potential presentation audiences: the instructors (during debrief preparation) and the learners (during the debrief itself). How this performance data is presented strongly affects instructors' and learners' ability to diagnose, understand, and, subsequently, assess their performance during the exercise. The distributed debrief technology should allow performance data to be presented at varying levels of detail, as it relates to all learners, subsets of learners, and individual learners. Drill-down capabilities are key, as are the methods by which performance data are presented (e.g., replay, on a timeline, by event, textual representations, graphs), since they also can influence the interpretation of the information presented.

Data Selection
Not all performance data is relevant for each individual learner, nor are all the data relevant to the entire group of participants. In order to facilitate diagnosis and understanding of performance, instructors must be able to easily select relevant performance data, simulator feeds or displays, communications feeds, and video (as available). Thus, distributed debriefing systems must allow instructors to identify and select key scenario events and associated performance data that are indicative of both good and poor performance. Subsequent review of performance during these key events should facilitate debriefs across sites at the role, group, and community levels.

Replay Perspective
While viewing performance data is important, replaying exercise events is equally critical. Showing these events from multiple viewpoints can greatly facilitate diagnosis, recall, understanding, generalization, and assessment of trainee performance. This is particularly important for assessing coordination and teamwork. Viewing events on a map (or overhead representation of the training environment), or from a first- (or third-) person view, can provide context and perspective to participants that otherwise would be difficult to obtain. Similarly, it may be useful to display simulation system/simulator controls and interaction artifacts (e.g., gauges and instrument panels) in order to provide a more common understanding of element capabilities across trainees. The distributed debriefing system must allow instructors to replay
selected exercise events from these multiple viewpoints to the extent they are available. Additionally, the technology must allow instructors to choose when and how these viewpoints are presented in order to best fit the overall structure of the debrief.

Expert Models of Performance
Viewing and analyzing performance in comparison to defined standards or models of expert performance can greatly assist instructors in diagnosing trainee performance. Discussing how the trainees' performance compared to those standards can assist in understanding what went well or poorly and why. The distributed debriefing technology should present alternative (or expert) models of performance for each role, either in the form of quantitatively modeled behavior or performance categories. When the domain knowledge of the training task has been clearly identified, the simplest and most effective way to deliver expert models is through a rule-based system (Grosan & Abraham, 2011), in which expert performance is represented by a set of rules coded into the system to mimic the behavior of human experts under different circumstances (Qi et al. 2020). Expert performance models created using machine learning techniques are generally considered more flexible, and more suitable, for capturing experts' technical skills in complex procedures than a set of fixed rules. In fact, a machine learning model trained with expert data can simulate expert performance alongside the learners in the simulation-based training event, providing effective feedback for learning (Rhienmora et al. 2011, Wijewickrema et al. 2018).
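
To make the rule-based approach concrete, the sketch below encodes a few expert-performance rules and flags trainee metrics that fall outside them. It is only an illustration of the technique: the metric names, thresholds, and rules are hypothetical and are not drawn from any fielded debriefing system.

```python
# A minimal sketch of a rule-based expert model in the spirit of Grosan and
# Abraham (2011). All metric names, thresholds, and rules are hypothetical
# illustrations, not rules from any fielded system.

def expert_model_feedback(metrics):
    """Return feedback for each coded expert rule the observed metrics violate."""
    rules = [
        ("Acknowledge critical alerts within 5 seconds",
         lambda m: m["alert_response_s"] <= 5.0),
        ("Hold airspeed within 10 knots of the briefed target",
         lambda m: abs(m["airspeed_dev_kts"]) <= 10.0),
        ("Complete the emergency checklist before starting descent",
         lambda m: m["checklist_done_before_descent"]),
    ]
    return [f"Expert standard not met: {desc}"
            for desc, is_met in rules if not is_met(metrics)]

# Hypothetical observer- or simulator-derived metrics for one trainee
observed = {"alert_response_s": 7.2,
            "airspeed_dev_kts": 6.0,
            "checklist_done_before_descent": True}
for line in expert_model_feedback(observed):
    print(line)  # flags only the slow alert response
```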

Flexible Delivery Style
The way in which a distributed debrief is conducted varies according to the institution sponsoring the training exercise, the domain(s) being trained, and the instructors conducting the training. It therefore is important that any distributed debriefing technology not unduly constrain or force instructors into presenting feedback to trainees in a specific style; this requirement can be tricky to fulfill. Certainly, some instructional strategies may be more effective than others, and, indeed, some instructors may be more effective than others. However, all other things being equal, instructors should be allowed to tell the performance narrative (Fiore et al. 2007) in the manner that best suits their needs. For example, it may be appropriate to conduct the debrief in a narrative style while allowing instructors to drill down to specific training objectives and aspects of performance as needed. In this way, the debrief can be tailored to each training audience in a way that maximally supports learning for that community and training environment.

Post-Exercise Review
Once the distributed training exercise is complete, the debrief should be available for offline review by each distributed site. Additionally, each learner and instructor should receive a performance report after the exercise is complete. These should facilitate a variety of post-exercise analysis activities, such as more in-depth review
of each role’s (or site’s) performance, evaluation of the training effectiveness of the simulated scenarios, evaluation of instructor effectiveness, and evaluation of the effectiveness of the debrief.

Store Lessons Learned
Distributed exercises accomplish much more than just teaching individual learners in that particular moment of training. Each distributed simulation-based event results in a variety of lessons learned (e.g., about trainees, instructional strategies and training materials, and instructors) that should be used when planning the next distributed training event. These lessons should be reviewed by local units/departments to facilitate continued learning opportunities, and they should be used to prepare for real-world situations like those that were trained. Retaining and using institutional knowledge is a difficult process in any environment, and the distributed debriefing system must facilitate institutional learning by providing a mechanism to accumulate these lessons learned, distribute them, and use them when designing the next exercise and developing an associated performance assessment plan.

Scalable
Distributed simulation training exercises come in all shapes and sizes. At their simplest, they involve two learners, each at a different location, whose roles may or may not be similar. At the other end of the spectrum, large-scale distributed simulation events, such as the US Air Force's Virtual Flag exercise, may involve ten different sites with hundreds of trainees, thousands of training activities at multiple hierarchical levels (e.g., strategic, operational, tactical), and dozens of interconnected systems ranging from high-fidelity aircraft simulators to sophisticated computer-based simulations; adding to this complexity, some sites may host a variety of different types of simulators. Distributed debriefing solutions should consider the scale of the distributed events they intend to support so the complexity of the training is appropriately supported by the technology in both useful and usable ways.

Ease of Use
Instructors may be involved in distributed exercises only sporadically, so the debriefing system needs to be easy to use. Time spent immediately prior to training events, particularly large-scale exercises, typically focuses on reviewing relevant techniques and procedures, becoming familiar with the training scenario being executed during the exercise, and mitigating any issues with the distributed technology. There typically is little time to become familiar with how the technology works. Therefore, any distributed debriefing technology must allow instructors to quickly learn (or re-learn) the system with minimal difficulty or instruction; learners also may need to use the front end of the system during the debrief, so the user interface should be as intuitive as possible.


CURRENT TECHNIQUES FOR DEBRIEFING DISTRIBUTED TEAMS
Many existing simulation environments provide the technical building blocks with which many of the requirements listed above can be fulfilled. While some of these requirements have been realized by existing technologies, we are unaware of any fully implemented distributed debriefing system that addresses them all.

State of the Art in Distributed Debriefing
Today's distributed simulation training has come a long way. Both the US Navy and the US Air Force regularly conduct complex virtual training exercises that involve a variety of simulated platforms and locations (e.g., the Navy's Operation Brimstone and the Air Force's Virtual Flag exercises). Research and development efforts across all the US military services continue to develop additional simulation environments for use locally and in a distributed manner. In addition, the development of specifications and protocols, like the Distributed Debrief Control Architecture (DDCA) (SISO 2016) and the Distributed After-Action Playback and Review (DAAPR) (Streit 2020), holds the promise of better integration and improved interoperability for debriefing systems that support distributed simulation-based training. However, these frameworks mostly allow existing debrief systems to operate together and share information, and those current systems do not support all the necessary functions for distributed debriefs discussed in this chapter; in particular, the specifications mostly support the interoperation of those systems' scenario and video replay functions.

Large-Scale Distributed Simulation Training Exercises
During large-scale virtual training exercises hosted by the US Air Force and US Navy, the distributed debriefs rely heavily on common tools found throughout the uniformed services; for example, standard video teleconferencing applications are used to connect sites during briefs and debriefs. Performance data collected during the event generally are presented to all trainees as a slideshow during a community (or mass) debrief; elements and smaller units sometimes will conduct a short debrief (sometimes called a "hot wash") as they put together their thoughts and lessons learned to be presented at the mass debrief. During the community debrief, networked collaboration software is used to share slides between sites and save them to commonly accessible shared network drives. The diagnosis of performance and delivery of the debriefs are typically left up to the individual instructors for each unit, although some overall guidance on areas of good and poor performance may be provided by an internal assessment team. Finally, the slides largely contain textual information with some still images; generally, the images are not diagnostic but serve either to provide graphical reminders to participants or to break up the monotony of the text-heavy slides. This method of developing and conducting distributed debriefs certainly has advantages. The use of commonly available software applications and collaboration
tools minimizes the maintenance required by each site and spares already busy instructors from having to learn another application. Additionally, the free-form nature of the slideshow allows instructors to add whatever content is desired. On the other hand, these commonly available tools do not allow instructors to take full advantage of the plethora of simulation data available to them; even if they had these data, they would not have time to analyze them and integrate the lessons learned into the debrief given typical time frames. A technology focused on providing instructors with immediate access to data, quick analytics, and summary feedback regarding the trainees' performance may help them make the best use of available data while still developing the debriefs rapidly, allowing them to address more instructional points than they can currently.

Small-Scale Distributed Simulation Training Exercises
Smaller-scale, distributed training events are experimenting with distributed debriefing technologies. The US Navy's DDSBE AAR (developed by Aptima, Inc.) and the US Air Force Research Laboratory's (AFRL) Distributed Mission Training Collaborative Briefing and Debriefing System are two such examples; they are primarily used to conduct distributed debriefs across two sites (Salter et al. 2005). Both systems support communication and collaboration between sites, the replay of events using an interactive tactical map, and the viewing of various instruments from individual simulators. In addition, the DDSBE AAR collects and presents formal performance data collected during the training exercise, and instructors can select specific performance data points for discussion and presentation during the debrief, which are synchronized with other data sources and shared across sites. In addition to these systems, MAK Technologies has developed several applications that enable data logging (via MAK Data Logger) across simulators connected to HLA and DIS protocols (via VR-Link), as well as visualization software to enable replay of captured data (via VR-Vantage Stealth). However, it is unclear how well these systems instantiate the requirements discussed in this chapter to fully realize the promise of distributed debriefs for simulation-based training. These smaller-scale distributed debriefing tools certainly show promise for becoming a scalable solution that also could be used during large-scale distributed training exercises. However, additional work needs to be done to ensure that collaboration methods are scalable across multiple sites; incorporate multiple viewpoints during replay; ensure formal performance data are collected and available for all trainees; and support data and analytics use after the exercise. A relatively new concept in simulation-based medical education is telesimulation (TS), in which telecommunication and simulation resources are used to provide education, training, and assessment for learners at distributed locations. With telesimulation, the internet is used to connect simulators between an instructor and trainees in different locations. In an early study, researchers were able to assess the effectiveness of TS in teaching the Fundamentals of Laparoscopic Surgery (FLS) to surgeons in Africa with instructors in Canada (Okrainec et al. 2010). Since then, there has been rapidly growing interest in applying TS to other medical areas such
as emergency medicine and anesthesia (McCoy et al. 2017). Of course, with telesimulation comes teledebriefing, which involves an instructor in a remote location using cameras, microphones, and basic videoconferencing software to provide feedback to learners while observing them conduct the simulation in real time (Ahmed et al. 2014). Teledebriefing with telesimulation could help reduce costs and eliminate the barriers of both distance and time (Honda & McCoy 2019), thus making it an ideal tool for delivering debriefs to geographically separate healthcare teams. A more recent study has demonstrated the feasibility and effectiveness of providing real-time training in a mass casualty incident (via telesimulation) with integrated debriefing (via teledebriefing) to healthcare providers overseas using Google Glass (McCoy et al. 2019).

SUMMARY
Currently, most developers of distributed debriefs focus either on the technology or on researching academic issues surrounding debriefing processes for distributed learners. Little information is publicly available describing efforts to combine these two strands in meaningful ways. In order to be truly successful, distributed training exercises must conduct debriefs that rigorously and formally promote learning. We believe that, by considering the current challenges and the requirements presented above, distributed debriefing technologies and techniques can begin to meet this lofty goal.

ACKNOWLEDGMENTS
We would like to acknowledge Xinwen Zhang and Samuel Alfred for their contribution to the literature review for this updated chapter.

REFERENCES
Ahmed, R., King Gardner, A., Atkinson, S. S., & Gable, B. (2014). Teledebriefing: Connecting learners to faculty members. Clinical Teacher 11(4):270–273.
Bales, R. F., Strodtbeck, F. L., Mills, T. M., & Roseborough, M. E. (1951). Channels of communication in small groups. American Sociological Review 16(4):461.
Bélisle, J., & Bodur, H. O. (2010). Avatars as information: Perception of consumers based on their avatars in virtual worlds. Psychology and Marketing 27(8):741–765.
Bennett, K. B., & Flach, J. M. (2012). Visual momentum redux. International Journal of Man-Machine Studies 70(6):399–414.
Berger, J. M., Cohen, B. P., & Zelditch, M. (1966). Status characteristics and expectation states. In J. Berger, M. Zelditch, & B. Anderson (Eds.), Sociological Theories in Progress (pp. 29–46). New York: Houghton Mifflin.
Berger, J., Cohen, B. P., & Zelditch, M. (1972). Status characteristics and social interaction. American Sociological Review 37(3):241–255.
Berger, J. M., Fisek, M. H., Norman, R. Z., & Zelditch, M. (1977). Status Characteristics and Social Interaction: An Expectation States Approach. New York: Elsevier Scientific Publishing Company.
Berger, J., & Webster, M. (2006). Expectations, status, and behavior. In P. J. Burke (Ed.), Contemporary Social Psychological Theories (pp. 268–300). Stanford, CA: Stanford University Press.
Cohen, E. E. (1993). From theory to practice: The development of an applied research program. In J. Berger & M. Zelditch (Eds.), Theoretical Research Programs: Studies in the Growth of Theory (pp. 385–415). Palo Alto, CA: Stanford University Press.
Dismukes, R. K., McDonnell, L. K., & Jobe, K. K. (2000). Facilitating LOFT debriefings: Instructor techniques and crew participation. International Journal of Aviation Psychology 10:35.
Dixon, N. M. (2000). Common Knowledge: How Companies Thrive by Sharing What They Know. Cambridge, MA: Harvard University Press.
Dwyer, D. J., Fowlkes, J., Oser, R. L., & Salas, E. (1996). Case study results using distributed interactive simulation for close air support training. Proceedings of the 7th International Training Equipment Conference (pp. 371–380). Arlington, VA: ITEC Ltd.
Fanning, R. M., & Gaba, D. M. (2007). The role of debriefing in simulation-based learning. Simulation in Healthcare 2(2):115–125. https://doi.org/10.1097/SIH.0b013e3180315539
Fiore, S. M., Johnston, J., & McDaniel, R. (2007). Narrative theory and distributed training: Using the narrative form for debriefing distributed simulation-based exercises. In S. M. Fiore & E. Salas (Eds.), Toward a Science of Distributed Learning (pp. 119–145). Washington, DC: American Psychological Association.
Galinsky, A. D., Magee, J. C., Gruenfeld, D. H., Whitson, J. A., & Liljenquist, K. A. (2008). Power reduces the press of the situation: Implications for creativity, conformity, and dissonance. Journal of Personality and Social Psychology 95(6):1450–1466.
Goldberg, S. L., Knerr, B. W., & Grosse, J. (2003). Training dismounted combatants in virtual environments. Paper presented at the RTO HFM Symposium on Advanced Technologies for Military Training, Genoa, Italy, 13–15 October 2003. Retrieved from https://apps.dtic.mil/sti/pdfs/ADA428918.pdf (accessed 28 August 2021).
Goldenberg, M. G., Jung, J., & Grantcharov, T. P. (2017). Using data to enhance performance and improve quality and safety in surgery. JAMA Surgery 152(10):972–973.
Grosan, C., & Abraham, A. (2011). Rule-based expert systems. Intelligent Systems Reference Library 17:149–185.
Hanoun, S., & Nahavandi, S. (2018). Current and future methodologies of after action review in simulation-based training. 12th Annual IEEE International Systems Conference, SysCon 2018 – Proceedings.
Henneman, E. A., Cunningham, H., Fisher, D. L., Plotkin, K., Nathanson, B. H., Roche, J. P., … Henneman, P. L. (2014). Eye tracking as a debriefing mechanism in the simulated setting improves patient safety practices. Dimensions of Critical Care Nursing 33(3):129–135.
Hippe, D. S., Umoren, R. A., McGee, A., Bucher, S. L., & Bresnahan, B. W. (2020). A targeted systematic review of cost analyses for implementation of simulation-based education in healthcare. SAGE Open Medicine 8:1–9.
Honda, R., & McCoy, C. E. (2019). Teledebriefing in medical simulation. In StatPearls. Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK546584/
Issenberg, S. B., McGaghie, W. C., Petrusa, E. R., Lee Gordon, D., & Scalese, R. J. (2005). Features and uses of high-fidelity medical simulations that lead to effective learning: A BEME systematic review. Medical Teacher 27(1):10–28. https://doi.org/10.1080/01421590500046924
Jacobs, L., Cornelisse, E., & Schavemaker-Piva, O. (2006). Innovative debrief solutions for mission training and simulation: Making fighter pilots training more effective. Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC). Orlando, FL.
Kozlowski, S. W. J., & Klein, K. J. (2000). A multilevel approach to theory and research in organizations: Contextual, temporal, and emergent processes. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel Theory, Research, and Methods in Organizations: Foundations, Extensions, and New Directions (pp. 3–90). San Francisco, CA: Jossey-Bass.
Lampotang, S., Lizdas, D., Rajon, D., Luria, I., Gravenstein, N., Bisht, Y., … Robinson, A. (2013). Mixed simulators: Augmented physical simulators with virtual underlays. Proceedings – IEEE Virtual Reality.
Lateef, F. (2010). Simulation-based learning: Just like the real thing. Journal of Emergencies, Trauma, and Shock 3(4):348–352. https://doi.org/10.4103/0974-2700.70743
McCoy, C. E., Sayegh, J., Alrabah, R., & Yarris, L. M. (2017). Telesimulation: An innovative tool for health professions education. AEM Education and Training 1(2):132–136.
McCoy, C. E., Alrabah, R., Weichmann, W., Langdorf, M. I., Ricks, C., Chakravarthy, B., & Lotfipour, S. (2019). Feasibility of telesimulation and Google Glass for mass casualty triage education and training. Western Journal of Emergency Medicine 20(3):512.
Okrainec, A., Henao, O., & Azzie, G. (2010). Telesimulation: An effective method for teaching the fundamentals of laparoscopic surgery in resource-restricted countries. Surgical Endoscopy 24:417–422.
Principe, C. P., & Langlois, J. H. (2013). Children and adults use attractiveness as a social cue in real people and avatars. Journal of Experimental Child Psychology 115(3):590–597.
Qi, D., Ryason, A., Milef, N., Alfred, S., Abu-Nuwar, M. R., Kappus, M., De, S., & Jones, D. B. (2020). Virtual reality operating room with AI guidance: Design and validation of a fire scenario. Surgical Endoscopy 35:779–786.
Quarles, J., Lampotang, S., Fischler, I., Fishwick, P., & Lok, B. (2008). Collocated AAR: Augmenting after action review with mixed reality. Proceedings – 7th IEEE International Symposium on Mixed and Augmented Reality 2008, ISMAR 2008.
Quarles, J., Lampotang, S., Fischler, I., Fishwick, P., & Lok, B. (2013). Experiences in mixed reality-based collocated after action review. Virtual Reality 17:239–252.
Raij, A. B., & Lok, B. C. (2008). IPSViz: An after-action review tool for human-virtual human experiences. Proceedings – IEEE Virtual Reality.
Rhienmora, P., Haddawy, P., Suebnukarn, S., & Dailey, M. N. (2011). Intelligent dental training simulator with objective skill assessment and feedback. Artificial Intelligence in Medicine 52(2):115–121.
Salter, W. J., Hoch, S., & Freeman, J. (2005). Human factors challenges in after-action reviews in distributed simulation-based training. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting.
Sawyer, T., Eppich, W., Brett-Fleegler, M., Grant, V., & Cheng, A. (2016). More than one way to debrief. Simulation in Healthcare 11(3):209–217. https://doi.org/10.1097/SIH.0000000000000148
Schank, R. C. (1982). Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge: Cambridge University Press.
Scherer, L. A., Chang, M. C., Meredith, J. W., & Battistella, F. D. (2003). Videotape review leads to rapid and sustained learning. The American Journal of Surgery 185(6):516–520. https://doi.org/10.1016/s0002-9610(03)00062-x
Shebilske, W., Gildea, K., Freeman, J., & Levchuk, G. (2009). Optimizing instructional strategies: A benchmarked experiential system for training. Theoretical Issues in Ergonomics Science 10(3):267–278.
Silver, S. D., Cohen, B., & Rainwater, J. (1988). Group structure and information exchange in innovative problem solving. Advances in Group Processes 5:169–194.
Silver, S. D., Troyer, L., & Cohen, B. P. (2000). Effects of status on the exchange of information in team decision-making: When team building isn't enough. Advances in Interdisciplinary Studies of Work Teams 7:21–51.
Simulation Interoperability Standards Organization (SISO). (2016). Standard for Distributed Debrief Control Architecture (SISO-STD-015-2016). Orlando, FL: SISO, Inc.
Streit, A. (2020). DDAPR: The dapper way to debrief together. Paper presented at the International Training Technology Exhibition and Conference (IT2EC) 2020, London, UK.
Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology 53:1–25.
van Dalen, A., Jansen, M., van Haperen, M., van Dieren, S., Buskens, C. J., Nieveen van Dijkum, E., Bemelman, W. A., Grantcharov, T. P., & Schijven, M. P. (2021). Implementing structured team debriefing using a Black Box in the operating room: Surveying team satisfaction. Surgical Endoscopy 35(3):1406–1419. https://doi.org/10.1007/s00464-020-07526-3
Webster, M., & Hysom, S. J. (1998). Creating status characteristics. American Sociological Review 63(3):351–378.
Wijewickrema, S., Ma, X., Piromchai, P., Briggs, R., Bailey, J., Kennedy, G., & O'Leary, S. (2018). Providing automated real-time technical feedback for virtual reality based surgical training: Is the simpler the better? Artificial Intelligence in Education. London.
Woods, D. D. (1984). Visual momentum: A concept to improve the cognitive coupling of person and computer. International Journal of Man-Machine Studies 21:229–244.
Xiao, Y., Schimpff, S., Mackenzie, C., Merrell, R., Entin, E., Voigt, R., & Jarrell, B. (2007). Video technology to advance safety in the operating room and perioperative environment. Surgical Innovation 14(1):52–61.

9 Performance Assessment in Simulation

Steve Hall, Michael Brannick, and John L. Kleber

CONTENTS
Subjective Methods of Performance Measurement
  Purpose of Performance Measures
  Special Properties of Performance Measures in Simulators
Defining and Assessing Reliability
  Data Requirements
  Qualitative versus Quantitative
    A Qualitative Index
    Quantitative Indices
Special Problems with Simulators
  The Gouge
  Instructor Attitudes
Objective Methods of Performance Measurement
Automated Data Collection Systems
Flight Technical Error
  Deviation-Based Metrics
  Root Mean Square Error (RMSE)
  Number of Deviations and Time-Outside Standard
  Time within FAA Practical Test Standard
Non-FTE Measures
  Rates of Change
  Control Input
Summary
Note
References

Performance assessment is a key element in simulation. In a training context, performance assessment lays the foundation for feedback to the pilot or flight crew, and in a research context, performance assessment is typically the key to assessing the impact of various factors of interest, such as training or equipment design. There is no single way to measure performance, and practical issues typically limit the type and amount of performance data that can be collected during a simulation session.


For convenience we classify measures along two dimensions. First, we describe measures as being either subjective or objective. Subjective measures are provided directly by human judges. For example, an instructor might rate a crew satisfactory on a paper-and-pencil scale of Mission Analysis. Objective measures are provided by simulators as a result of recordings or calculations. For example, performance might be defined in terms of deviation from a desired flight path or flight parameter (e.g., airspeed, heading) or in terms of external pilot behaviors such as pushing a button. For the second dimension, we describe measures as being either qualitative or quantitative. By qualitative, we mean measures that are categorical in nature. For example, one might either simply pass or fail a maneuver. For a different example, a pilot’s behavior might be allocated by an instructor to a category such as assertiveness or decision-making. Quantitative measures, on the other hand, indicate magnitude. For example, altitude and vertical speed are both quantitative. We also consider numerical ratings made by judges to be quantitative if the ratings indicate degree, so that a pilot given a rating of 4 is indicated by an instructor as being more proficient than another pilot given a rating of 3 under similar circumstances. The choice of performance assessment technique should be driven by the purpose of the simulation and practical limitations. Training scenarios may require feedback for specific flight crew behaviors. Such feedback may include both subjective appraisals of quality of behavior and objective information about the frequency or timeliness of behaviors. Research endeavors typically seek to examine the impact of specific factors on some outcome, such as flight technical error (FTE) or perceived workload. The former can be efficiently quantified with objective data, while the latter is typically assessed with subjective workload survey instruments. The purpose of this chapter is to present both subjective and objective measures of performance that are commonly used in simulation. A thorough discussion of measurement reliability accompanies the subjective measurement discussion, whereas the objective section emphasizes logistical and practical issues associated with automated methods of performance assessment. While this chapter uses flight training and flight simulation as the context of interest, the principles and guidelines presented in this chapter can be applied to any number of simulated training events where performance measurement is of interest.

SUBJECTIVE METHODS OF PERFORMANCE MEASUREMENT
In general, humans are excellent at sensing and perceiving information. We can also become very adept at knowing what to perceive, that is, we learn what is (and is not) worthy of attention in a given situation. In many cases, if we want to evaluate human performance, there is no other choice but to use expert judgment. For example, if we want to evaluate the quality of coordination of two pilots working together in a cockpit, we will most likely be forced to rely on the judgment of an instructor pilot. For these reasons and others, humans are often used as performance measurement devices. Not surprisingly, the most commonly used measures of work performance are based on human judgment (Landy & Farr, 1983).


Purpose of Performance Measures
Simulators are typically used to train and evaluate skilled performance. For example, a flight simulator can be used to teach navigation skills. Performance measures, therefore, should support the training and evaluation functions of simulators. Numerical (quantitative) evaluations of proficiency are typically of primary interest in simulators. Such measures are useful in evaluating and documenting individual skill levels for certification or for adequacy of preparation in dealing with the real (not simulated) task. Although numerical evaluations are not very useful in providing developmental feedback in training, they are useful in evaluating the quality of a training program. That is, evaluations of proficiency can be used in program evaluation research.

Special Properties of Performance Measures in Simulators
Often the individual using the simulator (the target of evaluation) is aware of the evaluation, and the evaluation has consequences for the target (e.g., certification). This can cause a type of bias known as the Hawthorne effect (McCambridge et al., 2014). In such circumstances, the performance measure should be considered one of maximal performance rather than typical performance. Because of the consequence of the measurement and because the performance measure is related to skills that the target considers important, such evaluations can be threatening or anxiety-provoking for the target. In some cases, the apprehension that results from being evaluated can interfere with performance of the task, thus resulting in a poorer evaluation. Not only does the simulator encourage maximal performance but it does so for a limited time. Typically, the simulator is used to teach or evaluate specific skills. The instructor and student will use the simulator for a time brief enough to allow them to maintain their attention on that specific task. This means that ratings evaluating simulator performance are less taxing for memory than typical performance appraisal ratings, which often cover job performance for a year. Further, for a given session, the experience in the simulator is designed to target one or more skills, such as navigation or coordination. Such training and evaluation design considerations help both the judge and target focus on a limited range of behaviors compared to typical job performance ratings. There are both advantages and disadvantages to the use of humans as judges of performance in simulators. From the standpoint of measurement, one of the main drawbacks to using human judges is that such judges may disagree with one another (see Guion, 2011). Kenny (1991) developed a general model of consensus and accuracy of ratings of interpersonal perception. The factors in the model included (a) the amount of information given to the judges, (b) the degree to which the judges attend to the same behaviors, (c) the degree to which the judges make similar inferences or interpretations of the same behaviors, (d) reliability or consistency of the target (ratee) behavior, (e) degree to which the judges base their ratings on irrelevant behavior, and (f) degree of communication of impressions of the target prior to rating.


Of Kenny’s factors, items (b), (c), and (d) appear the most important for performance ratings in simulators. We discuss factors (b) and (c) here, and factor (d) later in the chapter. The amount of information given to different judges (factor a) tends to be similar for a given scenario using a given simulator. For example, two judges might watch the same recording of pilots flying a simulator. Even though two judges watch the same participants in the same simulator at the same time, they may disagree in their evaluation of what they saw for several reasons. The judges may attend to different behaviors, so that one sees something that another does not (factor b). For example, only one of two instructors may notice that the copilot has become lost. The judges may interpret the same behavior to mean different things (factor c). Again, one judge may find a behavior overly confrontational, but another judge may find the same behavior to be appropriately assertive. When judges record behaviors, make ratings, or otherwise produce evaluations, they may do so in idiosyncratic ways. One judge might assign an assertive behavior to an interpersonal dimension, but another might classify the same behavior as belonging to decision-making, flexibility, or some other dimension (for general discussions of performance ratings, see Landy and Farr, 1980, 1983).

DEFINING AND ASSESSING RELIABILITY
There is a very substantial literature on the reliability of measurement (e.g., Crocker & Algina, 1986; Nunnally & Bernstein, 1994; Wigdor & Green, 1991; Coulacoglou & Saklofske, 2017). There is a somewhat more manageable literature on the reliability of judges (e.g., Shrout & Fleiss, 1979; Hallgren, 2012). Because judges commonly disagree, anyone who uses judges to evaluate performance in simulators should conduct a study to compute one or more indices that quantify how well the judges agree with one another. In this section, we illustrate the more commonly used indices of how well judges agree with one another. We provide recommendations for choosing suitable estimators for the most commonly occurring situations in practice. The choice of disagreement index depends primarily on two issues: (a) whether the judgments are qualitative or quantitative, and (b) whether differences in means across judges are important or meaningful.

Data Requirements
To study how well judges agree with one another, we have to collect data. Ideally, we should have a large, representative sample of judges and a large, representative sample of targets (pilots, teams, technicians, or whoever gets judged). Judges and targets should be crossed so that each judge sees each and every target and provides the judgment of interest for each (e.g., whether a pilot passes a specific test or an evaluation of the degree to which a team showed good coordination). The benefit of such a study is that you will actually learn what you want to know, that is, your evaluation of how well judges agree will be accurate. Unfortunately, such data collection can be rather expensive. Practical constraints may force a less informative design.


TABLE 9.1
Ratings from Three Judges on Ten Targets

Target    Judge 1    Judge 2    Judge 3
1         5          5          5
2         4          3          3
3         3          3          3
4         5          4          5
5         2          1          3
6         4          3          3
7         2          2          2
8         1          1          2
9         5          4          5
10        4          4          5
M         3.5        3.0        3.6
SD        1.43       1.33       1.26

At a minimum, we have to have at least two judges and some number of targets; the minimum possible number of targets is two, but some larger number is really needed for the calculations to be meaningful: say ten targets for the sake of argument (see Flack et al., 1988; Sim & Wright, 2005, for choosing the number of targets). Table 9.1 shows hypothetical data for three judges (represented as columns) on ten targets (represented as rows). The judges have recorded their judgments in the form of numbers ranging from 1 to 5. These judgments can be thought of either as categorical (polite, assertive, etc.) or quantitative (e.g., 1 = poor to 5 = excellent) for purposes of illustration.

Qualitative versus Quantitative
Qualitative ratings are categorical. An example of a categorical rating is one in which the judge merely indicates whether a pilot passes or fails a simulated task. Sorting teams of pilots on the basis of their working style into groups with labels such as cooperative, confrontational, rational, and polite would be another example. Qualitative ratings are labels applied to performance that either indicate group membership or falling above or below some threshold to indicate passing or failing in a task. Quantitative ratings indicate the magnitude or degree of something. Ratings made on a scale from "poor" to "excellent" indicate increasing proficiency and can be assigned numbers (e.g., 1 to 5) that correspond to degree of proficiency. Some argue that judges' ratings are not measures in the same sense as measures provided by thermometers or airspeed indicators (for more detail on the issues, see Annett, 2002). Regardless of one's position on the issue, we feel that it is useful to act as if ratings are quantitative measures because the results of doing so are helpful in practice.


A Qualitative Index
If the judgments are categorical (qualitative), then disagreement can be quantified using percentage agreement and other indices of association. Although percentage agreement is easy to compute and (on the face of it) easy to interpret, it can be misleading. Suppose that one judge assigns "pass" to 100% of the targets, and another judge assigns "pass" to 80% of the targets. Then percent agreement will be 80%, which looks good on the face of it. However, because there is no variance in the first judge's ratings, there is no statistical association between the two sets of judges' ratings. The statistic we recommend for the analysis of categorical judgments is called Cohen's kappa, or just kappa for short (Cohen, 1960). Kappa adjusts percentage agreement for chance agreement, so that the agreement is reduced if a large amount of it would be expected by chance. Suppose that the data in Table 9.1 were categorical. To compute kappa, we would first compute a contingency table that shows the agreement in assignment to categories (see Table 9.2). Note that for the first target, all three judges agreed that the target was a "5," so "3" is recorded in the fifth column. For the second target, two of the judges called it a "3" and one called it a "4," so a "2" is written in the third column and a "1" is written in the fourth column. The rest of the rows proceed in a similar manner. The formula for kappa is

$$\kappa = \frac{P(A) - P(E)}{1 - P(E)},$$

where P(A) is the proportion of times that judges agree, and P(E) is the proportion of times that we would expect the judges to agree by chance (Siegel & Castellan, 1988, p. 285). Note that, in general, we have k judges, N targets, and m categories. In our example, we have k = 3 judges and N = 10 targets for a total of kN, or 30 judgments.

TABLE 9.2
Contingency Table

          Category
Target    1      2      3      4      5      Si
1                                     3      1
2                       2      1             0.33
3                       3                    1
4                              1      2      0.33
5         1      1      1                    0
6                       2      1             0.33
7                3                           1
8         2      1                           0.33
9                              1      2      0.33
10                             2      1      0.33
Cj        3      5      8      6      8
pj        0.1    0.167  0.267  0.2    0.267

To compute chance agreement, we hypothesize that row frequencies will be proportional to column totals. We find column totals by adding across the rows for each column. The totals are shown in the second-to-last row of Table 9.2, labeled Cj. To find the proportion of judgments in each category, we divide the column totals by the total number of judgments, that is, Cj/kN. The result is shown in the last row of Table 9.2, labeled pj. The proportion of agreement expected by chance is

$$P(E) = \sum_{j=1}^{m} p_j^2 = 0.1^2 + 0.167^2 + 0.267^2 + 0.2^2 + 0.267^2 = 0.22.$$

Now we need to compute the proportion of agreement, P(A). One way to do so is to first compute the agreement for each target, Si. For the first target,

$$S_1 = \frac{1}{k(k-1)} \sum_{j=1}^{m} n_{1j}(n_{1j} - 1) = \frac{1}{(3)(2)}\bigl[0 + 0 + 0 + 0 + (3)(2)\bigr] = \frac{6}{6} = 1,$$

where n1j is the number of judges who assigned target 1 to category j. The observed proportion of agreement among the judges is the average agreement over targets:

$$P(A) = \frac{1}{N}\sum_{i=1}^{N} S_i = \frac{1 + 0.33 + 1 + \cdots + 0.33 + 0.33}{10} = 0.50.$$

Finally, our value of kappa is

$$\kappa = \frac{0.5 - 0.22}{1 - 0.22} = 0.36,$$

which is a rather modest level of agreement.
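
The kappa computation above is straightforward to script. The sketch below (our illustration, not code supplied with the chapter) reproduces the worked example from Tables 9.1 and 9.2 using only the Python standard library; with more than two judges, this per-target-agreement form of kappa is usually attributed to Fleiss, but it matches the formula used here.

```python
from collections import Counter

def kappa_multi_judge(ratings, categories):
    """Kappa for k judges rating N targets, following the chapter's worked
    example: P(A) is mean per-target agreement, P(E) is chance agreement."""
    k = len(ratings[0])   # judges per target
    N = len(ratings)      # number of targets
    counts = [Counter(row) for row in ratings]   # n_ij per target
    # S_i: proportion of agreeing judge pairs for target i
    S = [sum(n * (n - 1) for n in c.values()) / (k * (k - 1)) for c in counts]
    P_A = sum(S) / N
    totals = Counter()
    for c in counts:
        totals.update(c)                         # column totals C_j
    P_E = sum((totals[cat] / (k * N)) ** 2 for cat in categories)
    return (P_A - P_E) / (1 - P_E)

# Ratings from Table 9.1: one row per target, one column per judge
ratings = [(5, 5, 5), (4, 3, 3), (3, 3, 3), (5, 4, 5), (2, 1, 3),
           (4, 3, 3), (2, 2, 2), (1, 1, 2), (5, 4, 5), (4, 4, 5)]
print(round(kappa_multi_judge(ratings, [1, 2, 3, 4, 5]), 2))  # -> 0.36
```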

Quantitative Indices

Is the Difference in Means Meaningful?
There are several indices of interjudge reliability that we can use when the ratings are quantitative. If the difference in means among judges is not meaningful or not important, we can use the correlation coefficient or a certain type of intraclass coefficient (the fixed case). On the other hand, when the difference in means among judges is meaningful, we can use another intraclass coefficient (the random case). We will define the indices and explain the reasons for the choices among indices as we go along.

Correlation Coefficient
Suppose that the data in Table 9.1 are quantitative and indicate instructor ratings of the level of proficiency of targets in completing a task in a simulator. One index of the degree to which judges agree about the relative standing of targets is the correlation coefficient, sometimes called Pearson's r (e.g., Guion, 2011, p. 240):

$$r = \frac{\sum z_X z_Y}{N},$$

where

$$z_X = \frac{X - M_X}{SD_X},$$

and N is the number of pairs (targets), X is the raw score given by a judge, M is the mean rating for a judge, and SD is the standard deviation (sample, not population estimate) of the judge's ratings. The correlation coefficient is computed once for each pair of judges and indicates the degree to which the judges' scores rise and fall together across targets. The correlations among the three judges' scores are shown in Table 9.3. All three correlations are quite large, indicating substantial agreement among the judges. Notice, however, that, as shown in Table 9.1, Judges 1 and 3 tended to give higher scores than Judge 2. The correlation can be quite high even though the judges have very different mean ratings. The ratings from a very lenient judge and a very severe judge will be highly correlated so long as they both tend to rate the same targets relatively high and relatively low, even though they do not agree on the specific numbers that are assigned to a target. Note that the correlations appear to indicate higher agreement among the judges than did kappa. This is because, if the judgments are categorical, any disagreement is as substantial as any other disagreement. (If it misses, an inch is as good as a mile.) The correlation, however, essentially gives credit for being close.

TABLE 9.3
Correlation Matrix

Judge    1       2       3
1        1
2        0.93    1
3        0.86    0.86    1

For research purposes, the correlation coefficient is often used to show the degree of association between two variables. In such a context, a difference in means is often of no importance. However, in applied contexts, we typically collect data to make decisions about people. In such a circumstance, differences in means across judges become very important. For example, one judge may pass or certify a performance that another judge would fail. Clearly, disagreements of this nature would be important.
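
As a check on Table 9.3, the correlations can be computed directly from the z-score formula given above. A brief sketch (ours; note that the standard deviations must be computed with N in the denominator for the mean-of-z-products formula to hold exactly):

```python
# Sketch: Pearson's r between two judges via the z-score formula in the text.
# SDs use N in the denominator so that r is the mean of the z-score products.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = (sum((v - mx) ** 2 for v in x) / n) ** 0.5
    sdy = (sum((v - my) ** 2 for v in y) / n) ** 0.5
    return sum(((a - mx) / sdx) * ((b - my) / sdy) for a, b in zip(x, y)) / n

judge1 = [5, 4, 3, 5, 2, 4, 2, 1, 5, 4]   # columns of Table 9.1
judge2 = [5, 3, 3, 4, 1, 3, 2, 1, 4, 4]
judge3 = [5, 3, 3, 5, 3, 3, 2, 2, 5, 5]
print(round(pearson_r(judge1, judge2), 2))  # -> 0.93
print(round(pearson_r(judge1, judge3), 2))  # -> 0.86
print(round(pearson_r(judge2, judge3), 2))  # -> 0.86
```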

Intraclass Correlations
Intraclass correlations (ICCs) can estimate how well judges agree with one another while taking mean differences into account. There are several different ICCs that can be computed. All of them are related to the analysis of variance (ANOVA). We will illustrate the use of two of these. To compute the ICCs, we first need results from an ANOVA in which the ratings are the dependent variable, and the judge and target are the independent variables. Table 9.4 shows the way in which the data would be input to a computer, and Table 9.5 shows the ANOVA results. Notice that we have labeled the between-targets mean square BMS, the between-judges mean square JMS, and the error mean square EMS. In this design, there is only one observation per cell, so the interaction and error terms are not separately estimable (Shrout & Fleiss, 1979). Shrout and Fleiss (1979) described the computation of two general classes of ICCs, namely, random and fixed. The typical ANOVA interpretation of random and fixed effects would be that random ICCs consider judges to be sampled from some larger population, whereas the fixed ICCs consider the judges in the study to be the only judges of interest. However, the main difference between the two formulas is the way in which mean differences in judges are handled. The random ICCs reduce the index of agreement for differences in means across judges, but the fixed ICCs do not. Therefore, the choice of ICCs is better informed by the way in which data will be collected and used in practice. If, in the actual use of the ratings (that is, when the simulator is used for actual training, performance evaluation, etc., and the judges' evaluations actually count), the same judge or judges evaluate all targets, then a fixed ICC should be used. On the other hand, if different judges evaluate different targets, then a random ICC should be used. In most practical applications, there are multiple judges, and each target is rated by only one judge. Therefore, the random ICC will usually apply in practice. In our current study, we have three judges (the jargon is that k = 3). We can estimate the reliability of a single judge, or we can estimate the reliability of the average of all three judges. We can do this for both the random- and fixed-effects cases. The computations are illustrated in Table 9.6 for all four possibilities (random versus fixed case, and one versus three judges).

TABLE 9.4
Data Layout for ANOVA and ICC Computation

Rating    Judge    Target
5.00      1        1
4.00      1        2
3.00      1        3
5.00      1        4
2.00      1        5
4.00      1        6
2.00      1        7
1.00      1        8
5.00      1        9
4.00      1        10
5.00      2        1
3.00      2        2
3.00      2        3
4.00      2        4
1.00      2        5
3.00      2        6
2.00      2        7
1.00      2        8
4.00      2        9
4.00      2        10
5.00      3        1
3.00      3        2
3.00      3        3
5.00      3        4
3.00      3        5
3.00      3        6
2.00      3        7
2.00      3        8
5.00      3        9
5.00      3        10

TABLE 9.5
ANOVA Summary Table

Source          Sum of Squares    Degrees of Freedom    Mean Square    Label
Judge           2.07              2                     1.03           JMS
Target          44.97             9                     5.00           BMS
Judge*Target    3.93              18                    0.22           EMS


TABLE 9.6
Computation of Intraclass Correlations
(BMS, JMS, and EMS are from Table 9.5; k = 3 judges, n = 10 targets.)

One random judge:
$$\frac{BMS - EMS}{BMS + (k - 1)EMS + k(JMS - EMS)/n} = \frac{5 - 0.22}{5 + 2(0.22) + 3(1.03 - 0.22)/10} = 0.84$$

One fixed judge:
$$\frac{BMS - EMS}{BMS + (k - 1)EMS} = \frac{5 - 0.22}{5 + (2)0.22} = 0.88$$

All k (three) random judges:
$$\frac{BMS - EMS}{BMS + (JMS - EMS)/n} = \frac{5 - 0.22}{5 + (1.03 - 0.22)/10} = 0.94$$

All k (three) fixed judges:
$$\frac{BMS - EMS}{BMS} = \frac{5 - 0.22}{5} = 0.96$$
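
For readers who want to reproduce Tables 9.5 and 9.6, the sketch below computes the mean squares and all four ICCs from the raw ratings. It is our illustration of the Shrout and Fleiss (1979) formulas, not code from the chapter.

```python
# Sketch: two-way ANOVA mean squares and the four ICCs of Table 9.6
# (Shrout & Fleiss, 1979). Rows are targets; columns are judges.

def intraclass_correlations(ratings):
    N, k = len(ratings), len(ratings[0])            # targets, judges
    grand = sum(map(sum, ratings)) / (N * k)
    t_means = [sum(row) / k for row in ratings]
    j_means = [sum(row[j] for row in ratings) / N for j in range(k)]
    ss_t = k * sum((m - grand) ** 2 for m in t_means)        # between targets
    ss_j = N * sum((m - grand) ** 2 for m in j_means)        # between judges
    ss_e = (sum((x - grand) ** 2 for row in ratings for x in row)
            - ss_t - ss_j)                                   # residual
    BMS, JMS = ss_t / (N - 1), ss_j / (k - 1)
    EMS = ss_e / ((N - 1) * (k - 1))
    return {
        "one random judge": (BMS - EMS) / (BMS + (k - 1) * EMS
                                           + k * (JMS - EMS) / N),
        "one fixed judge":  (BMS - EMS) / (BMS + (k - 1) * EMS),
        "k random judges":  (BMS - EMS) / (BMS + (JMS - EMS) / N),
        "k fixed judges":   (BMS - EMS) / BMS,
    }

ratings = [(5, 5, 5), (4, 3, 3), (3, 3, 3), (5, 4, 5), (2, 1, 3),
           (4, 3, 3), (2, 2, 2), (1, 1, 2), (5, 4, 5), (4, 4, 5)]
for label, icc in intraclass_correlations(ratings).items():
    print(f"{label}: {icc:.2f}")   # 0.84, 0.88, 0.94, 0.96
```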

The difference in results between the fixed and random cases will depend on the size of the differences in mean ratings from the different judges.

Number of Judges

Note also in Table 9.6 that increasing the number of judges from one to three increases the reliability estimate. As we noted previously, the typical case in practice is for a single judge to rate each target. That single judge is usually not the same for all targets. In such an instance, the reliability estimate that would apply is the estimate for one random judge. In our study, the value for that estimate was 0.84, which is respectable for some purposes. However, we might want better reliability if the rating has serious consequences for the target (e.g., if the rating causes the target to lose time on the job). We can use a variant of the Spearman-Brown formula to estimate the number of judges we would need to obtain any given level of reliability. The formula we need is (Shrout & Fleiss, 1979)

$$m = \frac{r^*(1 - r_L)}{r_L(1 - r^*)},$$

where r_L is our reliability estimate for one (random, in our case) judge, r* is our aspiration level, and m is the resulting number of judges. We have to round m up to the next highest integer, as judges cannot be expressed in fractions. Suppose we wanted to achieve a reliability of 0.90; we would find that

$$m = \frac{0.9(1 - 0.84)}{0.84(1 - 0.9)} = 1.71 \approx 2.$$
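A small helper, offered only as an illustration of this arithmetic, captures the formula and the rounding rule:

```python
import math

def judges_needed(r_single: float, r_target: float) -> int:
    """Spearman-Brown estimate (Shrout & Fleiss, 1979) of the number of judges
    needed to raise a single-judge reliability r_single to at least r_target."""
    m = (r_target * (1 - r_single)) / (r_single * (1 - r_target))
    return math.ceil(m)  # round up: judges cannot be expressed in fractions

print(judges_needed(0.84, 0.90))  # m = 1.71, rounded up to 2 judges
```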


Therefore, we would need two (random) judges to assure a reliability of at least 0.90.

To recapitulate the distinction between fixed and random judges in practical applications: suppose we have a pool of five instructors, of whom two will be available to rate each crew on each simulation, but the same two instructors will not always be paired; the random-judges formula would then apply because differences in the judges' means would influence the ratings. On the other hand, suppose we have only two instructors available, and these two must evaluate each and every crew. In that case, the fixed-judges formula would apply because differences in means among the judges would not influence the ratings. There is also a third possibility, namely, that some absolute standard is based on the ratings (e.g., there is a numerical scale from 1 to 5, with a passing point of 3). In such a case, the calibration of the judges becomes of interest. However, we are unaware of well-developed psychometric approaches to evaluating such calibration. In such instances, calibration would hinge upon having a gold standard of performance, and these are rarely available in practice. If such standards were widely available, then human judgment would probably be unnecessary.

Other Designs for Assessing Agreement among Judges

For studying interjudge reliability, we advocate that each judge evaluate each and every target. Such designs present logistic challenges for researchers, particularly in terms of getting multiple judges together. It is often possible to make recordings (e.g., videotape) of targets' responses to simulations to ease the burden of gathering multiple judges at once. Once recorded, the target can be evaluated more or less at the judge's leisure. More complex designs, in which there is some nesting of targets within judges, can also be used (see Crocker & Algina, 1986; Cronbach et al., 1972; DeShon, 2002; Shavelson & Webb, 1991). However, you will probably have to hire a statistician to analyze the data and compute the reliability estimate.

There are also methods that can be used when each judge sees only a single target. Such methods include ICC(1) (Shrout & Fleiss, 1979) and rwg (James et al., 1984). However, we recommend against using such methods because they have serious flaws as indices of agreement between judges. ICC(1) can be used in situations where judges can be grouped in some way and differences can be compared across groups; essentially, this amounts to using a between-subjects ANOVA to estimate reliability. The problem with this design is that we do not know whether the differences come from the targets or the judges (or both), because different judges see different targets. The rwg method compares the variance of the judges' ratings against the variance of a distribution in which scores are spread uniformly. However, this is not a method that provides reliability estimates (Kozlowski & Hattrup, 1992).

Enhancing Reliability

Reliability of measurement is essential for the measures to be useful in training or performance evaluation (e.g., Baker & Salas, 1992, 1997). The main approaches to improving the reliability of subjective performance measures are (1) increasing the number of judges, (2) changing the task, and (3) training the judges. We have already described how to estimate the number of judges needed to obtain any desired level of reliability.


Unfortunately, increasing the number of judges is often prohibitively expensive. A single judge may be all that is available.

Changing the Task

An excellent discussion of rating scales and formats can be found in Guion (2011, pp. 449–465). As Guion noted, the consensus of researchers in this area is that format effects due to appearance are small. That is, it makes little difference whether the scales are shown horizontally or vertically, whether the scales have five or seven response categories, or whether numbers or words are used to label the response options. This is not to say that careful scale development is unimportant. It is important because it specifies the task (the content of the items) ultimately given to the judge.

Reliability among judges can be improved by making the judges' task easier. One way to do this is to make the ratings more easily observable, quantitative, and behavioral. That is, relatively concrete, observable behaviors are easier to evaluate or record than relatively abstract concepts that must be inferred from subtle patterns of behavior. It is easier to count incidents of team members shouting at one another than it is to infer the degree of hostility being felt in the same team.

Although reliability is increased by making the judges' task simpler, there is often a price to be paid for this simplicity. Often the simpler ratings are deficient, meaning that they do not fully capture the construct that was intended. Hostility can be expressed in many different ways, of which shouting at team members is only one. If the judges consider only shouting, then their reliability (agreement) should be good. However, the resulting measure will be an index specifically of shouting and will be a deficient measure of hostility.

Rater Training

Rater training is often a good method for improving agreement between judges. There are many kinds of rater training available. Probably the current favorite is frame-of-reference training (Bernardin & Beatty, 1984; Roch et al., 2012). Good training programs involve training in attention (teaching the set of observed behaviors relevant to the construct of interest) and standards for evaluation (Bernardin & Buckley, 1981). For example, suppose we are interested in crew decision-making. During a simulated mission, the crew encounters an equipment problem (say, a boost pump failure). How does the crew handle this problem? Training might include attending to how long it takes the crew to break out the checklist, whether they skip steps in the checklist, whether there is concurrence among crew members on a specific step, and so forth. Training on the evaluative part might include what behaviors indicate satisfactory, above-average, and below-average performance on the problem. We recommend training that is specific to the simulation and evaluation form being used. Generic training, or training aimed at reducing common rating errors, is not likely to improve the reliability of judges' ratings.

Table 9.7 shows some possible formats for items dealing with decision-making. Some of the items are very simple checklist items such that the judge merely records whether the behavior was observed. Another item asks the judge how the crew did on the event (boost pump failure) as a whole. The final item asks about decision-making in general.


TABLE 9.7
Example Rating Scales

Checklist items (check the appropriate box for each item):            Yes   No
1. Checklist out within 30 seconds of boost pump failure              [ ]   [ ]
2. Checklist complete within 2 minutes of start of checklist          [ ]   [ ]
3. Problem fuse correctly identified                                  [ ]   [ ]

Global evaluations (circle the number that indicates your opinion):
1 = Poor, 2 = Below average, 3 = Average, 4 = Above average, 5 = Excellent
1. Overall handling of boost pump failure                             1 2 3 4 5
2. Decision-making                                                    1 2 3 4 5

Notice how the judge's cognitive work increases as the generality of the item increases. For the specific checklist items, the judge just has to attend to a specific behavior and record what happens. For the boost pump item, the behaviors are circumscribed by the boost pump event, but the judge must evaluate the proficiency of the crew, which involves comparing the behaviors of the crew to some standards of quality of performance. The final item asks the judge to recall and integrate behaviors from multiple events into an overall evaluation of decision-making. In addition to the boost pump failure, there may be other events built into the scenario. The judge has to remember how the crew handled the other events and somehow aggregate multiple behaviors before rendering an overall judgment. As we mentioned previously, as scales become more specific, reliability increases, but so does deficiency.

SPECIAL PROBLEMS WITH SIMULATORS

The Gouge

Many kinds of training and evaluation involving simulators require one or more scenarios that are developed to evaluate or teach specific things. Once developed, the scenarios are typically used for a period of months or even a year or more. As people experience the scenarios, they may tell others what they encountered. After a period of time, those encountering the simulation are not at all surprised by what happens during the simulation. When some people can prepare in advance, but others cannot, the measures obtained will most likely mean different things for the two groups because they are being measured under different conditions. This can be a problem, particularly after the scenario becomes well known.


One potential solution to such a problem is to develop rapidly reconfigurable event sets. The idea is for the instructor or simulator operator to quickly change the scenario in such a way that the participants will not know what to expect. The benefit is that the participants cannot avoid a proper evaluation by preparing for a specific event. There are disadvantages to such an approach as well. Time and effort are required to develop multiple scenarios intended to tap similar skills and behaviors. Perhaps more seriously, there is some doubt about whether different events can provide comparable information. For example, in the assessment center literature, exercises that appear to tap similar skills produce scores that are not highly correlated with one another. It is difficult to guess how participants will respond to an event built into a scenario. There is no empirical basis to guide the choice of events, so a successful outcome for rapidly reconfigurable event sets is uncertain.

Instructor Attitudes

Psychologists typically take an inductive approach to decision-making. They like to collect the relevant behaviors, evaluate them, and come to a decision about a person. So, for example, if psychologists have to certify a person as competent to complete a task, they will first analyze the task to determine what behaviors need to be completed. Then they will determine conditions for successful performance, that is, define performance standards. They will create the simulation to allow for the proper conditions and behaviors, and then watch the participant in the simulation. They will record and evaluate the participant's behaviors. If the behaviors meet the predetermined standard, they will certify the person. The recorded data are the basis of the decision.

Instructors, like managers, often do not share the psychologist's mindset. To many, the rating forms do not serve as a source of evidence from which a decision can be made. Rather, the forms serve as documentation of a decision that has already been rendered. Some instructor pilots, for example, feel that the important thing is their word or decision about whether a pilot is competent to fly. To them, the forms are a waste of time; all that is necessary is a "yes" or "no" from the instructor pilot. Such instructors will complete the forms if they are forced to, but one should not expect highly reliable data from such an individual.

We recommend that subjective performance assessment forms be developed as much as possible in cooperation with the judges who will be using them. If the judges find them overly burdensome, difficult to understand, or irrelevant to the purpose of the simulation, they will not be motivated to use them properly. Without the cooperation of the judge, reliability of the ratings will be poor.

OBJECTIVE METHODS OF PERFORMANCE MEASUREMENT

Some may see objective measures¹ of performance as a relatively simple way to assess "true" pilot ability. Objective measures may seem simpler to use and easier to validate, but such is not always the case. There are real limits to the utility of objective data in assessing pilot performance, and the costs associated with getting objective data are sometimes prohibitive.


On the other hand, objective measures can provide valuable information about pilot performance for certain flight maneuvers; furthermore, the quantitative performance measurement process can be perfectly replicated across pilots, sessions, researchers, simulation platforms, and even in real aircraft. There are situations where subjective measures of performance are not feasible or precise enough for the research at hand. Certain aspects of flight performance, such as "stick and rudder" skill, are arguably best measured with data provided directly by the simulator. Certain tasks, such as instrument approaches, lend themselves to assessment with objective data. This is not to say that subjective performance assessment is of no value in such situations; on the contrary, both measurement approaches should be used when possible, as the two approaches will each provide unique information about different aspects of flight performance.

Ironically, using objective measures of flight performance is sometimes more difficult than using subjective measures. Even though there are benefits to using objective data (e.g., reliability, precision, and standardization), there is a potentially high cost involved in obtaining and analyzing objective data. The difficulty involved in using objective measures of performance will vary depending on the flight tasks being assessed, whether or not additional parameters such as team functioning are being assessed, and the simulation platform and software being used. End users of objective performance measures may disagree on the actual meaning of the outcome data or how to use such measures most effectively. In applied situations, pilots, instructors, and managers may be reluctant to use or even record objective measures of performance. The bottom line is that using objective measures in flight performance research is not easy, not simple, and not necessarily superior to the use of subjective methods, but such measures can certainly add value in many situations.

The focus of this section is on the objective measurement of flight technical error (FTE) during specific flight maneuvers. Other aspects of pilot behavior and performance, such as workload or situational awareness, are not considered.

AUTOMATED DATA COLLECTION SYSTEMS

Many simulation platforms have the capacity for automated data collection. Most simulation packages, either directly or through a third-party add-on, can output flight parameter information at some sampling rate. Output formats vary from text files to graphs of the aircraft's position through time. Text file output is usually arranged so that each line of data represents a given set of flight parameters at a given point in time. Data collection rates are usually flexible and are sometimes set as high as 60 samples per second. Some systems allow the user to specify the parameters to be saved, while others dump a predefined set of parameters. In some cases, instantaneous rate-of-change data can be collected, allowing the researcher to examine flight performance in terms of "smoothness."

Systems vary in terms of what data can be collected, but usually both flight parameter data (e.g., position, airspeed, and course deviation indicator [CDI] needle deflection) and pilot input data (e.g., control surface input and button presses) can be collected. Also, the data are usually time stamped, allowing the researcher to coordinate specific events and activities across data collection platforms.


These time stamps facilitate synchronizing flight data with other sources of data such as cockpit video and audio recordings and eye-tracking information. A word of caution to the researcher: not all data collection programs will use the same reference clock for time stamping purposes!

Data collection software is readily available for the Elite and Microsoft Flight Simulator (MSFS) software packages. Elite sells a data collection module separate from the simulation software, while FlightRecorder is freely available from the internet for use with MSFS. Similarly, Frasca's latest simulators are configured from the factory to collect flight parameter data. Surprisingly, some of the more advanced simulation systems that we have worked with do not readily make such data available, but usually these data can be accessed with some minor programming. In some situations, physically getting the data from the simulator is easier than getting permission from the manufacturer to access the data in the first place.

Most data collection programs are designed to record raw data, which often results in a rather lengthy data file. To date, we have not seen a commercially available product that will scientifically analyze flight performance, but we have seen custom-made packages that analyze FTE in real time given specific flight criteria. Unfortunately, these packages are not readily available, leaving most researchers with the lengthy data files saved by the commonly available data collection packages.
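As an illustration of the kind of minor programming involved, the sketch below parses a small, hypothetical time-stamped log; the column names and values are invented, since real output formats vary by package:

```python
import csv
import io

# Hypothetical log excerpt; actual column names, units, and sampling rates
# differ across data collection packages.
raw_log = """time_s,altitude_ft,airspeed_kt,cdi_dots
120.0,5010,105,0.1
120.5,5025,104,0.3
121.0,4980,106,-0.2
"""

samples = list(csv.DictReader(io.StringIO(raw_log)))
# Deviations from an assigned altitude of 5,000 feet
alt_dev = [float(s["altitude_ft"]) - 5000.0 for s in samples]
print(alt_dev)  # [10.0, 25.0, -20.0]
```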

FLIGHT TECHNICAL ERROR

Objective data can be used to generate various measures of FTE. FTE quantifies the degree to which the position or orientation of the aircraft deviates from some ideal state. In other words, FTE metrics focus on the pilot's ability to make the aircraft attain some predetermined goal specified by an external agency; hence, FTE is probably best used as a measure of "stick and rudder" skill. FTE and pilot performance are certainly not synonymous. There is much more to safely operating an aircraft than FTE can quantify. On the other hand, a pilot's ability to maintain precise control of the aircraft is a prerequisite for successful operations. Data feeds from simulators are ideal for FTE computations, but deriving meaningful FTE-based performance metrics requires a great deal of planning. The following sections detail various forms of FTE measures and their potential uses.

Deviation-Based Metrics

Deviation metrics are the backbone of most FTE measures. As the name implies, a deviation metric compares a given flight parameter to a specific value for that parameter. For example, the pilot may be instructed to maintain an altitude of 5,000 feet, and actual altitude data collected at some sampling rate are compared to this target value. The difference between the two values is the raw outcome of interest. A similar metric can be computed using positional data. The pilot is instructed to maintain a flight path, specified via Global Positioning System (GPS) or perhaps a VOR (very high frequency omnidirectional range) radial, and the actual position of the aircraft is compared to the target flight path.

Root Mean Square Error (RMSE)

Deviation data must be aggregated in some way to create a single outcome metric of interest. One of the most common methods of aggregation is to compute the RMSE. It works by first squaring each deviation, averaging these squared deviations, and taking the square root of the average to return to the original units. By doing so, the polarity of the deviations is eliminated, and gross deviations are exaggerated. If an observed flight path has no deviations from the desired flight path, RMSE will equal zero. Given this lower bound of zero, the distribution of RMSE will not be normal, violating a basic assumption of most parametric statistical procedures. To address this issue, RMSE data can be transformed using the natural logarithm function, which will usually result in a distribution of RMSE data that appears normal (see the data in Figure 9.1).

The data in the figure were collected during a flight simulation study in which pilot performance was assessed as a function of keeping the aircraft centered on the glide slope during a simulated instrument approach. Deviation from the glide slope was measured using the deviation (in dots) of the glide slope needle on the CDI, where a value of zero dots indicates perfect alignment with the glide slope. The raw data are presented in the left panel of the figure and clearly show the skewed nature of RMSE (this skewed shape can be reproduced with virtually any RMSE data). On the right, the data were transformed using the natural log function, producing a more normal distribution. This allows ANOVA and other general linear models to be applied to the performance data.

FIGURE 9.1  Illustration of raw RMSE data (left panel) and transformed RMSE data (right panel).

A problem with such transformations is that they obscure the meaning of the data, so that the transformed outcome measures are no longer directly interpretable. This is especially problematic for deviation-based metrics that express FTE in terms of average deviation and the like; untransformed RMSE values already have limited real-world interpretability, so they lose little through transformation. RMSE cannot be interpreted as a simple average deviation because the squaring gives more weight to gross deviations. Similarly, RMSE values cannot simply be compared to prescribed performance standards, such as the Federal Aviation Administration (FAA) practical test standards (PTS), to determine whether some criterion level of performance was met. On the other hand, RMSE is a very sensitive measure of flight performance and is more likely than other indices of performance to show differences across different instrumentation systems, training programs, or other manipulated scenarios. Changes in performance can be assessed using RMSE data, but observed differences in RMSE values across groups or treatments are not directly interpretable. This issue can be addressed by reporting effect sizes as opposed to just reporting means and F tests (Cohen, 1994). Researchers should consider using either Cohen's d, which expresses group mean differences in terms of pooled standard deviation units, or omega squared, which is an estimate of the variance in the outcome metric accounted for by some manipulated variable.
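A minimal sketch of the RMSE computation, the natural-log transform, and Cohen's d follows; the deviation values here are invented for illustration:

```python
import math

def rmse(observed, target):
    """Root mean square error of a sampled flight parameter around a target value."""
    return math.sqrt(sum((x - target) ** 2 for x in observed) / len(observed))

# Hypothetical glide slope needle deflections (dots; 0 = centered)
dots = [0.1, -0.3, 0.6, 0.2, -0.1, 0.8, -0.4, 0.0, 0.3, -0.2]
raw = rmse(dots, 0.0)
print(f"RMSE = {raw:.3f} dots, ln(RMSE) = {math.log(raw):.3f}")

def cohens_d(g1, g2):
    """Group mean difference expressed in pooled standard deviation units."""
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
    v1 = sum((x - m1) ** 2 for x in g1) / (len(g1) - 1)
    v2 = sum((x - m2) ** 2 for x in g2) / (len(g2) - 1)
    pooled = math.sqrt(((len(g1) - 1) * v1 + (len(g2) - 1) * v2)
                       / (len(g1) + len(g2) - 2))
    return (m1 - m2) / pooled
```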

Number of Deviations and Time-Outside Standard

Rantanen and Talleur (2001) discussed several other deviation-based performance metrics. The number of deviations (ND) outside of a tolerance indicates the pilot's ability to control the velocity of the aircraft during a tracking task. It is like counting the number of times a car runs off a racetrack. Low ND values are desired but may be misleading, because the aircraft could stray outside tolerance only once but stay outside of the desired flight path for the duration of the flight. As such, ND values must be interpreted in light of the amount of time spent outside the tolerance, that is, time in deviation (TD). TD values add information beyond RMSE and ND, and low TD values reflect higher levels of performance.
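A sketch of ND and TD, computed from a sampled deviation series under an assumed tolerance and sampling interval:

```python
def nd_and_td(deviations, tolerance, dt):
    """Number of excursions outside tolerance (ND) and total seconds outside it (TD).

    deviations: sampled deviations from the desired flight path
    tolerance:  allowed absolute deviation
    dt:         seconds between samples
    """
    nd, td, outside = 0, 0.0, False
    for d in deviations:
        if abs(d) > tolerance:
            td += dt
            if not outside:   # count each excursion once, on leaving tolerance
                nd += 1
            outside = True
        else:
            outside = False
    return nd, td

# Two excursions (samples 2-3 and sample 5), three seconds outside in total
print(nd_and_td([0.2, 0.6, 0.7, 0.3, 0.8, 0.2], tolerance=0.5, dt=1.0))  # (2, 3.0)
```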

Time within FAA Practical Test Standard

The notion of objective criterion performance measurement can be taken a step further by comparing the amount of deviation to some preset standard of performance. For example, a researcher could use FAA practical test standards to define acceptable and unacceptable levels of deviation. The time within standard (TWS) metric is conceptually similar to the TD metric discussed earlier, the main difference being that TWS focuses on the amount of time spent within a standard, whereas TD focuses on the amount of time spent outside the standard. The goal of the TWS metric is to quantify performance relative to known and accepted standards in such a way that the metric is conceptually friendly and directly interpretable.
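A minimal sketch of the TWS computation; the airspeed deviations are invented, and the ±5- and ±10-knot limits anticipate the ATP and instrument rating airspeed standards discussed below:

```python
def tws(deviations, limit):
    """Proportion of samples within an absolute deviation limit."""
    return sum(1 for d in deviations if abs(d) <= limit) / len(deviations)

# Hypothetical airspeed deviations (knots) sampled during an approach
airspeed_dev = [2, 4, -3, 6, 1, -7, 0, 3, -2, 5]
print(f"Within ATP limit (±5 kt):         {tws(airspeed_dev, 5):.0%}")   # 80%
print(f"Within instrument limit (±10 kt): {tws(airspeed_dev, 10):.0%}")  # 100%
```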


The FAA has dictated standards of performance for the various pilot ratings and endorsements for various phases of flight. For example, the ATP (airline transport pilot) standard for an instrument approach is ¼-scale deflection on the localizer and glide slope and ±5 knots on airspeed. In contrast, the standard for an instrument approach under the instrument rating test standards is ¾-scale deflection on the localizer and glide slope and ±10 knots on airspeed. These criteria were designed to be applied by instructor pilots but can easily be applied to objectively measured flight data. The end results are measurements of the proportion of time spent within either the ATP or instrument rating standards, whether or not a specific landing approach met the ATP or instrument rating standard, and, at the group level, the proportion of pilots in a sample that performed at the ATP or instrument rating level.

TWS data can be used in many ways. For example, the TWS for a single maneuver being performed by a single pilot can be established and used to provide easily understood feedback to the pilot (e.g., 75% of the last approach was within ATP standards). Such data could be compared across trials to establish the learning curve for a particular pilot (see Mengelkoch et al., 1971) or to evaluate changes in performance over long periods of time (see Taylor et al., 2007). To enhance the quality of the feedback, TWS data can be compartmentalized across various flight parameters, such as glide slope tracking, localizer tracking, airspeed, etc.

TWS data can also be used to estimate the proportion of a specific pilot population that can fly to a specified FAA standard by quantifying maneuver performance in a binary fashion. A pilot who completes a maneuver with a TWS score less than 1 would be failed on that maneuver. Once a group of pilots has been tested and scored on that maneuver, it is easy to compute the proportion of the sample that was able to successfully complete the maneuver. These data can be generalized to some broader population, assuming that random sampling procedures are followed. Such estimates can be compared across treatment groups to determine the real-world impact of a system on average pilot performance. This kind of treatment effect is directly interpretable and will likely be preferred by program sponsors and the aviation sector in general. In either use, the advantage of TWS over RMSE is that the participant and end users (including program sponsors) can examine the TWS numbers to determine whether the impact of a treatment or intervention is practically meaningful.

Although the TWS metric has the benefit of being easily and directly interpreted, it is prone to ceiling effects. This is especially likely when the standard being applied is an easy one or when the pilots under evaluation are high performers. In such cases, a more sensitive measure, such as RMSE, will be required to differentiate the performances. On the other hand, if performance in some area by some population of pilots is already at a high level, various interventions designed to improve pilot performance cannot have a practical positive impact. A related drawback is that TWS is not a sensitive measure of performance, meaning that small improvements in performance, or heterogeneous improvements in performance, are not likely to be detected even if the improvement is real.
Where TWS is likely to detect differences across groups of pilots or across system implementations is when experimental conditions designed to increase workload and/or stress are added to the study design. Under "normal" flight operations, most recreational and almost all professional pilots can obtain perfect TWS scores. Maintaining perfect TWS during stressful situations such as troubleshooting in-flight emergencies, managing heavy traffic, or flying through inclement weather is much less likely. Thus, two treatment conditions can be compared by examining the drop in TWS scores as flight conditions move from easy to difficult. Any intervention that allows pilots to maintain higher TWS scores during difficult flight scenarios can arguably be deemed operationally superior (all other factors, such as cost, being equal).

The researcher may choose to examine specific phases of flight (e.g., takeoff, cruise, approach, and landing) or performance during specific maneuvers (e.g., turns about a point, stalls, and level flight). Objective measurement techniques based on real-time data dumping allow the researcher to examine specific phases of flight, or even to segment specific maneuvers, decomposing performance across the various components of a maneuver or phase of flight. It is important to remember that different phases of flight require different KSAOs (knowledge, skills, abilities, and other characteristics) to complete successfully, and knowing that a pilot is able to fly an instrument approach well does not necessarily mean that he or she will perform well in other phases of flight. Similarly, changes in training or aircraft instrumentation may impact performance differently across different maneuvers or phases of flight. Systems designed to enhance performance under IFR (instrument flight rules) conditions may not enhance performance under VFR (visual flight rules) conditions.

NON-FTE MEASURES

Not all aspects of pilot performance can be directly measured via positional or orientation data. Several other pilot tasks, however, can be measured via automated data collection mechanisms, providing additional information about pilot performance.

Rates of Change

Most simulators can provide rate-of-change data associated with roll, pitch, and altitude. Some aircraft operators may wish to establish criteria for how fast the pilot changes the orientation of the aircraft to enhance passenger comfort or, more aptly, to reduce the likelihood that passengers become airsick. The maximum rate of change can be extracted for a given maneuver and compared to some maximum allowed rate. If the pilot exceeds the maximum allowed rate, the pilot fails that maneuver. If performance is being scored on a point system, points might be deducted for exceeding the maximum rate.

Operators may also wish to evaluate vertical velocity at touchdown in the interest of equipment longevity and passenger comfort. Harsh landings (i.e., touchdowns with a high vertical speed component) are not only uncomfortable for passengers but can also cause structural damage to the aircraft. Some operators use onboard flight recorders to evaluate real-time flight data following incidents in flight (such as passenger injury or aircraft damage). These same protocols can be used in simulation-based pilot performance evaluation.
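A sketch of a maximum rate-of-change check under assumed sampling; the pitch series and the 5-degree-per-second limit are hypothetical:

```python
def max_abs_rate(values, dt):
    """Largest absolute rate of change (units per second) in a sampled series."""
    return max(abs(b - a) / dt for a, b in zip(values, values[1:]))

# Hypothetical check: fail the maneuver if pitch rate ever exceeds 5 degrees/second
pitch = [0.0, 1.2, 2.8, 4.1, 4.5, 3.9]    # degrees, sampled at 1 Hz
print(max_abs_rate(pitch, dt=1.0) <= 5.0)  # True: maximum rate was 1.6 deg/s
```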

Control Input

Some flight scenarios require quick and decisive action by the pilot, such as a specific button or switch activation, throttle adjustment, or yoke input. Other research projects might wish to investigate the frequency or magnitude of control surface inputs or the sequence of control activations. Again, most simulators can provide such data. Switch and button activations are usually recorded in terms of button or switch position for a given time frame, whereas control surface and power plant settings are typically given as numbers on some scale. The key to using such data is to construct scenarios that require specific inputs relative to specific events. The analyst must know when the event was initiated and what response the pilot made (see the sketch at the end of this section). Control input data can also be used to flag segments in a flight scenario. For example, the researcher may be interested in FTE on approach once flaps are dropped to 10°.

Switch activation data are especially useful when used to evaluate human interface design for communication and navigation equipment. Such evaluations can be performed using the "virtual" devices included as part of the software, where the pilot uses a keyboard or mouse to control the virtual device. More sophisticated simulators will use physical switches to control the function of virtual equipment. Similarly, physical mock-ups of radio stacks are commercially available, allowing the pilot to interface with a set of physical communications equipment. If funding is not a problem, the researcher can procure a specific avionics package and interface the package with the flight simulation software. This usually requires the construction of both hardware and software interfaces to allow the equipment to communicate with the software.
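A sketch of the event-to-response computation mentioned above; the event time, control names, and input log are hypothetical:

```python
def response_latency(event_time, inputs):
    """Seconds from a scripted scenario event to the pilot's first later input.

    inputs: (time_s, control_name) tuples from the simulator's input log
    """
    after = [t for t, _ in inputs if t >= event_time]
    return min(after) - event_time if after else None

# Hypothetical scenario: boost pump failure injected at t = 95 s
log = [(90.2, "throttle"), (97.8, "fuel_pump_switch"), (99.1, "checklist_open")]
print(f"{response_latency(95.0, log):.1f} s")  # 2.8 s
```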

SUMMARY

There are a variety of ways to define and measure pilot performance. The desired usage of the data should drive the choice of measurement. Aircraft operators and pilot training centers tend to prefer subjective measures provided by SME (subject matter expert) evaluations of performance, whereas researchers tend to prefer objective measures. Each has its strengths and weaknesses, and the optimal choice is not always clear. Regardless of the method chosen, steps should be taken to ensure that the measure is psychometrically sound.

Whenever human judges provide ratings, disagreements among judges are very likely, and efforts must be made to understand the magnitude of such disagreements; that is, we must estimate reliability. If the magnitude of such disagreements is large, then action must be taken to improve reliability. Although reliability is seldom a problem with objective measures (at least when it comes to the calibration of the machine; the pilot's consistency of performance may be another matter), the researcher should be careful to properly interpret observed differences. It is easy to overstate the practical significance of observed differences in FTE measures; we suggest that the researcher translate any observed differences in performance into real-world consequences (e.g., reduced risk and enhanced passenger comfort).

NOTE

1. The phrase "objective measures" in this chapter refers specifically to measurement based on flight data provided by the simulator.

REFERENCES

Annett, J. (2002). Subjective rating scales: Science or art? Ergonomics, 45, 966–987.
Baker, D. P., & Salas, E. (1992). Principles for measuring teamwork skills. Human Factors, 34, 469–475.
Baker, D. P., & Salas, E. (1997). Principles for measuring teamwork: A summary and look toward the future. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team performance assessment and measurement: Theory, methods, and applications (pp. 343–368). Mahwah, NJ: Erlbaum.
Bernardin, H. J., & Beatty, R. W. (1984). Performance appraisals: Assessing human behavior at work. Boston: Kent Publishing.
Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6, 205–212.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Coulacoglou, C., & Saklofske, D. H. (2017). Psychometrics and psychological assessment: Principles and applications. Cambridge, MA: Academic Press.
Crocker, L., & Algina, J. (1986). Introduction to classical & modern test theory. New York: Holt, Rinehart, & Winston.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: Wiley.
DeShon, R. P. (2002). Generalizability theory. In F. Drasgow & N. Schmitt (Eds.), Measuring and analyzing behavior in organizations: Advances in measurement and data analysis (pp. 189–202). San Francisco, CA: Jossey-Bass.
Flack, V. F., Afifi, A. A., Lachenbruch, P. A., & Schouten, H. J. (1988). Sample size determinations for the two rater kappa statistic. Psychometrika, 53, 321–325.
Guion, R. M. (2011). Assessment, measurement, and prediction for personnel decisions (2nd ed.). New York: Routledge.
Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8(1), 23.
James, L. R., Demaree, R. G., & Wolf, G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology, 69, 85–98.
Kenny, D. A. (1991). A general model of consensus and accuracy in interpersonal perception. Psychological Review, 98, 155–163.
Kozlowski, S. W. J., & Hattrup, K. (1992). A disagreement about within-group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology, 77, 161–167.
Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72–107.
Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Methods, theory, and applications. New York: Academic Press.
McCambridge, J., Witton, J., & Elbourne, D. R. (2014). Systematic review of the Hawthorne effect: New concepts are needed to study research participation effects. Journal of Clinical Epidemiology, 67(3), 267–277. https://doi.org/10.1016/j.jclinepi.2013.08.015
Mengelkoch, R. F., Adams, J. A., & Gainer, C. A. (1971). The forgetting of instrument flight skills as a function of the initial level of proficiency (Report No. NAVTRADEVCEN 71-16-18). Port Washington, NY: U.S. Naval Training Center.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill.
Rantanen, E. M., & Talleur, D. A. (2001). Measurement of pilot performance during instrument flight using flight data recorders. International Journal of Aviation Research and Development, 1(2), 89–102.
Roch, S. G., Woehr, D. J., Mishra, V., & Kieszczynska, U. (2012). Rater training revisited: An updated meta-analytic review of frame-of-reference training. Journal of Occupational and Organizational Psychology, 85(2), 370–395.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428.
Siegel, S., & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257. https://doi.org/10.1093/ptj/85.3.257
Taylor, J. L., Kennedy, Q., Noda, A., & Yesavage, J. A. (2007). Pilot age and expertise predict flight simulator performance: A 3-year longitudinal study. Neurology, 68(9), 648–654.
Wigdor, A. K., & Green, B. F. (Eds.). (1991). Performance assessment for the workplace (Vol. I). Washington, DC: National Academy Press.

10

Performance Measurement Issues and Guidelines for Adaptive, Simulation-Based Training

Phillip M. Mangos and Joan H. Johnston

CONTENTS
Introduction
Research Advances
Adaptive SBT Implementation
A Confirmatory Performance Measurement Framework for Adaptive SBT
Dimensions and Essential Characteristics of Performance Measures
    Validity
    Criterion Relevance
    Reliability
    Measure Invariance
    Objectivity and Intrusiveness
    Diagnosticity
Measurement Principles for Adaptive Training
    Principle 1: Ensure that Performance Measure Development Is Guided by Sound Theory
    Principle 2: Consider and Exploit Measurement Affordances
    Principle 3: Ensure Usefulness of Measures for Evaluating Training Effectiveness
Summary and Conclusions
References

INTRODUCTION

In our original chapter we sought to advance the discussion of applying psychometric principles to learning assessment in adaptive simulation-based training (SBT) (Mangos & Johnston, 2009). We discussed how the training potential of simulation systems is increased by focusing on their capability for learning assessment, and how important training outcomes can be achieved by treating an SBT system as a tool for generating customized training content around a core of embedded, potentially high-fidelity assessment "items." Specifically, more accurate diagnoses of the causes of suboptimal performance would allow a higher proportion of training time to be focused on correcting unique skill deficiencies, thus achieving desired performance levels in less time and improving generalization and maintenance of trained skills to job settings.

Our discussion led to a set of guidelines for the development and evaluation of performance measures in adaptive SBT based on a theory-driven, confirmatory performance measurement framework. We explored the essential characteristics of performance measures within the context of this framework, which included supporting a confirmatory strategy for measuring and evaluating performance in adaptive training contexts and sound inferences regarding hypothesized relationships among training objectives, performance episodes, and outcomes. We discussed several dimensions along which performance measures can vary, described desirable characteristics of measures along each dimension, and introduced a set of principles articulating how individual performance measures can be used to support adaptive training.

This chapter updates our initial effort by briefly discussing research trends and advances in the psychometrics of assessment, and provides exemplars of adaptive SBT implementation and updated guidelines. It is beyond the scope of this chapter to fully explore these topics; therefore, comprehensive references are provided for further exploration.

RESEARCH ADVANCES

Progress on adaptive SBT capabilities could not have been achieved without the 40+ years of research and development invested by the US Department of Defense. Two notable programs of research that began in the 2000s focused on rapid advancement of adaptive learning assessments, psychometric accuracy of assessments, and nonproprietary, government-owned assessment authoring software.

In 2003, the Defense Advanced Research Projects Agency (DARPA) initiated its games for training program (DARWARS) with the goal of creating a vast on-demand, multiuser, online military adaptive SBT environment (Chatham, 2007; O'Neil, Baker, Wainess, Chen, Mislevy, & Kyllonen, 2004). Ultimately, DARWARS succeeded in advancing intelligent tutoring system technologies and game-based learning (Bell, Johnston, Freeman, & Rody, 2004; Chatham, 2007; Smith & Bowers, 2019). The DARWARS Ambush! trainer prototype enabled the US Army to embed its Games for Training (GfT) curriculum across Soldier training sites (Roberts & Diller, 2014). The DARWARS tactical language trainer transitioned to operational deployment as the Virtual Culture Awareness Trainer (VCAT), which we describe in more detail later in the chapter (Johnson, 2010; Johnson & Lester, 2016).

Starting in 2009, the US Army launched a decade-long program to develop adaptive learning principles (see Durlach and Lesgold (2012) and Spain, Priest, and Murphy (2012)) and a software architecture—the Generalized Intelligent Framework for Tutoring (GIFT)—to instantiate learning principles, accelerate production, and reduce the cost of adaptive instruction (Sottilare, 2013). Significant accomplishments include access to free, government-owned, online/cloud-based software for designing, authoring, and implementing adaptive instruction; authoring of low-cost, individualized adaptive SBT; and extensive documentation of research findings through annual symposiums, expert workshops, and design guideline publications (see giftutoring.org). Currently, the Army is extending the GIFT architecture to build adaptive SBT for teams (Johnston, Sottilare, Sinatra, & Burke, 2018). Challenges include identifying team competencies and assessments, developing learning assessments that provide a comprehensive and unbiased picture of how a team performed, and devising methods and technologies for reducing and converting raw performance data into meaningful feedback for the team (Johnston et al., 2018).

With continued expansion and maturation of learning technologies, the widespread use of sophisticated adaptive SBT for multi-team systems is assured; nevertheless, there are major barriers to implementation. A recent treatise on human–computer interface grand challenges includes a lengthy discussion of the need for research to focus on advancing adaptive learning design processes, metrics, and evaluation tools that accurately assess student learning progress (Stephanidis, Salvendy, Antona, Chen, Dong, Duffy, Fang, et al., 2019).

ADAPTIVE SBT IMPLEMENTATION

The implementation of adaptive simulations has greatly expanded and extends well beyond traditional K-12 math and physics courses to more complex skills such as military combat (Domeshek, Ramachandran, Jensen, Ludwig, Ong, & Stottler, 2019; McCarthy, 2008), cross-cultural interactions and foreign language (Johnson, 2010; Johnson & Lester, 2016), and medical skills (see also Maheu-Cadotte, Cossette, Dubé, Fontaine, Lavallée, Mailhot, et al., 2020). As noted above, the US military has made a major contribution to adaptive simulation development because it recognized the return on investment in delivering more effective training to its active and reserve military 24/7 (Fletcher, 2009; Kulik & Fletcher, 2016; Ma, Adesope, Nesbit, & Liu, 2014).

By the 1990s, the US Navy was able to transition early advances in adaptive SBT to develop and maintain critically important sailor skills needed to operate computerized systems in combat ships. Three examples are the Radar System Controller Intelligent Training Aid (RITA ITA), the Anti-submarine Warfare/Anti-surface Warfare Tactical Air Controller (ASTAC) ITA, and the Tactical Action Officer Intelligent Tutoring System (TAO-ITS).

The RITA ITA was designed to develop operator skills in conducting standard procedural tasks and responding to information in the radar's visual display, including recognizing and managing data presentation such as chaff, jamming, and clutter (McCarthy, 2008; McCarthy, Pacheco, Banta, Wayne, & Coleman, 1994). McCarthy et al. (1994) found that a majority of sailors reported the RITA ITA improved these skills and expressed a desire that it be incorporated into their curriculum. Following implementation, the Navy reported it was able to increase the throughput of trainees without additional instructors and improve on-the-job performance (McCarthy et al., 1994).


Following the success of the RITA ITA, the Navy developed an ASTAC ITA to address a high sailor dropout rate (greater than 25%) due in part to an outdated training system. An ASTAC provides air control for fixed and rotary-wing aircraft on board Navy ships, and the ASTAC ITA was developed to increase practice opportunities on these tasks (McCarthy, Wayne, & Deters, 2013). McCarthy et al. (2013) conducted a study indicating that the Navy's implementation of several cost-savings measures that included the ASTAC ITA likely resulted in a cost avoidance of almost $1 million per year and a lower ASTAC dropout rate.

The underlying assessment architecture used in both ITAs is the ExpertTrain™ system (McCarthy et al., 2013). Key simulation events are the learning objectives; the expert performance model assesses trainee responses to each event, compares them to expected actions, updates the learner model, and triggers an instructional response that determines whether to provide feedback to the learner. The ExpertTrain™ learner model was attributed to Murray's (1991) endorsement-based modeling approach, which enables combining different types of behavioral assessments of varying reliability. For example, the ASTAC ITA collects trainee test responses that include event-based computer inputs, speech acts, and practical tests with the simulation after training. The ITA creates a repository of behavioral evidence for each learning objective that enables selecting an array of data to determine mastery of learning objectives and to adapt the training environment to move ahead.

About the time the RITA ITA was under development, the Navy had also determined that better training was needed to prepare its Tactical Action Officers (TAOs) for combat decision making. It invested in the TAO ITS, an adaptive team-training system that uses intelligent agents to play the role of simulated crew members (Domeshek et al., 2019). The tutor evaluates trainee performance in real time, infers whether trainees used proper tactical principles, and provides real-time feedback and coaching. A learning management system enables instructors to review, evaluate, and remediate trainees, provide detailed feedback, and grade performance. More recently, the TAO ITS was integrated into the Northrop Grumman-developed, PC-based Open architecture for Reconfigurable Training Systems (PORTS), which runs the Combined Tactical Training and Analysis System. The enhancement to the TAO ITS in PORTS is a speech-enabled graphical user interface that enables trainees to verbally command and receive information from simulated crew members. The ITS architecture includes expert, curriculum, tutor, and student models, with an assessment method that combines data from training conditions (e.g., tactical actions made by the TAO and other friendly and hostile actors), event sequences in the simulation, and student responses to identify gaps in knowledge and skills (Domeshek et al., 2019).

By the mid-2000s, adaptive SBT was expanded to solve training needs resulting from the large numbers of service members being deployed and redeployed to Iraq and Afghanistan. For example, as noted earlier, the DARPA DARWARS program enabled advances in authoring, intelligent agents, and natural language processing technologies that allowed rapid development, testing, and deployment of the DARWARS tactical language trainer, which uses interactive avatars to train local culture and customs across a variety of geographic locations (Johnson, 2010; Johnson & Lester, 2016). The Department of Defense Language and National Security Education Office now hosts the VCAT as an operational training readiness tool.

The DARWARS program had a significant impact on moving gaming platforms into mainstream training venues, with VCAT being an early example of what is now termed Serious Games (SGs) for training (Interservice/Industry Simulation, Training, and Education Conference, 2021). Consequently, by the mid-2000s, the SGs industry had established a niche in adaptive learning environments (Laamarti, Eid, & Saddik, 2014) and since then has seen dramatic growth across most public and private sector industries. Clement (2021) reported that the SGs commercial market value had risen to an estimated US$3.5 billion in 2018 and was projected to reach $24 billion by 2024. Recent meta-analyses have found the effectiveness of SGs to be about the same as other training methods (Johnson, Deterding, Kuhn, Staneva, Stoyanov, & Hides, 2016; Maheu-Cadotte et al., 2020).

Many examples of SGs can be found at the Interservice/Industry Simulation, Training, and Education Conference (I/ITSEC) (2021) website for SGs. The I/ITSEC SGs Challenge was established to improve training effectiveness by encouraging developers to incorporate such learning principles as measurable, challenging objectives, adaptive assessments, gameplay dynamics that increase user engagement, positive and negative feedback on progress, and easy online accessibility 24/7 (Cannon-Bowers & Bowers, 2010). We propose that a key enabler of SG success is the focus on training a small set of well-defined, observable task skills, which controls the cost of developing assessment and measurement.

In summary, the growth in commercially viable SGs continues unabated and will likely increase exponentially with the expansion of intelligent agent technologies and access to big data. Continued success, however, requires ensuring that the learning assessments used are valid and reliable.

A CONFIRMATORY PERFORMANCE MEASUREMENT FRAMEWORK FOR ADAPTIVE SBT

The trends summarized previously—the use of performance information for diagnosis, the use of simulation for distributed training, and the increased automation and customization of training content—present a number of challenges for the development of performance measures. The hardware and software technologies contributing to these trends have evolved at a rapid pace. As a result, trainers have the opportunity to extract vast amounts of raw data from a simulation, employ data mining algorithms to seek out consistencies in the data, or rapidly reconfigure training scenarios based on an arbitrary notion of the trainee's skill deficiencies. In other words, it is very easy for the trainer to set up an ad hoc training strategy with accompanying ad hoc performance measures and little guidance in terms of defining training objectives, identifying critical skills to be trained, developing valid performance measures that convey individual differences in the critical skills (and allow changes in the skills to be modeled over the course of training), and evaluating training effectiveness.

More generally, there is a distinct requirement to integrate useful and valid assessment architectures within these learning technologies. As exemplified by the GIFT architecture and the multiple SBT and tutoring technologies that use it, assessment provides a foundational utility for virtually every learning application within these systems. A deeper understanding of how these systems accelerate student learning reveals the critical distinction between performance measurement (i.e., nonevaluative description of learning-relevant behaviors), assessment (i.e., use of performance information to make inferences regarding the latent knowledge, skills, abilities, or other personal characteristics that drive observable performance), and diagnosis (i.e., identifying the root causes of suboptimal performance).

In an effort to provide such guidance, we propose a comprehensive measurement framework useful for guiding the development and evaluation of performance measures in adaptive SBT contexts (see Figure 10.1). Specifically, we define the criteria for a performance measurement framework and use it to provide specific principles and guidelines for the development and use of performance measures. The essential characteristic of performance measures conforming to this framework is that they support a confirmatory strategy for measuring and evaluating performance in adaptive training contexts. The strategy is confirmatory in that it is a closed-loop system of predictions and observations that enables sound inferences regarding hypothesized relationships among training objectives, performance episodes, and outcomes. This highlights the need for a process-based approach to training planning, execution, and evaluation in which learner performance is measured continuously and training outputs directly lead to revised inputs. Similar representations of the training process, its phases, and the role of performance measures have been proposed in other research and development efforts.

For example, Zachary, Cannon-Bowers, Bilazarian, Krecker, Lardieri, and Burns (1999) proposed a model of embedded training systems that describes the cycle through which historical training data and job performance standards are combined to create performance objectives; training objectives are used to script scenario events and derive performance measures; and performance measures are used as the basis for performance diagnosis, feedback, and revision of trainee performance history. Our model augments and extends this framework, specifically:

(1) Knowledge, skills, abilities, and other personal characteristics (KSAOs) necessary for effective job performance are derived from job/task analytic data, rated on independent dimensions (e.g., importance and criticality), and linked empirically to specific tasks.
(2) Critical KSAOs are used to define performance standards and drive the development of performance measures embedded within scenario content.
(3) Performance standards are used to determine instructional objectives, which in turn are used to script and generate training scenarios with embedded performance measures.
(4) Observed performance on the embedded measures provides estimates of the underlying KSAOs represented by the measures.
(5) Performance standards and KSAO estimates jointly determine performance feedback and subsequent training objectives.
(6) Revised training objectives drive the generation of new training content targeting deficient KSAOs.
(7) Performance measures obtained throughout training are used to evaluate the training system itself, continuously improving it through innovative instructional methodologies, with the ultimate goal of maximizing training transfer.

FIGURE 10.1  Performance measurement framework for simulation-based training.

A number of features of the framework should be noted. First, a feedback loop is integrated into the training delivery cycle portion of the model, emphasizing that performance measures are error-prone indicators of latent KSAOs and that the training delivery cycle reflects an idiographic approach: relevant KSAOs are assessed iteratively, and scenario elements designed to evoke learning are inserted strategically throughout training. We propose that KSAOs can be estimated indirectly via repeated performance measurement under changing scenario conditions. The confirmatory process of repeatedly exposing trainees to varying, graded training content follows the logic of adaptive testing and allows for repeated, increasingly accurate measurements of latent KSAOs (Embretson & Reise, 2013). The resulting KSAO measurements drive subsequent generation of scenario content tailored to the learner's deficient KSAOs and permit the possibility of modeling performance changes throughout training. Second, the model emphasizes the complementary roles of deliberate practice and performance feedback. Performance feedback provides a mechanism for highlighting specific gaps in the learner's knowledge structure, correcting misconceptions, and guiding students through effective solutions to specific problems.

However, we believe that a complementary mechanism should serve the express purpose of generating training content that targets specific skill deficiencies. Adaptive SBT can provide such a capability by allowing for focused, deliberate practice within a realistic simulation environment, a capability necessary to support expert performance (Ericsson, Hoffman, Kozbelt, & Williams, 2018). Although the focus of our discussion is on performance measurement issues in adaptive training, it should be noted that the same issues are relevant for the development of tailored feedback. Third, the model assumes that training content will be customized to target the KSAOs of a single trainee. This limitation stems from the fact that the theory and methods necessary to develop tailored training are still in the early stages of development. Thus, it would be difficult to generate training content that is simultaneously tailored to the needs of multiple individuals, each with unique skill deficiencies, performing interdependent tasks in a team setting. Research has, however, pointed to the potential use of intelligent agents to simulate the actions of teammates, resulting in a simulated team environment that allows practice of critical teamwork behaviors (Zachary, Santorelli, Lyons, Bergondy, & Johnston, 2001).
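
Before turning to the dimensions of performance measures, the cyclical logic of the framework can be made concrete in a few lines of code. The following Python sketch is purely illustrative (the logistic success model, the Elo-style update rule, and all constants are our own simplifying assumptions rather than features of any fielded SBT system), but it shows how a KSAO estimate, scenario difficulty, and observed outcomes revise one another on each pass through the loop:

```python
import math
import random

# Minimal sketch of the closed-loop cycle: predict performance from the current
# KSAO estimate, observe an outcome, and revise the estimate. All names and
# constants are illustrative assumptions.

def predicted_success(skill: float, difficulty: float) -> float:
    """Logistic model: success is more likely when skill exceeds difficulty."""
    return 1.0 / (1.0 + math.exp(-(skill - difficulty)))

def training_loop(true_skill: float = 1.5, n_events: int = 12, seed: int = 7) -> float:
    rng = random.Random(seed)
    estimate = 0.0                        # starting estimate of the latent KSAO
    for event in range(n_events):
        difficulty = estimate             # adapt scenario difficulty to the estimate
        p_hat = predicted_success(estimate, difficulty)     # confirmatory prediction
        p_true = predicted_success(true_skill, difficulty)  # simulated trainee
        outcome = 1.0 if rng.random() < p_true else 0.0     # observed performance
        estimate += 0.4 * (outcome - p_hat)                 # revise toward the evidence
        print(f"event {event:2d}: difficulty={difficulty:+.2f} "
              f"outcome={outcome:.0f} estimate={estimate:+.2f}")
    return estimate

training_loop()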

DIMENSIONS AND ESSENTIAL CHARACTERISTICS OF PERFORMANCE MEASURES

As adaptive SBT continues to mature rapidly, so does the need to identify and develop desirable characteristics of performance measures, including psychometric quality (DiCerbo, Shute, & Kim, 2019; Katz, LaMar, Spain, Zapata-Rivera, Baird, & Greiff, 2017). To address this gap, we identified six dimensions—validity, criterion relevance, reliability, measure invariance, objectivity and intrusiveness, and diagnosticity—along which performance measures in adaptive SBT can vary. The goal is for measures used in adaptive SBT to attain a high standing on each of these dimensions. This provides a basis for identifying desirable characteristics of performance measures and recommendations for improving measurement practices.

Validity

Performance measurement refers to the process through which behaviors observed within a job or training environment are translated into a comprehensive summary snapshot of an individual's performance in a particular setting, providing the foundation for subsequent performance evaluation, management, or prediction. Performance measures are commonly referred to as criteria because they provide a basis for arriving at evaluative judgments about individual job performance and often serve as the dependent variable (or criterion) in validity research. The concept of a set of interrelated, observable behaviors that are relevant for accomplishing higher-order job or training objectives is consistent with the notion of a psychological construct, a conceptual term used heuristically to articulate, describe, and predict a set of related, covarying behaviors associated with a phenomenon of theoretical interest (e.g., intelligence, personality, anxiety, and expertise) (Binning & Barrett, 1989; Cronbach & Meehl, 1955; Edwards & Bagozzi, 2000).

A measure, in contrast, is an observed score recorded using some measurement method (e.g., self-report, interview, observation, and objective measurement) that represents an empirical analog of a construct (Edwards & Bagozzi, 2000). Discussions of the difference between constructs and measures in the context of personnel selection, training, and performance appraisal, though limited, have distinguished between an ideal, hypothetical criterion construct (i.e., the domain of all behaviors important for attaining organizational, job, or training objectives) and the actual measures used to representatively sample this domain of behaviors (Binning & Barrett, 1989; Borman, Grossman, Bryant, & Dorio, 2017). Stated more precisely, the behavioral domain comprising the hypothetical criterion construct can be considered an expression of the collective KSAOs important for organizational, job, or training effectiveness, often derived from job/task analyses, and formalized in explicit training or learning objectives. However, despite the criterion/measure distinction, there is no explicit unifying model useful for articulating the meaning of the validity of a performance measure. Validity studies have emphasized the predictive, criterion-related validity of a selection system or training intervention, that is, the degree to which performance on a selection test or during training predicts subsequent real-world performance. Less attention has been paid, however, to what is meant by the validity of the criterion measure in and of itself. Discussions of a measure's construct validity often invoke the idea of a nomological network, which is an expected pattern of relationships among constructs and between constructs and empirical observations (e.g., measures). Traditionally, evidence for a measure's construct validity has been provided in the form of significant correlations with measures purporting to measure the same or similar constructs (i.e., convergent validity) and nonsignificant correlations with measures purporting to measure dissimilar constructs (i.e., discriminant validity). Such evidence can be derived using analytical techniques such as the multitrait-multimethod matrix (Campbell & Fiske, 1959), and the resulting pattern of relationships has been termed the nomological network for the target measure. The traditional approach to construct validity has focused primarily on the degree of empirical support for a nomological network and the resulting quality of the inferences that may be drawn from it (Cronbach & Meehl, 1955). However, reliance on a nomological network as a foundation for assessing the validity of a measure introduces a potential problem that may be especially pronounced with respect to the validity of performance measures: the meaning of a construct and its measures emerges as a result of their configuration within the nomological network (Borsboom, Mellenbergh, & van Heerden, 2004). This introduces the tautological fallacy of using the network to implicitly define the constructs of which it is composed strictly in terms of their relationships with each other and without reference to theoretical terms (Borsboom et al.). An alternate view of validity, proposed by Borsboom and colleagues (2004), poses two criteria for a valid measure: the construct exists in the real world, and variations in the construct cause analogous variations in the measure.
This view transfers the locus of evidence for validity from the observed relationships among measures to the response processes that convey the causal effect of a psychological construct on its measure (i.e., substantive validity; Messick, 1995).

Consider measures used in the natural sciences—a common thermometer may be considered a "valid" measure of temperature because thermal energy exists in the ambient environment, is transferred to the thermometer, and causes the mercury to rise to a degree that depends entirely on the amount of energy transferred. No allusion to a nomological network is necessary, because evidence for the validity of the instrument lies entirely in the causal sequence of events linking variations in the construct with variations in the measurement instrument. This "back to basics" view, in which validity reflects the degree to which a measure tracks its intended, unobservable, underlying construct, is especially relevant to and necessary for the development of performance measures for adaptive SBT. It corresponds with the notion of evidence-based measurement, which is used increasingly in the context of SBT and ITS assessment. First, this conception of validity requires a detailed theory of response processes with respect to a set of KSAOs. That is, one must explicitly describe the behaviors associated with a given KSAO and the specific behaviors that are indicative of various levels of effectiveness with respect to that KSAO. This constraint is often satisfied in SBT and ITS research, because both require the articulation of an expert performance model that describes the ideal behavioral patterns to be expected with respect to critical KSAOs vis-à-vis the training scenario content. In fact, many of the adaptive SBT applications described above operate from an expert model in which optimal sequences or patterns of responses are used as a reference against which trainee performance is compared. Thus, by adopting a detailed theory of response processes, users can maximize the diagnostic potential of adaptive SBT by articulating the responses to scenario content that are associated with various effectiveness levels of specific KSAOs. Second, it may be difficult to "build" nomological networks for performance measures, that is, to predict relationships between the target measures and other measures that target similar and dissimilar constructs, which are useful for construct validity inferences. We believe this to be especially true because, ideally, performance measures should be developed to target specific, nonredundant sets of behaviors such that no two measures assess the same constructs. In such cases, a performance measure's nomological network would be composed exclusively of other measures with which one would not expect significant relationships. Third, it would be difficult to examine patterns of covariance among performance measures under dynamic scenario conditions, because doing so could confound inferences regarding the validity of the individual performance measures and the effects of scenario modifications on changes in targeted KSAOs. Finally, researchers have noted the potential utility of response processes to support validation of measures assessing complex, multidimensional constructs (Ployhart, 2006). In order to understand whether a performance measure captures the latent construct it is designed to measure, one must understand the response processes associated with the underlying construct. This is especially true for complex constructs such as situational awareness, multitasking performance, dynamic workload management, and team communications assessed in adaptive SBT systems.
A validation strategy that considers the response processes associated with relevant KSAOs is likely to support the development and refinement of performance measures consistent with the proposed measurement framework.

Valid performance measures accurately represent the latent KSAOs driving observable training performance by capturing essential KSAO-specific response processes. This allows for comparison of observed performance against theory-driven benchmarks of expert performance, accurate estimation of latent KSAOs, and fine-grained customization of subsequent training content.
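
As a concrete illustration of scoring against an expert performance model, the short Python sketch below compares a trainee's observed action sequence with a hypothetical expert reference sequence. The action labels are invented for illustration; a real system would derive both the reference model and the scoring rule from task analysis:

```python
from difflib import SequenceMatcher

# Illustrative sketch: score a trainee's observed action sequence against an
# expert performance model's reference sequence. Action labels are invented.
EXPERT_SEQUENCE = ["detect_track", "classify_track", "report_track", "engage_track"]

def response_process_score(observed: list[str]) -> float:
    """Similarity (0-1) between observed and expert-model action ordering."""
    return SequenceMatcher(None, observed, EXPERT_SEQUENCE).ratio()

trainee_a = ["detect_track", "classify_track", "report_track", "engage_track"]
trainee_b = ["detect_track", "engage_track"]  # skipped classification and report

print(response_process_score(trainee_a))  # 1.0   -> matches the expert model
print(response_process_score(trainee_b))  # ~0.67 -> deviation flags a KSAO gap
```

Deviations from the reference sequence, rather than a single outcome score, are what carry the diagnostic information about which response processes broke down.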

Criterion Relevance

Another critical issue is the degree to which performance measures comprehensively, yet efficiently, capture the domain of behaviors they represent. This notion invokes the concepts of criterion relevance, deficiency, and contamination (Borman, Grossman, Bryant, & Dorio, 2017). Criterion relevance refers to the idea that performance measures should correspond to the actual performance demands of the training situation. Measures should accurately assess only the KSAOs they represent and not other irrelevant sources of variance. Two potential problems that may compromise criterion relevance are criterion contamination and deficiency. Criterion contamination refers to the degree to which a performance measure taps variance unrelated to performance demands. For example, contamination might occur if one used an automated speech recognition capability to assess the quality of team communications but the program required users to memorize keywords not typically used in the training scenario in question. In this case, the measure taps an irrelevant construct (i.e., working memory) rather than the intended construct (i.e., communication quality). Criterion deficiency occurs when a performance measure fails to sample important training behaviors. Deficiency might occur if one intended to measure the quality of the trainee's communications during a simulation but the scenario content offered too few, or too limited a variety of, team communication opportunities. This can cause critical problems with transfer of training. Although criterion relevance is implicitly defined as the absence of criterion contamination and deficiency, an additional consequence is enhanced parsimony and efficiency of measurement. A measurement strategy conforming to the proposed framework will include a suite of uncorrelated measures, each measuring a unique, relevant aspect of performance, that collectively provide a comprehensive, representative sample of the hypothetical criterion domain. Additionally, it is possible to maximize criterion relevance by ensuring that training objectives are based on thorough, accurate job analytic results. A comprehensive job analysis methodology will provide the blueprint detailing which KSAOs should be included in the training context, by providing empirical estimates of KSAO importance and task–KSAO linkages. Consequently, training objectives based on thorough job analysis results will help ensure that only relevant criteria are included in training scenarios.
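
The contamination and deficiency checks described above can be partially automated once task–KSAO linkages are available. The following Python sketch is hypothetical (the KSAO names and measure-to-KSAO mappings are invented) and simply audits a measurement plan against a set of critical KSAOs taken from job analysis:

```python
# Illustrative sketch: audit a measurement plan against job-analysis results.
# KSAO names and linkages are hypothetical placeholders.
critical_ksaos = {"communication_quality", "track_identification", "workload_management"}

# Which KSAOs each proposed measure is believed to tap.
measure_ksaos = {
    "comms_rating": {"communication_quality", "working_memory"},  # taps an extra construct
    "id_accuracy": {"track_identification"},
}

measured = set().union(*measure_ksaos.values())
deficiency = critical_ksaos - measured                # critical KSAOs left unmeasured
contamination = {m: k - critical_ksaos                # irrelevant variance per measure
                 for m, k in measure_ksaos.items() if k - critical_ksaos}

print("Deficient (unmeasured) KSAOs:", deficiency)          # {'workload_management'}
print("Potentially contaminated measures:", contamination)  # {'comms_rating': {'working_memory'}}
```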

Reliability

Adoption of a perspective in which performance measures are characterized as error-prone surrogates of a hypothetical performance domain encourages examination of the sources of unsystematic variance.

In the language of classical test theory, validity usually refers to measurement accuracy, whereas reliability refers to measurement consistency (e.g., across test items, measurement intervals, or raters). The two concepts usually go hand in hand, emphasizing the notion that reliability is a necessary, but not sufficient, condition for validity. The logic underlying this notion is that measurement error in the form of unreliability must be minimized to assess the true magnitude of the statistical relationship between variables when provided as evidence for validity (Crocker & Algina, 1986; Nunnally, 1978). A related concept, method variance, refers to measurement error associated with using different methods to measure a construct, often in the context of construct validation (e.g., multitrait-multimethod matrices; Bagozzi & Yi, 1990; Coovert, Craiger, & Teachout, 1997). In the context of adaptive, simulation-based performance measurement, traditional, classical test theory notions of measurement error and reliability assessment take on a new meaning. Most importantly, method variance means something different when measuring performance in adaptive training contexts, given that individuals are observed in continuously changing measurement contexts, because the performance construct itself is changing. That is, the latent KSAOs represented by performance measures are expected to improve (hopefully) over the course of training, driving subsequent scenario modifications. So, both the latent trait and the environment/intervention affecting the latent trait are changing simultaneously. Thus, the traditional conception of reliability as consistency across performance measures or measurement opportunities may not be sufficient as an indicator of measurement error in the context of adaptive training. As with any performance measurement system, reliability is a prerequisite to validity. Classical test theory notions of internal consistency as reliability may be useful in adaptive training contexts if one could establish the dimensionality of the performance measures. However, this may be limited by the fact that the performance constructs of interest in adaptive training contexts are often complex and multidimensional. Additionally, test–retest reliability cannot be meaningfully computed when the latent construct influencing test scores changes over time (as with adaptive training). It is possible, however, to develop and test covariance structures (e.g., latent growth curve models) that model performance changes over time and allow for the inclusion of scenario events as time-varying covariates (Bollen & Curran, 2006). Such analyses may provide useful insight into how reliably performance measures are functioning across time and contexts. However, such analyses have intense statistical power requirements (highlighting the need for adequate numbers of subjects) and can be employed only when there are repeated opportunities to observe performance (reiterating the requirement for multiple observational opportunities).
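
To illustrate the growth-modeling idea, the sketch below simulates repeated performance scores that contain both a learning trend and a time-varying scenario covariate, then recovers the effects with ordinary least squares. This is only a toy stand-in for a full latent growth curve model of the kind Bollen and Curran (2006) describe; all values are simulated:

```python
import numpy as np

# Illustrative sketch: model performance change over repeated scenario trials
# with a time-varying covariate (whether a demanding scenario event was present).
# A full latent growth curve model would estimate latent intercepts and slopes;
# this ordinary-least-squares version only shows the logic.
rng = np.random.default_rng(3)
n_trials = 40
time = np.arange(n_trials, dtype=float)
event_present = rng.integers(0, 2, n_trials).astype(float)  # time-varying covariate

# Simulated scores: baseline + learning trend + event penalty + noise.
scores = 50 + 0.8 * time - 6.0 * event_present + rng.normal(0, 2.5, n_trials)

X = np.column_stack([np.ones(n_trials), time, event_present])
coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
print(f"baseline={coef[0]:.1f}, learning slope={coef[1]:.2f}, event effect={coef[2]:.1f}")
```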

Measure Invariance

Measurement invariance relates to the need for an "apples-to-apples" comparison between measured skills or cognitive attributes and task requirements. According to the proposed measurement framework, one must observe a trainee's behavior across measurement contexts in order to make meaningful, reliable inferences about his or her latent KSAOs.

Given that adaptive training entails continuous modification of the training environment, an essential property of performance measures is the ability to assess critical training behaviors across a range of varying scenario content. Measure invariance refers to the degree to which performance measures retain their essential measurement properties and thus can be used to make meaningful comparisons across observations and under transient scenario contexts. Psychological measures usually contain multiple assessment items varying in content and along critical psychometric parameters (e.g., difficulty, discrimination). Consistent with the notion of a psychological construct as a theory of behavioral consistency over varying contexts (Cronbach & Meehl, 1955), the specific pattern of responses across items allows observation of behavioral consistency and, consequently, inference of the individual's standing on the latent construct (Embretson, 2006). As Embretson (2006) has noted, in response to the ongoing search for nonarbitrary metrics in psychological research, a number of innovative psychometric theories and methodologies have emerged from the psychological and educational assessment domains. It is possible to draw from these theories and methods to develop performance measures that provide meaningful information about the performance construct across changing task or scenario conditions. A notable advantage of assessment methods such as computerized adaptive testing is the use of psychometric models derived from modern test theory (i.e., item response theory) to produce trait score estimates that do not depend on population distributions (Embretson & Reise, 2013). In contrast to traditional, normative assessment strategies, in which test scores are norm referenced via classical test theory analyses (Crocker & Algina, 1986), trait scores derived from adaptive tests gain meaning by direct comparison of estimated trait levels to item parameters (e.g., the item's difficulty level). This enhances diagnosticity, retains desirable psychometric properties of the test while reducing the length of test administrations, and allows meaningful performance comparisons across individuals who received different item sets. With the resulting measurement correspondence, we can make better and more accurate decisions about what training or assessment content is going to provide the most appropriate level of challenge given the person's currently measured skill level. The same standards are relevant for adaptive SBT. Use of normative performance scores (e.g., mean performance across a scenario) does not allow direct observation of what specific elements of a scenario influenced performance, and thus prevents unequivocal inferences of the individual's standing on the latent KSAOs driving performance. That is, it does not allow one to measure which specific aspects of the scenario posed the greatest level of difficulty for the trainee. The logic of modern test theory demands that test items be scaled according to their difficulty. Analogously, one may scale scenario content according to its difficulty level by defining the minimum level of the underlying skill the trainee must possess to successfully "pass" a scenario event. Taking this logic one step further, it is also possible to create scenario content to target the specific cognitive skills or personal attributes needed for successful performance. This reflects the modern test theory concept of dimensionality.

Knowing which dimensions are being measured, and to what degree they are being challenged in the scenario, provides a foundation for more accurate and meaningful sequencing of scenario content around the trainee's iteratively measured skill level. Subsequently, performance measures can be constructed around statistical models that allow aggregation of individual performance observations across performance episodes of varying difficulty. However, for performance measures to be useful in such a context, it would be necessary for them to sensitively capture critical behaviors across the spectrum of difficulty levels and across a wide range of scenario content. Similar endeavors have been attempted, with some evidence of success, to model learner performance in complex problem-solving domains using item response theory and other probabilistic models, such as Bayesian networks (Embretson, 1997, 1998; Levy & Mislevy, 2004; Mangos, Campbell, Lineberry, & Bolton, 2012; Mislevy, 1995; Mislevy & Wilson, 1996; Pirolli & Wilson, 1998). Such a process, if applied to adaptive SBT, would allow for a fine-grained analysis of how individuals respond to the specific elements that comprise a scenario, address how behavioral responses correspond to specific skill sets or changes within an individual's knowledge structure, and support scenario generation that adaptively challenges deficient knowledge and skills.
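
The logic of scaling scenario events like test items can be sketched with a two-parameter logistic (2PL) item response model. In the hypothetical Python example below, each scenario event carries invented difficulty and discrimination parameters, and a trainee's skill level is estimated by grid-search maximum likelihood from pass/fail outcomes:

```python
import math

# Illustrative 2PL item response model applied to scenario events: each event
# has a discrimination (a) and difficulty (b). Parameters below are invented.
def p_pass(theta: float, a: float, b: float) -> float:
    """Probability a trainee at skill level theta passes an event."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

EVENTS = [("routine_intercept", 1.2, -1.0), ("pop_up_threat", 1.5, 0.5),
          ("multi_track_overload", 1.8, 1.5)]

def estimate_theta(outcomes: dict[str, int]) -> float:
    """Grid-search maximum-likelihood estimate of theta from pass/fail data."""
    params = {name: (a, b) for name, a, b in EVENTS}
    best_theta, best_ll = 0.0, -math.inf
    for theta in [t / 10 for t in range(-30, 31)]:
        ll = 0.0
        for name, passed in outcomes.items():
            a, b = params[name]
            p = p_pass(theta, a, b)
            ll += math.log(p if passed else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Passed the easy and moderate events, failed the hard one.
print(estimate_theta({"routine_intercept": 1, "pop_up_threat": 1,
                      "multi_track_overload": 0}))
```

Because the estimate is anchored to event parameters rather than to a norm group, trainees who encountered different event sets can still be placed on the same scale.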

Objectivity and Intrusiveness

Simulation-based training performance measures can differ according to their level of objectivity. Such differences may depend on characteristics of the simulation environment, which can range from realistic, immersive, and automated to artificial and contrived. The level of realism and automation inherent within the simulation can influence whether assessments can be performed passively from within the simulation environment or whether an external intervention is required to observe and record performance. The resulting degree of objectivity has implications for the quality of the inferences to be made regarding the latent KSAOs underlying observable performance. Whereas objective measures afford direct observation and measurement of specific behaviors useful for assessing and evaluating performance, subjective measures introduce an additional source of error variance in the form of rater error, potentially undermining the quality of the inferences to be drawn regarding an individual's performance (Borman, Grossman, Bryant, & Dorio, 2017; Landy & Farr, 1980). Additionally, obtrusive measures can create a source of criterion contamination by distracting the trainee's attention away from scenario content in order to attend to the performance measure. A typical example occurs in aviation or unmanned systems training, where instructors sit beside students during the simulation and ask situation awareness-related questions while the trainee is attempting to fly the aircraft. It is possible to create opportunities for direct, naturalistic performance measurement when measures are embedded directly into the simulation environment. Virtual, augmented, and mixed-reality simulations continuously push the envelope with respect to the level of physical fidelity, enabling increasingly realistic representations of real-world scenarios and high levels of presence (Mangos et al., 2012). Automated performance measures embedded in such systems are capable of passively recording critical behaviors without disrupting the training exercise.

Often, it is possible to glean data in either raw or aggregated form directly from the simulation environment. This capability has been enhanced by a trend toward common data standards imposed by frameworks such as GIFT and ADL. Such data, when aggregated in a meaningful way, can be used to form direct, unobtrusive performance measures. However, subjective measures are often needed to assess constructs, such as situational awareness, that are difficult to assess purely with raw performance data. In such cases, a combination of objective and subjective measures can be used to provide a more comprehensive portrayal of performance effectiveness. However, this can introduce a number of additional measurement challenges, including the identification of highly skilled subject matter experts (SMEs) to serve as raters; intensive SME training to ensure accurate, reliable ratings of behaviors that often reflect highly specialized skills; difficulties in attending to all relevant performance information; criterion deficiency; and interrater agreement measurement. Use of multiple raters combined with intensive rater training, behaviorally based performance measures, and stringent interrater agreement and reliability criteria can help mitigate such difficulties.
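
A minimal sketch of an embedded, unobtrusive measure follows: raw simulation events are logged passively and aggregated into a latency measure after the exercise. The event schema and aggregation rule are illustrative assumptions, not a GIFT or ADL interface:

```python
from dataclasses import dataclass

# Illustrative sketch of an embedded, unobtrusive measure: raw simulation
# events are logged passively and aggregated afterward. Fields are invented.
@dataclass
class SimEvent:
    t: float          # simulation time (seconds)
    kind: str         # e.g., "threat_appeared", "threat_acknowledged"
    track_id: int

def mean_acknowledgement_latency(log: list[SimEvent]) -> float:
    """Average seconds between a threat appearing and being acknowledged."""
    appeared = {e.track_id: e.t for e in log if e.kind == "threat_appeared"}
    latencies = [e.t - appeared[e.track_id] for e in log
                 if e.kind == "threat_acknowledged" and e.track_id in appeared]
    return sum(latencies) / len(latencies) if latencies else float("nan")

log = [SimEvent(10.0, "threat_appeared", 1), SimEvent(13.5, "threat_acknowledged", 1),
       SimEvent(20.0, "threat_appeared", 2), SimEvent(28.0, "threat_acknowledged", 2)]
print(mean_acknowledgement_latency(log))  # 5.75 seconds
```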

Diagnosticity

Simulation-based training measures can differ further according to their capabilities for translating raw performance information into assessment information useful for diagnosing skill deficits. In some training contexts, a single performance measure can be used as an indicator of the trainee's standing on the latent KSAO it represents (e.g., the number of algebra problems completed correctly as a measure of mathematical ability). Such measures are termed reflective in that they reflect or represent the manifestation of a single construct. Often, however, we are interested in constructs that represent the composite of multiple component variables. Measures for such constructs are termed formative, given that the construct is formed or induced by its measures (Edwards & Bagozzi, 2000). Formative measures are increasingly common in SBT environments because they are commonly used to target more complex constructs, such as situational awareness, teamwork skills, multitasking ability, and communication effectiveness—constructs formed by aggregating measures of more basic constructs (e.g., working memory performance on a single task dimension, or attention to a single visual/auditory stimulus). A critical issue with respect to diagnosticity is the extent to which performance measures, whether reflective or formative, provide insights regarding why an individual is performing at a suboptimal level. Measures that allow comparison of patterns of responses to a theoretical model of response processes, that is, measures conforming to the Borsboom et al. (2004) validity framework outlined earlier, are likely to be useful from a diagnostic standpoint. Such measures support inferences regarding an individual's skill deficiencies by virtue of the observed pattern of responses. Thus, a measure that is "valid" in terms of its ability to capture KSAO-specific response processes is also likely to be diagnostic. A second issue with respect to diagnosticity is that the use of purely aggregate measures, without attention to the component measures of which they are formed, could confound diagnostic inferences.

Aggregate measures are useful in adaptive training contexts by providing a summary index of the complex variety of behaviors that occur within the simulation. However, the individual measures of which an aggregate measure is composed can reflect unique constructs or rater perspectives. Aggregation treats meaningful variance associated with unique perspectives or constructs as measurement error, introducing a form of aggregation bias (James, 1982; Morgeson & Campion, 2000; Sanchez & Levine, 2000) and making it difficult to draw inferences about why a deficient score on an individual performance measure was observed. Thus, aggregate measures may not provide the diagnostic precision necessary to customize training content to target skill deficiencies. In such cases, aggregate measures may be more useful for providing performance feedback to trainees, whereas narrower, individual performance measures may be necessary for structuring adaptive training content. An especially promising measurement solution related to the issue of diagnosticity is the branch of psychometrics concerned with cognitive diagnostic assessment. This field goes beyond the usual emphasis on estimating the level of a trait shown by a test respondent to diagnosing the actual, specific cognitive deficits and misconceptions that lead to an incorrect response. This methodology is closely related to the concepts of evidence-based measurement and principled, structured problem-solving. These models provide a mechanism for drilling down and isolating an individual's skill deficiencies based on unique patterns of responses to strategically sequenced test items (Henson, Templin, & Willse, 2009). With the right combination of test items, these models effectively show the probability of having a specific deficiency (e.g., a conceptual misunderstanding) with respect to the skill or knowledge domain being measured. For example, they have significant utility in identifying whether a student has reached specific developmental cognitive milestones, such as the acquisition of basic mathematical properties for addition and subtraction, based on specific patterns of incorrect responses when solving math problems. The utility of cognitive diagnostic models in isolating trainee skill deficiencies highlights their potential effectiveness when applied to the measurement of advanced, executive-level cognitive constructs (i.e., superordinate processes necessary to orchestrate subordinate mental processes, such as attention and working memory). In more advanced military task domains, such as command and control missions, a failure to attend to cues at the right time may be the result of a number of executive cognitive skill deficiencies, including fatigue-related vigilance decrement, channelized attention or tunnel vision, overwhelming task-switching costs, or general mental overload. This makes the models readily extensible to diagnosing failures in advanced cognitive skills within the context of military mission tasking, providing a basis for future training scenario adaptation.
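
The flavor of a cognitive diagnostic model can be conveyed with a miniature DINA-style computation, one common member of this model family. In the Python sketch below, the Q-matrix, slip and guess rates, and items are all invented; the posterior over skill-mastery profiles shows how a specific response pattern points to a specific deficiency (here, subtraction):

```python
from itertools import product

# Illustrative DINA-style cognitive diagnostic sketch. The Q-matrix maps each
# item to the skills it requires; slip/guess rates are fixed, invented values.
SKILLS = ["addition", "subtraction"]
Q = {"item1": {"addition"}, "item2": {"subtraction"}, "item3": {"addition", "subtraction"}}
SLIP, GUESS = 0.1, 0.2

def likelihood(responses: dict[str, int], mastered: set[str]) -> float:
    """Probability of the response pattern given a skill-mastery profile."""
    lik = 1.0
    for item, required in Q.items():
        p_correct = (1 - SLIP) if required <= mastered else GUESS
        lik *= p_correct if responses[item] else (1 - p_correct)
    return lik

def skill_posterior(responses: dict[str, int]) -> dict[frozenset, float]:
    """Posterior over all mastery profiles, assuming a uniform prior."""
    profiles = [frozenset(s for s, m in zip(SKILLS, bits) if m)
                for bits in product([0, 1], repeat=len(SKILLS))]
    raw = {p: likelihood(responses, set(p)) for p in profiles}
    total = sum(raw.values())
    return {p: v / total for p, v in raw.items()}

# Correct on the addition item, wrong on both subtraction-dependent items:
for profile, prob in skill_posterior({"item1": 1, "item2": 0, "item3": 0}).items():
    print(sorted(profile), round(prob, 3))  # addition-only profile dominates
```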

MEASUREMENT PRINCIPLES FOR ADAPTIVE TRAINING

Table 10.1 summarizes the critical dimensions of performance measures, the associated challenges with respect to the proposed confirmatory measurement framework, and the desirable characteristics of measures along each dimension.

TABLE 10.1  Strategies for Addressing Performance Measurement Challenges

Validity
Dimension: The degree to which a performance measure accurately represents a performance construct in the real world (i.e., the latent KSAOs driving observable performance) and variations in the construct cause analogous variations in the measure (i.e., essential KSAO-specific response processes) (Borsboom et al., 2004).
Challenge: Articulate a model of response processes and ensure measures are capable of capturing critical responses as specified by the model.
Strategies: (1) Ensure measures are sensitive to subtle variations in the latent KSAOs driving observable performance, capturing behavioral patterns corresponding to different levels of the latent KSAOs. (2) Employ specific performance benchmarks derived from theory-driven expectations about "expert" or optimal performance (e.g., an expert performance model).

Criterion relevance
Dimension: The degree to which the domain of behavior is captured by performance measures.
Challenge: Minimize criterion contamination and deficiency as potential sources of measurement error.
Strategies: (1) Ensure measures comprehensively, accurately, and parsimoniously sample the criterion domain. (2) Map measures directly to specific training objectives guided by real-world performance demands (e.g., derived from job/task analyses).

Reliability
Dimension: The degree to which measurement is consistent across contexts, raters, and time. Method variance: variance due to using different methods (e.g., observation, multiple raters, self-report).
Challenge: Hypothesize a model of behavioral consistency across scenario contexts; model changes in the latent KSAOs over time.
Strategies: (1) Ensure repeated performance measurements across multiple observational opportunities. (2) Assess internal consistency reliability when the dimensionality of the measures can be established. (3) Assess reliability within a latent growth curve modeling approach that incorporates the presence or absence of scenario content as time-varying covariates; ensure adequate levels of statistical power.

Measure invariance
Dimension: The degree to which performance measures provide invariant measurements across transient scenario contexts.
Challenge: Develop performance measures that provide meaningful, interpretable metrics across performance contexts.
Strategies: (1) Use modern test theory as a psychometric framework to scale scenario content according to its difficulty level. (2) Ensure performance measures are sensitive to critical behaviors across a wide range of scenario content and difficulty levels.

Measure objectivity and intrusiveness
Dimension: Objectivity is the degree to which the measure allows direct versus indirect observation and measurement of specific behaviors; intrusiveness is the degree to which the trainee is made aware of the measurement process.
Challenge: Employ a measurement strategy that does not interfere with performance; integrate objective and subjective measures with different loci of assessment into a coherent suite of measures.
Strategies: (1) Use automated, embedded performance measures based on raw or aggregated data that can be obtained directly from the simulation environment. (2) Combine subjective measures, when it is necessary to use them, with objective measures. (3) Employ multiple raters, rater training, behaviorally based performance measures, and stringent interrater agreement and reliability criteria when subjective rater judgments are necessary.

Diagnosticity
Dimension: The process of translating raw performance data into assessment information.
Challenge: Meaningfully aggregate raw performance data into a summary index useful for performance assessment and subsequent diagnosis and scenario modifications.
Strategies: (1) Adopt a validity model that considers response processes. (2) Ensure that performance measures have the necessary precision to make diagnostic inferences; use aggregate measures mainly for performance feedback, and narrower, individual measures as a basis for structuring training content.

The primary challenge with respect to validity is to articulate a model of response processes and ensure that the measures are capable of capturing critical responses as specified by the model. With respect to criterion relevance, the challenge is to minimize criterion contamination and deficiency as potential sources of measurement error. For reliability, the challenge is to hypothesize a model of behavioral consistency across scenario contexts and to model changes in the latent KSAOs over time. For measurement invariance, the challenge is to develop performance measures that provide meaningful, interpretable metrics across performance contexts.

For objectivity and intrusiveness, the challenge is to employ a measurement strategy that does not interfere with performance and to integrate objective and subjective measures with different loci of assessment into a coherent suite of measures. For the diagnosticity dimension, the challenge is to meaningfully aggregate raw performance data into a summary index useful for performance assessment and subsequent diagnosis and scenario modifications. In addition to the specific guidelines provided for each dimension, we offer a set of more general principles relevant to the effective application of performance measures in adaptive training contexts: ensure that performance measure development is guided by a sound theoretical framework, identify and exploit measurement affordances of the adaptive training environment, and consider training evaluation strategies early in the performance measure development process.

PRINCIPLE 1: ENSURE THAT PERFORMANCE MEASURE DEVELOPMENT IS GUIDED BY SOUND THEORY

We believe that the primary ingredient for sound performance measures is grounding in a well-developed theoretical framework. Substantial theory development in the areas of learning, skill acquisition, practice, cognitive modeling, and psychometrics has resulted in robust, detailed theories useful for guiding training designs. The decision to customize training content or feedback delivery, automate aspects of scenario generation or performance measurement, or vary the pacing, content, or sequencing of training content should have a clearly defined theoretical rationale drawn from these lines of research. This body of knowledge will be useful for articulating how specific training interventions will influence immediate learning and performance as well as long-term retention and transfer performance. Individualized instruction in the form of adaptive training and intelligent tutoring systems has a growing theoretical and empirical research base supporting its effectiveness. A major stimulus for this research base emerged from theorizing on the concept of aptitude-treatment interactions (ATIs), which suggest that instructional interventions are effective to the extent they meet the individual needs of the learner (Cronbach & Snow, 1981; Snow & Lohman, 1984, 1993). Research on ATIs has revealed that the effectiveness of instruction is influenced by specific individual differences, endorsing an idealized model for instruction in which instructional events are customized to challenge, accommodate, or adapt to a given learner's unique skills (Snow, 1994; Snow & Lohman, 1993). This line of research provides a useful theoretical lens for developing adaptive training content and performance measures. The notion that individuals experience the training environment differently depending on latent ability levels suggests that it is possible to systematically measure the latent abilities underlying performance, model their changes throughout training, and customize training to these evolving ability levels using adaptive instruction. Thus, it is possible to consider adaptive training as a tool for inducing a continuous sequence of ATIs throughout training, thereby providing a consistently high level of challenge without compromising motivation or overwhelming the trainee.


The research literature on ATIs emphasizes characteristics of instructional conditions that could differentially influence learning, depending on skills, abilities, and personal attributes unique to the learner. Indeed, an early application of ATIs was to inform selection for training or instruction, an application based on the tenuous assumption that the latent skills and abilities underlying training performance remained stable during training. ATI research effectively cast training performance as a between-subjects phenomenon—some training interventions are effective for a subpopulation of individuals with certain levels of a requisite skill, whereas other interventions are more effective for other, more or less skilled subpopulations. However, a critical element of adaptive training, as implied by the cyclical representation of training delivery in the proposed measurement framework, is the ability to customize instructional content, and thus model the effects of instruction on a single individual over time. This can complement ATI research by providing the within-subjects perspective needed to describe and predict how an individual learner’s performance (as an indicator of the latent KSAOs that are being targeted during training) varies throughout training (Alliger & Katzman, 1997). Research addressing individual patterns of progress through distinct cognitive stages during the learning process can provide additional insight into the mechanisms underlying ATIs (Embretson, 1997; Schoenfeld, Smith, & Arcavi, 1993). Often, the performance changes that occur when learning a complex task do not reflect a single, unitary “instance” of learning. Instead, individuals frequently experience a series of learning events in which they demonstrate effective problem-solving after experiencing impasses that emerge in the task environment (Annett, 1991; Van Lehn, 1996). A learning event corresponds to the discovery of a problem’s solution after the learner experienced errors or difficulties. Impasses signal faults in the learner’s knowledge structures, prompting the learner to divert attention from problem-solving to the discovery of new knowledge and questions about domain knowledge itself (Van Lehn). Paralleling the ATI literature, research has indicated that the nature and timing of learning events depend on idiosyncratic experiences with impasses and on levels of stable individual differences in the requisite KSAOs relevant for learning (Ackerman, 1987; Campbell & DiBello, 1996; Ohlsson, 1996; Snow, 1994). These theoretical notions hint enticingly at the possibility of structuring adaptive training content to control the occurrence of learning events or even instigate a sequence of learning events throughout training (Mangos et al., 2012). It is possible to recast the concept of ATIs and learning events as deliberate outcomes of training rather than as chance phenomena confined to basic cognitive research. However, a critical contingency in applying such promising theoretical concepts to training design is that performance measures must be designed to capture these elusive phenomena, both of which represent, ultimately, subjective experiences (Schoenfeld et al., 1993; Van Lehn, 1996). This emphasizes the key issue that performance measurement is instrumental for the translation of theory into training design. 
Valid performance measures will be sensitive to response processes indicative of targeted KSAOs, and models of response processes can only be constructed within the framework of a specific theory of learning.


PRINCIPLE 2: CONSIDER AND EXPLOIT MEASUREMENT AFFORDANCES

Considerable variability exists in the types of performance domains for which adaptive training has been developed and, consequently, in the scenario-generation methods and simulation content used to represent these domains. As stated earlier, whereas a general challenge of job performance measurement efforts has been to identify and exploit objective performance measurement opportunities, the challenge for simulation-based performance measurement has been to reduce the abundance of objective data into meaningful diagnostic patterns. This challenge is complicated further by the need to articulate and test relationships among training interventions, the psychological constructs targeted during training, and observable performance. The event-based training approach offered a promising solution to these challenges by incorporating "trigger events" into scenario content (Dwyer, Fowlkes, Oser, & Lane, 1997; Fowlkes, Dwyer, Oser, & Salas, 1998). Responses to these events served as indicators of the individual's standing on the relevant skill being trained. This logic provides an interesting parallel to the domain of computerized assessment. As mentioned earlier, computerized adaptive testing is an assessment method that provides iterative estimation of the targeted KSAO (Olson-Buchanan & Drasgow, 1999). Computerized adaptive testing uses the individual's responses to initial test items to provide hypothetical estimates of the underlying ability level. Subsequent items are selected on the basis of the likelihood of their providing additional diagnostic information about the underlying ability, considering both initial ability estimates and item parameters (e.g., item difficulty and discrimination) (Embretson & Reise, 2013). This form of testing relies on the logic that assessments represent a form of experimentation in which test items (representing the independent variable) elicit cumulative information about some underlying trait (the dependent variable) that influences test behavior (Embretson & Reise). This perspective is equally applicable to SBT research. Initial formulation of the event-based approach treated scenario content as an instrument for embedding individual trigger events (Cannon-Bowers & Bowers, 2009). In the context of adaptive training, however, it is possible to reconceptualize the scenario as a palette for generating a continuous stream of simulation-based assessment content useful for iterative estimation of latent skills and subsequent scenario generation. Simulation-based training scenarios are often scripted with the primary objective of realistically recreating real-world tasks or problems. Assessment content is often an afterthought in such a model, and trainers are left to force assessment opportunities out of the resulting scenario content. However, one can maximize the assessment potential of the SBT environment by considering its measurement affordances in advance, and by developing scenario content around these affordances. Several additional lines of assessment research may provide specific guidance for exploiting the measurement affordances and realizing the assessment potential of the simulation environment. One assessment method—situational judgment tests (SJTs)—provides descriptions of problem situations likely to be experienced in the task environment along with potential solutions, ranging in effectiveness, as response options.

SJTs purport to measure the trainees' expectations of the effectiveness of different performance options, given realistic task cues, essentially treating the quality of these expectations as an indicator of expertise (Chan & Schmitt, 2002). The parallels between SJTs and SBT are obvious; indeed, SJTs were developed as a low-fidelity alternative to more sophisticated assessments at a time when limited computing and simulation capabilities prevented higher-fidelity assessments. It is possible now, however, to draw from SJT methods to develop simulation-based versions of SJT items to make use of the measurement opportunities afforded by SBT. An additional area of research, on the measurement construct of a "time window," provides useful guidance for measuring and assessing performance in light of the opportunities for trainee actions offered by the simulation task (Rothrock, 2001). A time window is a measurement construct useful for decomposing simulation content according to which specific activities can be performed within a given period of time. For example, in the air defense warfare domain, the presence of task cues (e.g., three unknown radar tracks on the radar screen) and the actions of one team member (e.g., the Air Intercept Coordinator illuminating one track as hostile) can open a window of opportunity for another team member to perform a variety of actions differing in their effectiveness (e.g., ignore or engage the track). The time window defines the time period bounded by the emergence of cues or operator actions that constrain performance and the execution of some action by the target performer. The development of the time window as a formal measurement construct is based on the premise that a functional relationship exists between action constraints and time availability, a notion grounded further in the theory of situated cognition (Hutchins, 1995; Lave, 1988). By explicitly defining operational and time constraints on performance, the time window construct is likely to be a useful tool for reducing vast amounts of objective simulation data and allowing useful inferences on the meaning of performance in light of task constraints, supporting the confirmatory measurement framework described previously. A third line of inquiry revolves around research on mathematical modeling of human performance (Campbell & Bolton, 2005; Campbell, Buff, & Bolton, 2006; Dorsey & Coovert, 2003). This research focuses on the development of formal mathematical models of human behavior with respect to situational cues and action affordances of the simulation environment. Typically, the mathematical model specifies the relationship between terms reflecting aspects of the environment (e.g., presence or absence of specific cues) and some aspect of performance (e.g., decision making). Recent efforts have compared the effectiveness of various mathematical modeling techniques (e.g., fuzzy logic, multiple regression) and have applied mathematical modeling specifically to the development of customized feedback in SBT (Campbell & Bolton, 2005; Dorsey & Coovert, 2003). Use of mathematical modeling techniques to drive adaptive scenario generation and performance measurement would be a natural extension of this research. Specifically, because mathematical models reflect explicit, quantitative hypotheses about performance under different situational cues, they provide a mechanism for developing expert performance models as a basis for assessing and diagnosing individual performance.
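
The time window construct lends itself to a simple data structure. The following Python sketch is illustrative (the window bounds, action names, and scoring categories are invented) and classifies an operator action relative to a window opened by constraining cues:

```python
from dataclasses import dataclass

# Illustrative "time window" measure in the spirit of Rothrock (2001): a window
# opens when constraining cues appear and closes after a bounded interval;
# actions are scored by whether they fall inside it. Values are invented.
@dataclass
class TimeWindow:
    opens_at: float       # e.g., track illuminated as hostile
    closes_at: float      # latest time an effective action remains possible
    expected_action: str

def score_action(window: TimeWindow, action: str, t: float) -> str:
    if action != window.expected_action:
        return "wrong_action"
    if window.opens_at <= t <= window.closes_at:
        return "effective"
    return "missed_window" if t > window.closes_at else "premature"

w = TimeWindow(opens_at=42.0, closes_at=55.0, expected_action="engage_track")
print(score_action(w, "engage_track", 50.0))  # effective
print(score_action(w, "engage_track", 60.0))  # missed_window
print(score_action(w, "ignore_track", 50.0))  # wrong_action
```

Classifying each action against its window reduces a dense event stream to a small set of interpretable outcomes tied to explicit task constraints.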


The requirement for a closed-loop, confirmatory measurement strategy is especially relevant considering the proliferation of primarily data-driven machine learning models and algorithms in SBT applications (Oswalt & Cooley, 2019). As illustrated by the event-based training approach, preplanned events and measurement hooks can create the foundation for a confirmatory strategy. With advances in artificial intelligence and machine learning, there is an inherent risk in relying too heavily on data-driven recommendations for training content adaptations or on inferences about underlying trainee skill levels. Without a theoretical framework to drive expectations about what data outputs to expect within the greater context of the training situation, this reliance can lead to arbitrary decisions that are too specific to the particular context.

PRINCIPLE 3: ENSURE USEFULNESS OF MEASURES FOR EVALUATING TRAINING EFFECTIVENESS

A final guideline concerns the training and transfer validity of the SBT system as a whole. Ideally, performance measures will allow assessment of performance vis-à-vis specific training objectives, which should reflect real-world performance demands and be derived from job/task analyses. However, a potential limiting factor in the usefulness of performance measures for evaluating training effectiveness is the disconnect between the strategies used to assess SBT performance and those used to assess on-the-job performance. The former often employs finer-grained, objective measures, whereas the latter often uses broader, subjective measures (e.g., supervisor or multisource ratings). The resulting difference in levels of analysis could limit the magnitude of the observed training validity coefficients used to evaluate training effectiveness. Use of performance measures consistent with the measurement framework described earlier (e.g., measures invariant to changing scenario contexts) may prove useful as a foundation for evaluating training effectiveness. Performance assessed using scenario-invariant measures takes on evaluative meaning only after considering the difficulty of the scenario content in which performance was observed. Thus, this framework allows for experimental manipulation of the difficulty levels of various situational elements to address how these elements influence immediate and long-term learning and performance. For example, assessment research suggests that highly discriminating test items with moderate (e.g., 50%) difficulty levels give the most diagnostic information about a person's actual trait level. The proposed framework allows for analogous research to address how situational parameters influence performance in the context of training, as well as training evaluation research to support inferences regarding long-term outcomes.
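
The point about moderately difficult, highly discriminating items can be checked with the standard 2PL item information function, I(theta) = a^2 * P(theta) * (1 - P(theta)). The Python sketch below uses invented parameters to show that an item whose pass probability is near 50% at the trainee's skill level is the most informative:

```python
import math

# Illustrative 2PL item information calculation: information peaks where the
# pass probability is 0.5 and grows with discrimination. Parameters invented.
def item_information(theta: float, a: float, b: float) -> float:
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

theta = 0.0  # trainee skill level
for a, b in [(1.8, 0.0), (1.8, 2.0), (0.8, 0.0)]:
    print(f"a={a}, b={b}: information={item_information(theta, a, b):.3f}")
# The a=1.8, b=0.0 item (P=0.5 at theta) yields the highest information.
```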

SUMMARY AND CONCLUSIONS

The purpose of this chapter is to describe the role of performance measurement in adaptive SBT contexts in light of emerging technological and methodological innovations.

We have presented criteria for a confirmatory measurement framework to emphasize the necessity of sound performance measurement as the foundation for automated, adaptive training content and feedback delivery in SBT. Additionally, we have identified a number of dimensions along which performance measures can vary and the desirable characteristics of performance measures along each dimension that support the criteria for confirmatory performance measurement. Performance measures that conform to this framework are likely to provide high utility as a result of their diagnosticity, objectivity, unobtrusiveness, comprehensiveness, and efficiency. However, perhaps the greatest advantage of implementing such measures is the ability to draw sound, causal inferences regarding relationships among scenario content, the latent psychological constructs targeted during training, and observable performance. The use of "valid" performance measures—that is, measures capable of transmitting the causal influence of the latent performance construct on observable performance—provides a necessary foundation for adaptive scenario modifications useful for iteratively assessing and correcting deficient skills as they change over the course of training. One effect of the rapid evolution of SBT systems has been the tendency to resort to ad hoc measurement strategies to reduce the large amounts of objective performance data resulting from these systems. We believe that this measurement framework provides a useful set of specific, quantifiable standards to counteract this trend, providing a key mechanism for improving long-term learning and retention.


11 Scoring Simulations with Artificial Intelligence

Carter Gibson, Nick Koenig, Joshua Andrews, and Michael Geden

CONTENTS

Artificial Intelligence and Reproducing Expert Ratings
Traditional Approach to Scoring Open-Ended Content: Rater Training
    The Architecture
    The Data
    Output
    Other Considerations
Scoring Actions in Simulated Environments
    Traditional Approaches to Scoring Simulations
    Data Representations for Modeling Simulations
    Machine Learning Methods for Scoring Simulations
    Static Methods Using Summarized Representations
    Time Series Methods
    Applications
    Trainee Feedback
    Early Prediction
    Real-Time Feedback
    Adaptive Simulations
Conclusion
References

The world is currently living through what some have called the fourth industrial revolution (Schwab, 2017). In this framework, the first three revolutions involved, respectively, water and steam power, electric power, and the use of electronics and information technology to automate production. The fourth industrial revolution is characterized by technologies that blur the lines among the physical, digital, and biological spheres, including such advances as the Internet of Things, 3D printing, nanotechnology, quantum computing, and, most importantly for this chapter, artificial intelligence (AI). Use of AI has increased in the last decade and is a large driver of innovation (Rust & Huang, 2014). In obvious and less obvious ways, AI is already impacting many areas of daily life (Poola, 2017).

AI gets attention for high-profile uses, such as autonomous vehicles and diagnostic systems that have proven more accurate than expert radiologists (e.g., Hosny et al., 2018; Schwarting, Alonso-Mora, & Rus, 2018). AI is also being used in smaller ways to subtly improve areas of modern life, such as unlocking a phone with facial recognition, giving grammar advice for writing, filtering spam email, serving personalized ads when browsing the web, or helping banks identify fraud (e.g., Ryman-Tubb, Krause, & Garn, 2018). Already ubiquitous, AI, and the fourth industrial revolution more broadly, promises to affect almost every field and job, including simulations and training.

Crucial to understanding how AI is changing the field of simulations and training is first examining where simulations started. Simulations are built to mimic or reproduce a specific context. The typical goal is to train or measure in an environment at lower risk than learning on the job. For example, it is not best practice for a pilot to learn a difficult maneuver, or a surgeon to try out a new technique, in the high-stakes context of their actual work. Perhaps an organization simply wants to train a leader to be a better communicator or to give higher-quality performance appraisals. Simulations allow for structured, safe, and deliberate practice in a lower-stakes environment to develop skills that will transfer to the higher-stakes circumstances of the workplace.

Simulations exist across a wide continuum, from highly realistic and technical (e.g., a flight simulator that accurately reproduces all of the controls in a plane) to more conceptually representative, like a paper-and-pencil activity. At a high level, fidelity describes how closely a simulation recreates the appearance and dynamics of the scenario being simulated. A realistic flight simulator would be high fidelity, whereas the paper-and-pencil task would be lower fidelity. Though some have criticized the concept of fidelity as poorly defined (e.g., Norman et al., 2018), the term is still useful for framing thinking about simulations in the field.

High-fidelity simulations may create colossal amounts of data. In the context of a flight simulator, the computer could record what inputs are made, how quickly they are made, how much pressure is applied, where the individual in the simulation is looking, and the participant's vital signs. A novice may not be able to comprehend the importance of so many measures, but experts are able to take all of these inputs and provide specific feedback, advice, or general conclusions that can improve the participant's performance in the exercise. On the other end of the fidelity spectrum, a training exercise could have a leader going through a developmental assessment center, where they work through an in-basket task writing emails, solving problems, or organizing their calendar (Motowidlo, Dunnette, & Carter, 1990). Again, a large amount of data is being created, and historically its evaluation has relied on expert human judgment to determine the quality of performance across the range of constructs being measured.

How can simulation performance be accurately scored across varying levels of fidelity? High-fidelity simulations may have a large number of variables with varying degrees of importance, but performance in low-fidelity simulations may still be difficult to measure and quantify empirically. For example, how do you quantify the outcomes of a simulation designed to score a proficiency exam or determine who among a group of applicants should be hired for a position?
While much research has gone into measurement, less work exists on combining these sources of data to predict important criteria (Sydell et al., 2013). We introduce the concept of fidelity to show that AI can be useful across a wide range of simulations, regardless of the type of data generated. The questions of what to do with all of the data generated by simulations, and how to score those data in reliable and valid ways, are where AI has the potential to significantly impact how simulations are used.

And while AI is changing the field, these changes were not unanticipated; scholars pointed to this future long before technology had the models and processing power to bring it about. Specifically, scholars pointed to two major predictive improvements in the area of scoring simulations: (1) combining information across item types and assessment experiences, and (2) leveraging the power of increasingly large sample sizes and data sources (Sydell et al., 2013). AI is following through on these promises and has significant implications for training and simulations, specifically because of its ability to automate the scoring of data sources that were previously impossible to score by machines (or even, sometimes, by humans) and to support more accurate scoring models. Of course, like many new technologies, AI won't destroy and replace what came before it; rather, it provides a new tool. This chapter discusses what AI can do and what it can't, and offers suggestions for readers to start incorporating it into their own work. We believe AI will change the way simulations are used to select and train individuals by allowing simulations to be more easily scaled, automating previously manual scoring approaches, and helping experts design better and more predictive weighting schemes to create more optimal scoring models of complex behaviors.

ARTIFICIAL INTELLIGENCE AND REPRODUCING EXPERT RATINGS

Automation within the field of human factors has been an area of interest for years. Since their invention, computers have become exponentially more powerful and cheaper, following the pattern known as Moore's Law (Moore, 1965). Several books within the field of human factors have been dedicated to the subject (e.g., Parasuraman & Mouloua, 1996; Mouloua & Hancock, 2020), and hardware and software are consistently making more complex automation possible. Mosier and Manzey (2020) discuss automated decision support systems (DSSs) and the value these systems provide in reducing user bias across a variety of industries. But what if it were possible to use expert human judgment to train a system and then remove the experts from the loop?

In many disciplines, expert human judgment is leveraged for decision-making in complex tasks. From doctors reviewing patient MRIs to assessors evaluating assessment center candidates, expert judgment has been shown to outperform novice judgment (Salkowski & Russ, 2018; Schleicher et al., 1999). Wickens et al. (2016) described decision-making as including the following key features: uncertainty, time, and expertise. These features are certainly present in many high-stakes environments where decisions must be made. The complexity of human decision-making is ripe ground for AI, as advances in hardware and software make replicating complex human decisions more feasible.

While AI from the 1950s to the 1980s often involved explicitly programming symbolic representations of logic and decision-making into the computer, an approach coined "Good Old-Fashioned Artificial Intelligence" (Haugeland, 1985, p. 112), more recent conceptualizations of AI have leveraged the idea of machine learning and the understanding that software can program itself if given enough data. The availability of more processing hardware at lower cost has allowed for more and more model parameters and the introduction of deep learning. Model parameters, defined as variables internal to the model that are estimated from data, are extremely important to all of machine learning. An example of a model parameter in a linear regression is a beta weight, which is estimated by optimizing for best fit. In natural language processing, parameter counts on the order of several thousand (e.g., via bag-of-words representations) were considered large just a decade or two ago. Now we have architectures like BERT with 345 million parameters (Devlin et al., 2019) and, more recently, GPT-3 with 175 billion parameters (Brown et al., 2020). These algorithms consist of an input layer, where data is fed into the model, several hidden layers, and an output layer where a specific prediction is made. This is considered deep learning because the neural network has several hidden layers. The increase in the size of the parameter space leads to more and more complex representations of the data, via the layers and the internal neurons' ability to extract very specific subsets of information from the input data.

These complex algorithms make it possible to replicate human judgment on natural language-based tasks, such as evaluating the quality of an essay, a written job simulation, or even a job candidate's interview response. This approach can be leveraged to automate simulations and systems where trained professionals currently need to evaluate natural language-based responses and make decisions.

The need to automate the understanding of human language exists across several domains. In the medical field, this technology has been used to take clinical notes and predict hospital readmission (Huang, Altosaar, & Ranganath, 2020) and patient diagnoses from electronic health records (Franz, Shrestha, & Paudel, 2020). In the field of human resources, natural language processing has been used to automate job analysis (Mracek et al., 2021) and the scoring of work simulations (Tonidandel et al., 2020). In the following pages, we outline a process for developing an algorithm that can replicate trained subject matter experts in evaluating work simulations and interview responses based on written or spoken English.

Assessment centers, in-person or virtual, often include several writing exercises in the form of in-baskets that require participants to respond to an email from a peer, boss, or customer. These unstructured text responses can then be evaluated for competencies relevant to success in the role. Extracting scores from these responses requires no small investment of resources: the responses can be lengthy, and accurate ratings require review by trained evaluators, knowledge workers who receive substantial compensation for their expertise. In addition, the process itself is extremely repetitive, which can lead to the vigilance decrement (Thomson et al., 2015) extremely common in such work. Transformer-based NLP algorithms are most effective for such long-form responses; simple chat simulations or short-answer responses would likely not benefit from the added complexity these algorithms provide.

TRADITIONAL APPROACH TO SCORING OPEN-ENDED CONTENT: RATER TRAINING

The traditional method for scoring open-ended content is an expensive process in terms of both time and money. Almost by definition, the subject matter experts (SMEs) qualified to provide ratings on a complex subject are going to be both busy and expensive. Using them to rate and evaluate a large sample of any product will be challenging, whether in an academic or an applied context. In an organization, maintaining a stable group of trained judges can be difficult as people leave the organization or their original role. Furthermore, if ratings of a particular product are needed in a timely fashion, it may be difficult to get quick work from a judge. Several of the steps also require the judges to coordinate and discuss a shared frame of reference.

AI isn't going to remove humans, or in this case experts, from the process, but rather change their role. Using AI won't be as simple as applying an algorithm and solving a problem; the tools described in this chapter still require significant effort to ensure appropriate data is fed into the system. If the end goal is a program that can rate a work product, such as a writing sample, a large pool of writing samples as well as expert ratings of these samples will be needed. The way these expert ratings are collected, even for the purposes of building an AI model, is going to look much as it has traditionally. A large literature dating back several decades exists on how to train raters most effectively (e.g., Bernardin & Buckley, 1981), and all of these proven steps still need to be followed. At its core, what data scientists are trying to do is translate a large amount of text or other data into a useful and reliable score in a standardized way. Once this training is completed, a set of judges with a shared mental model of the constructs being assessed will have been created. Deep learning can then be used to recreate these expert ratings and, ultimately, to rate new writing samples independent of human judges in a reliable and valid way.

While many approaches could be used, frame-of-reference training is perhaps the most popular (Roch et al., 2012). Conceptually, the goal of this training is to get all judges onto the same metric, minimizing differences due to judges' unique ideas about what is important when rating task performance. The first step is to create rating benchmarks and standards for all variables to be measured. For example, in a writing sample, it is possible to rate overall quality as well as more specific constructs such as grammar, vocabulary, and style. To create these benchmarks, the collection of writing samples would be reviewed to find examples of various anchors on the scale, such as 1, 3, and 5 on a 5-point scale. This process would be repeated for each of the variables the judges will rate. Once the benchmarks have been built, all the judges meet to work through a small sample of cases. For each of these cases, they provide ratings for each variable and then discuss them until they come to a consensus (i.e., a shared mental model) on what a "5" looks like, what a "3" looks like, and so on. Once the judges appear to be rating in a consistently similar way, they proceed to review the entire sample and rate all cases. For the purposes of AI, several hundred rated samples would be needed, typically with at least three judges, to ensure confidence that the "true" rating for each writing sample has been obtained. There may be a need for periodic meetings and calibrations to account for things like rater drift (Harik et al., 2009).

Once the judges have rated several hundred cases, an important check is to review inter-rater reliability, or the degree to which raters are consistently giving the same ratings on each variable of interest for each sample of writing. See Hall and Brannick (2008) for a good review of the various considerations when choosing a metric for inter-rater reliability, and Gibson and Mumford (2013) for an example of this rater-training process used in practice. Of course, many judgment calls will need to be made about specifics in the process. We prefer to be conservative about ratings to ensure the highest-quality data for training the model. For example, in some contexts, once judges have established sufficient agreement, only one judge may be needed to rate new data. This is more likely to be viable in cases with extremely high inter-rater agreement or when rating more concrete variables (e.g., a construct that has been very specifically operationalized). In other cases, raters may be allowed to have different ratings on a specific instance as long as, on average, they are in agreement (e.g., one rater gives a "2" and another gives a "5" on the same product). Given the relative newness of these algorithms, we have opted to be more conservative, such as expecting raters' evaluations to fall within one point of each other on a 1-to-5 rating scale. Thus, if the first rater evaluates a work sample as a "4," the next rater would need to give a 3, 4, or 5 for that same work sample; otherwise, the raters would need to meet to discuss and draw a shared conclusion.

Now that labels have been created, the next step is identifying the algorithm to be used. While bag-of-words (Zhang, Jin, & Zhou, 2010) and long short-term memory recurrent neural networks (Hochreiter & Schmidhuber, 1997) are reasonable methods, both have shortcomings that are beyond the scope of this chapter. Note that many of the approaches described in this chapter are new and still actively being developed, so rather than dive into a technical guide for a given method, it is more useful to review broadly the considerations in choosing an analytic technique. More recently, Vaswani et al. (2017) introduced the transformer architecture, and Devlin et al. (2019) and Liu et al. (2019) expanded upon it to create the current state-of-the-art model: Bidirectional Encoder Representations from Transformers (BERT). This architecture takes in embeddings as representations of the words and adjusts those representations depending on the words coming both before and after each word. These word embeddings hold information about each word or token's relationship to other words, making it possible for the language model to understand that, in most cases, the words "customer" and "client" are very similar. While this architecture was originally very successful at predicting masked words (i.e., what a hidden word was most likely to be given the surrounding words) and at creating state-of-the-art language translations (e.g., Google Translate), researchers quickly realized it could also excel at downstream tasks, like question answering, predicting sentence sentiment, and more. These downstream tasks are tasks that the model wasn't explicitly trained for, but can be trained to do with new data and sufficient computational power. Because of its ability to produce accurate results on a number of complex natural language processing tasks, this architecture is ideal for the downstream task of replicating human evaluations of complex human behaviors within written work simulations.

The Architecture

BERT is freely and readily available via Hugging Face (huggingface.co). There are many variants to choose from, but we recommend RoBERTa's base model. There is also a need for hyperparameter tuning of the model. Hyperparameters are parameters outside of the algorithm itself that control how the algorithm performs. In neural networks, these can be things like the learning rate, the percentage of nodes/neurons that are dropped (dropout), the optimization function, the batch size, and more. We found success trying differing learning rates and dropout rates while using the largest batch size that could fit into memory.
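To make this setup concrete, the sketch below shows one way to configure RoBERTa-base for this kind of scoring task using the open-source Hugging Face transformers and datasets libraries. It is a minimal illustration, not the authors' production code; the data file, column names, and hyperparameter values are placeholders.

```python
# Minimal sketch (not the authors' code): fine-tuning RoBERTa-base to predict
# a continuous competency rating from free-text responses. The CSV file and
# its column names ("response", "rating") are hypothetical.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# num_labels=1 with problem_type="regression" trains with an MSE loss,
# matching the regression-based scoring output described in the text.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=1, problem_type="regression")

dataset = load_dataset("csv", data_files="rated_responses.csv")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["response"], truncation=True,
                            padding="max_length", max_length=512),
    batched=True)
# Cast the SME rating to a float "labels" column for the regression loss.
dataset = dataset.map(lambda row: {"labels": float(row["rating"])})
splits = dataset.train_test_split(test_size=0.2, seed=42)  # 80/20 holdout

# Learning rate, dropout, and batch size are the hyperparameters one would
# tune via cross-validation; the values here are placeholders.
args = TrainingArguments(output_dir="scoring_model", learning_rate=2e-5,
                         per_device_train_batch_size=16, num_train_epochs=3)
Trainer(model=model, args=args, train_dataset=splits["train"],
        eval_dataset=splits["test"]).train()
```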

The Data

An important point about machine learning, and deep learning in general, is that the algorithms are extremely powerful when it comes to learning from the data they were trained on. For this reason, users need some form of holdout set to ensure predictions will generalize to responses outside of the specific responses the algorithm was trained on. When hyperparameter tuning, users will want to use a cross-validation strategy to avoid inadvertently overfitting to the holdout set. Hyperparameters, unlike model parameters, are set outside of the model itself but can have an impact on the quality of prediction. For this reason, it may make sense to test a variety of combinations to identify the best set for your given data. One common hyperparameter for the transformer architecture is the learning rate, which is simply the size of the update step used when moving along the gradient. Another common hyperparameter is neuron dropout: the proportion of neurons within the hidden layers that are randomly set to zero. This is an extremely effective regularization technique for deep learning.

First, test different hyperparameters on a k-fold cross-validation sample set. K-fold cross-validation involves slicing your data into k folds; common values of k are 5 or 10. For example, given a dataset with a sample size of 1,000 and 5-fold cross-validation, 800 responses and labels would be used to train the algorithm, 200 responses would be predicted, and the process would be repeated with each set of 200 serving as the holdout set. When hyperparameters are found that provide satisfactory results, train the final model on a training set consisting of roughly 80% of the data and evaluate the model's performance on a holdout set of the remaining 20%. We also recommend data stratification, which consists of identifying differences in the data and ensuring those differences are consistent across the folds. At a minimum, the outcome label should be stratified across folds. Other candidates for stratification include the length of the responses, the population the sample was taken from, and any other characteristics that may differ within the data. This stratification ensures that the model is consistently trained and evaluated on very similar data.
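A sketch of this splitting strategy, assuming scikit-learn and a continuous 1-5 label binned for stratification (the arrays and bin edges are illustrative, not from the original study):

```python
# Minimal sketch: stratified 5-fold cross-validation for hyperparameter
# tuning plus a final stratified 80/20 holdout, as described above.
# `texts` and `ratings` are hypothetical stand-ins for responses and labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(42)
ratings = rng.uniform(1, 5, size=1000)              # stand-in SME labels
texts = np.array([f"response {i}" for i in range(1000)])

# Continuous ratings are binned so each fold preserves the label distribution.
bins = np.digitize(ratings, [2, 3, 4])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(cv.split(texts, bins)):
    # ...train one candidate hyperparameter configuration on train_idx and
    # record correlation/MSE on test_idx...
    pass

# Final model: train on a stratified 80% split, evaluate on the held-out 20%.
X_train, X_test, y_train, y_test = train_test_split(
    texts, ratings, test_size=0.2, stratify=bins, random_state=42)
```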

One example where we have applied this process involved virtual assessment center in-basket email responses used as part of a multi-method assessment for job selection. Candidates responded to a fictitious email from a colleague inquiring about a problem they were facing and asking for guidance on how to proceed. The candidate was asked to respond to the email with recommendations for handling the specific situation. This response was evaluated on a number of competencies, from effective communication to the ability to drive results. One thousand responses were labeled by two trained subject matter experts on each of the competencies, operationalized using predetermined behaviorally anchored rating scales, and the two raters had to come to a consensus on each competency rating. The responses and labels were then stratified on the label distributions into both 5 folds and a separate final 80/20 split for final algorithm training. Several dropout rates ranging from 0.05 to 0.15 and several learning rates ranging from 1e-3 to 1e-6 were tested. The means and standard deviations of the correlations, along with the means and standard deviations of the mean squared errors (MSEs) on the holdout folds, were compared, and a final set of hyperparameters was chosen. MSE was chosen as the optimization criterion because our specific purpose involved a regression-based output. The best hyperparameters were then used to train a final model on the original 80/20 stratified split.
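The fold-level comparison might look like the following sketch; the `evaluate_config` function is a hypothetical stand-in for a full fine-tuning run, and the random values it returns are placeholders:

```python
# Minimal sketch: comparing hyperparameter configurations by the mean and
# standard deviation of holdout-fold correlation and MSE, as in the example
# above. `evaluate_config` stands in for one cross-validation training run.
import itertools
import numpy as np

def evaluate_config(dropout, lr, fold):
    # Placeholder: in practice this would fine-tune on four folds and return
    # (correlation, mse) on the held-out fold.
    rng = np.random.default_rng(fold)
    return rng.uniform(0.7, 0.9), rng.uniform(0.2, 0.5)

results = {}
for dropout, lr in itertools.product([0.05, 0.10, 0.15],
                                     [1e-3, 1e-4, 1e-5, 1e-6]):
    corrs, mses = zip(*(evaluate_config(dropout, lr, f) for f in range(5)))
    results[(dropout, lr)] = (np.mean(corrs), np.std(corrs),
                              np.mean(mses), np.std(mses))

# Choose the configuration with the lowest mean holdout MSE.
best = min(results, key=lambda cfg: results[cfg][2])
print("best (dropout, learning rate):", best)
```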

Output

The final result is a deterministic algorithm that can be used to make predictions on new work samples within milliseconds of when candidates produce them. The algorithm is deterministic in the sense that, given identical inputs, it will always produce identical output. This differs significantly from the stochastic nature of the deep learning training process, in which each retraining of the model ends with different parameter weights and thus different predictions. A current paper by some of the authors (Thompson et al., forthcoming) found correlations between SMEs and the algorithm's predictions that averaged above 0.84 across seven separate competency/work simulations. We also found that, on average, predictions on the competencies were within one point of the consensus SME evaluations (on a 1-5 scale) 91% of the time and within 0.5 points of the consensus SME evaluations 66% of the time, providing evidence that the algorithms consistently and accurately replicate the SMEs' evaluations of the job candidate on these job-relevant competencies. To put these hit rates in context, the algorithm was compared to the pre-consensus SMEs. This evaluation found that the SMEs were within one point of each other on a rating before consensus only 75% of the time. This research suggests that not only is the algorithm evaluating the responses very similarly to the consensus rating provided by the two SMEs, but it is also producing more consistent ratings than any single SME.
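Agreement statistics like those reported above can be computed directly; a sketch, assuming hypothetical arrays of algorithm predictions and consensus SME ratings on a 1-5 scale:

```python
# Minimal sketch: correlation and "within one point" / "within half a point"
# agreement between algorithm predictions and consensus SME ratings.
# `preds` and `sme` are small invented example arrays.
import numpy as np

preds = np.array([3.2, 4.1, 2.7, 4.8, 3.9])
sme = np.array([3.0, 4.0, 3.5, 5.0, 4.0])

r = np.corrcoef(preds, sme)[0, 1]
within_1 = np.mean(np.abs(preds - sme) <= 1.0)   # hit rate within 1 point
within_05 = np.mean(np.abs(preds - sme) <= 0.5)  # hit rate within 0.5 points
print(f"r = {r:.2f}, within 1 pt = {within_1:.0%}, "
      f"within 0.5 pt = {within_05:.0%}")
```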

Other Considerations

As mentioned earlier, there are other options for replicating work simulation evaluations on natural language data. Our research found that the transformer architecture outperforms the bag-of-words and LSTM architectures using 25% of the sample size. With a training set of 250 responses, we found that the evaluations on a holdout of 250 responses were more accurate than when using 750 responses to train the bag-of-words model. This is almost certainly a result of the transfer learning these transformers provide. As briefly discussed earlier, the transformer architecture comes with data built into it via its language model. It naturally has a vocabulary and representations of that vocabulary from the original purpose of the architecture, which was to predict masked words and upcoming sentences. This encoded information allows the transformer architecture to make robust and generalizable predictions on several downstream tasks with a fraction of the data required by more naïve architectures and implementations.

While the above may not provide a step-by-step review of how to use the described tools, it should provide a conceptual overview of the process by which work simulations can be evaluated using SMEs, leveraging the transformer architecture to produce highly accurate predictions on never-before-seen responses to the same work simulation. This effectively removes the human rater from the loop and automates what was once considered an extremely complex task requiring human expert judgment and decision-making.

SCORING ACTIONS IN SIMULATED ENVIRONMENTS

Beyond unstructured text (i.e., text that does not have a predefined format), simulations capture detailed data about an individual's actions within the environment, including their motions, interactions with objects, and contextual information (e.g., time stamps, the non-playable characters, or NPCs, involved). Event and motion data could have transformative potential and utility for scoring performance and providing feedback in virtual simulations. Actions within simulations are often logged by the software at a level of detail that would be impossible for human raters to comprehend. Simulations produce a rich source of data on individuals, providing a considerable opportunity for training AI/machine learning models to identify new metrics and features relevant to success. Nonetheless, this source of data also comes with additional complexity and considerations that make it difficult to structure and analyze.

Traditional Approaches to Scoring Simulations

Scoring and rating simulations still rely heavily on SMEs to generate scoring metrics and provide trainee feedback. The current gold standard for scoring such simulations is to leverage SMEs to develop a mapping from an individual's actions to the quality of their performance within the simulation, typically by providing ratings along the relevant dimensions of interest for the virtual simulation (Boyle et al., 2018; Oquendo et al., 2018). This is often done during validation of a training simulation or when the simulation is used for evaluation purposes. Trained raters then use assessment tools (e.g., rating rubrics, checklists) to score and provide feedback to trainees. Nonetheless, scoring approaches that use SMEs take a focused perspective that often uses only a small amount of the available simulation data. SMEs are especially valuable because they can identify and measure complex and abstract behaviors and constructs within simulated environments, but often at a high cost due to the required expertise and training. Reliance on SMEs for scoring and rating simulations is not a cost-effective way to provide scalable feedback and high-fidelity training, limiting widespread adoption. AI/machine learning techniques are particularly good at identifying patterns and therefore could supplement traditional scoring approaches with a scalable alternative. Expert ratings for unstructured, non-text data would be invaluable for training an AI to identify mistakes and potential errors in real time, which in turn would enhance feedback and skill acquisition.

An alternative approach for more scalable scoring is to produce simple metrics based on SME domain knowledge that can be easily calculated in an automated fashion; a sketch appears at the end of this section. This manual feature engineering typically results in a small number of easily interpretable metrics. For example, the Fundamentals of Robotic Surgery (FRS) offers standardized scoring using a set of metrics, such as time-to-completion and deviation in cutting from a prespecified region. This strategy is also commonly employed in game-based assessments using evidence-centered design (ECD), which specifies relationships between actions in a simulation and the concepts the student is trying to learn (Mislevy, Steinberg, & Almond, 2003). This approach, while scalable and simple to deploy, suffers from multiple drawbacks. First, such metrics are often poor indicators of actual performance on these tasks (Mills et al., 2017). Additionally, extracting SME knowledge requires time-consuming methods, such as cognitive task analyses, to derive the simple scoring system. Finally, manually engineered metrics are often highly context-specific, making generalization to new scenarios challenging. For example, we would likely observe large differences between the ideal metrics for scoring heart surgery and those for bone surgery.

Ideally, simulations should provide a method for administration and scoring that scales with the size of the trainee body. The point of simulations is to provide easy and safe practice for skills training, and the quality of learning during simulations depends on the quality of feedback provided to the learner. Research has demonstrated that machine learning-aided skills evaluation is a scalable and automated means of measuring and collecting data on multidimensional evaluation constructs (Vedula, Ishii, & Hager, 2017). However, as we note below, the applicable approaches depend on the collection and structuring of training data, which often includes a time-based component.
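To make the manual feature engineering concrete, here is a sketch computing two simple SME-defined metrics from a simulation event log; the log format, event names, and the 3 mm tolerance are all invented for illustration:

```python
# Minimal sketch: computing simple, SME-defined metrics (time-to-completion
# and an error count) from a hypothetical surgical-simulation event log.
events = [
    {"t": 0.00, "event": "scenario_start"},
    {"t": 3.25, "event": "incision", "deviation_mm": 1.2},
    {"t": 7.80, "event": "incision", "deviation_mm": 4.6},
    {"t": 12.40, "event": "scenario_complete"},
]

# Time-to-completion: time stamp of the completion event.
time_to_completion = next(e["t"] for e in events
                          if e["event"] == "scenario_complete")

# A hypothetical SME-specified tolerance: cuts deviating more than 3 mm
# from the prespecified region count as errors.
errors = sum(1 for e in events
             if e["event"] == "incision" and e.get("deviation_mm", 0) > 3)

print(time_to_completion, errors)  # 12.4, 1
```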

Data Representations for Modeling Simulations

Virtual simulations generate detailed logs recording the actions made by the user and the events that occurred within the virtual environment. These logs provide a rich source of information about the user; however, they are typically stored in formats unsuited for direct application in machine learning models. The logs contain information unrelated to the modeling goals, are sampled at excessively high frequencies (e.g., location at each millisecond), and are stored in a representation that is suboptimal for direct use in modeling. The first step of the machine learning pipeline is to preprocess this data into a format amenable to analysis. Simulators sample at a high frequency, as it is trivial to record the data and critical to capture every relevant action taken by the user, causing data files to quickly become extremely large (e.g., gigabytes per person). The high sampling rate makes many data points redundant, since the time between samples is so short that few actions have been performed. Down-sampling the data while ensuring critical actions are still recorded can dramatically speed up modeling time at little to no cost to predictive performance. It can also be necessary to adjust the sampling rate of data when merging across multiple sources, such as eye-trackers, videos, and tool motion (Vedula, Ishii, & Hager, 2017). These multi-modal data streams may be collected at different sampling rates through separate software, requiring adjustment of the disparate streams' sampling rates to align time stamps for merging.

Next, the log data is transformed to remove noise and make it easier for the model to learn the desired relationships. Transformations can occur along two axes: static and temporal. Static transformations translate representations at the observation level (i.e., each time point) from a form chosen for efficient storage by the simulator software to one more closely aligned with the objective of the model. For example, in game-based learning environments, it can be helpful to encode the user's goals, accomplishments, actions (e.g., talking to an NPC), and the entities with which they are performing those actions (e.g., the name of the NPC) (Geden et al., 2020; Min et al., 2017). Static transformations can also be useful for defining complex actions, such as the type of strokes made with a surgical tool in surgical simulations (Ahmidi et al., 2015). Temporal transformations, in turn, remove trends and seasonal changes from the time series to eliminate undesirable artifacts from the data. Common temporal transformations include taking a moving average, differencing, and detrending seasonal components (Wei, 2006).

Manual feature engineering can also be used to improve the performance of a model by providing it with an SME's heuristics for interpreting the environment. While certain models are able to automatically generate features from raw data (e.g., deep neural networks), these approaches require large amounts of data to learn such complex relationships. SMEs can provide a curated set of features relevant to the task, allowing the model to focus on the mapping from the heuristics to the outcome variable without having to learn the intermediate representation (Vedula, Ishii, & Hager, 2017; Uzuner, 2009; Kuhn & Johnson, 2019; Krajewski et al., 2009; Garla & Brandt, 2012). While potentially effective, manual feature engineering is a time-consuming and domain-specific process requiring SMEs to encode their knowledge, illustrating the continuing importance of SMEs in the development of new models even as AI tools are sought to replace them.
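A sketch of this preprocessing in pandas, assuming hypothetical motion-log and eye-tracker streams sampled at different rates (the column names, rates, and tolerance are illustrative):

```python
# Minimal sketch: down-sampling a high-frequency motion log, aligning it
# with an eye-tracking stream sampled at a different rate, and applying
# simple temporal transformations (moving average, differencing).
import numpy as np
import pandas as pd

idx_fast = pd.date_range("2024-01-01", periods=5000, freq="1ms")
motion = pd.DataFrame({"x": np.random.randn(5000).cumsum()}, index=idx_fast)

idx_slow = pd.date_range("2024-01-01", periods=250, freq="20ms")
gaze = pd.DataFrame({"pupil": np.random.rand(250)}, index=idx_slow)

# Down-sample the 1 kHz motion stream into 20 ms bins (mean within each bin).
motion_ds = motion.resample("20ms").mean()

# Align the two streams on nearest time stamps within a 10 ms tolerance.
merged = pd.merge_asof(motion_ds, gaze, left_index=True, right_index=True,
                       direction="nearest", tolerance=pd.Timedelta("10ms"))

# Temporal transformations: moving average and first differencing.
merged["x_smooth"] = merged["x"].rolling(window=5).mean()
merged["x_diff"] = merged["x"].diff()
```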

Machine Learning Methods for Scoring Simulations

A wide breadth of machine learning models have been successfully applied to scoring simulations across domains as diverse as education, medicine, and transportation (Anh, Nataraja, & Chauhan, 2020; Henderson et al., 2020; Beninger et al., 2021). The diversity of machine learning models is partly explained by the No Free Lunch theorem (Wolpert & Macready, 1997), which states that there is no single "best" model to use across all circumstances, requiring the researcher to explore multiple methods and tailor the solution to the structure unique to their problem. This makes it impossible to provide a simple prescription for model selection, as the choice may depend on a number of factors, including the volume of data available, the form in which the data is represented, the type and number of criterion variables (i.e., classification, regression), and the unique structure of the task. For example, the transformer method described in the previous section has been extremely successful at handling text data; however, it is not applicable to tasks without a sequential component (e.g., credit loans) or to non-language tasks with a small sample size (i.e., where pre-trained models cannot be applied).

An important aspect of modeling simulator data is that criterion data will rarely be available for each time point collected within the simulation. Instead, criterion data will be gathered intermittently over a window of time, such as a rating of the task performed during the last 10 minutes of the simulation. This creates a discrepancy in structure between the three-dimensional feature data (event × time-window × feature) and the two-dimensional criterion data (event × criterion). The relationship between multiple time points of feature data and a single criterion affects both how the researcher should structure the data and which models can be used. Broadly, two approaches are available: traditional machine learning methods can be used if the feature data is compressed along the time axis to create a summarized representation aligned with the criterion, or time series methods can be used that handle the problem natively.

Static Methods Using Summarized Representations

The compression of the time series predictors from a three-dimensional structure (event × time series × predictor) to a two-dimensional summarized representation (event × summarized predictor) is typically accomplished using either simple summary statistics or manually crafted features. The predictive performance of the machine learning model is entirely dependent upon the quality of the summarized features, making it critical that the researcher think carefully about which statistics relate to the criterion of interest. For illustrative purposes, we will walk through this approach using three commonly employed supervised learning models: support vector machines (SVMs) (Cortes & Vapnik, 1995), random forests (Breiman, 2001), and deep learning models (Rumelhart, Hinton, & Williams, 1985). Implementations of all three are readily available in many programming languages (e.g., R, Python, MATLAB).

SVMs are robust, non-probabilistic, linear classifiers that find the decision boundary maximizing the distance between classes. SVMs bear a strong resemblance to logistic regression, as both create predictions based on a linear combination of features, but with one notable difference: the objective being optimized. Logistic regression minimizes the negative log-likelihood of the data, providing a probabilistic interpretation of the likelihood that each sample belongs to a particular class. SVMs use the hinge loss with a regularization term to maximize the distance between classes, encouraging the model not only to differentiate between classes but to do so confidently. Due to the regularized hinge loss, SVMs provide a robust and scalable method that produces sparse solutions (i.e., coefficients encouraged to be 0). Zepf et al. (2019) used an SVM to detect driver frustration in a simulated driving environment based on features automatically extracted from an EEG using principal frequency bands. Mirichi et al. (2019) used an SVM to create an interpretable model for predicting expertise in a VR simulation of a subpial tumor resection.

Random forests are an ensemble method constructed from a multitude of decision trees trained on random subsets of the data (Breiman, 2001). Decision trees are directed acyclic graphs that use simple binary rules (e.g., X < 15) to create predictions. Decision trees are able to map nonlinear structures; however, they have a tendency to overfit the data and are very sensitive to outliers. Random forests address this limitation by combining many diverse decision trees to create a more stable, robust model. Beninger et al. (2021) used random forests, neural networks, and SVMs to predict inattention in a simulated driving environment. They first preprocessed the data by lowering the sampling rate, sampling a 1-minute window of feature data before each event, normalizing the features within the window, and calculating summary statistics to flatten the data (e.g., minimum, maximum, median). In their evaluations, random forests outperformed the linear SVM and the neural network. McDonald and colleagues (2014) used random forests to predict drowsy driving in a simulated driving environment based on features extracted from steering wheel motions.

Neural networks are directed acyclic graphs composed of layers with multiple nodes and are universal function approximators (Rumelhart, Hinton, & Williams, 1985). Their extremely flexible structure has led to their widespread success and adoption, particularly on complex tasks with large amounts of data, such as text and image processing (Sun et al., 2017). The simplest neural network can be constructed from an input layer and an output layer with a single node and a linear activation function, which is the same structure as a linear regression. The most complex neural networks are composed of billions of parameters and thousands of layers (Wang et al., 2017; Devlin et al., 2019). Anh, Nataraja, and Chauhan (2020) demonstrated that deep neural networks were able to accurately assess surgical skill in suturing, knot tying, and needle passing. Richstone et al. (2010) used neural networks to predict expertise based on eye movements in simulated surgical environments.
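The workflow above can be sketched compactly; the following example (scikit-learn assumed; the data, labels, and settings are illustrative rather than a reproduction of any cited study) fits a linear SVM and a random forest to summarized features and compares them with cross-validation:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.randn(200, 20)             # summarized features per event
y = np.random.randint(0, 2, size=200)    # e.g., attentive vs. inattentive

# Scale features before the SVM; the hinge loss is sensitive to feature scale.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
forest = RandomForestClassifier(n_estimators=500, random_state=0)

for name, model in [("linear SVM", svm), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, round(scores.mean(), 3))

With real simulator data, the random draws above would be replaced by the summarized feature table and intermittently collected criterion labels described earlier.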

Time Series Methods

Time series methods directly model the three-dimensional predictor data, sidestepping the need to create a two-dimensional summarized representation. These methods typically require stronger assumptions about the structure of the time series data or the use of complex and flexible frameworks.

Multivariate autoregression is a probabilistic model that predicts the criterion from a linear combination of multiple previous time points of the predictors. Autoregressive methods require the researcher to specify the temporal dependence of the model (i.e., the model order); this is usually found during exploratory data analysis by identifying trends and seasonal relationships in the data. Multivariate autoregression is an interpretable model that supports statistical inference; however, it does not natively support nonlinear relationships between features and criteria. Loukas and Georgiou (2011) used multivariate autoregression to predict laparoscopic skills (i.e., knot tying and needle driving) during surgical training based on hand motions.

Another commonly employed method is the recurrent neural network (RNN). RNNs model temporal data by recursively calling a node on the feature data at the current time-step and the intermediate data from the previous time-step of the RNN node (Rumelhart, Hinton, & Williams, 1985). RNNs make no assumptions about the data and can model nonlinear relationships; however, they are uninterpretable, black-box models that do not support inferential reasoning. RNNs can be difficult to train due to their recursive structure and struggle with long-term temporal dependencies, which has led to the development of numerous variants. One of the most successful and well-known variants is the long short-term memory (LSTM) network, inspired by the mechanics of human memory (Hochreiter & Schmidhuber, 1997). LSTMs modify RNNs by adding the ability for the model to "forget" information, addressing the training stability issues of RNNs while allowing them to better model long-term dependencies. Hong and Wang (2020) used LSTMs to predict drowsy driving based on an individual's facial and steering features. Nguyen et al. (2019) used a modified form of LSTM to predict surgical skill levels based on hand motions within a surgical simulation.
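As a minimal sketch of this time series alternative (PyTorch assumed; the dimensions and names are hypothetical), an LSTM can map an entire window of simulator features to a single criterion label without any manual summarization:

import torch
import torch.nn as nn

class LSTMScorer(nn.Module):
    def __init__(self, n_features, hidden_size=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden)
        return self.head(h_n[-1])         # one set of class logits per event

model = LSTMScorer(n_features=8)
window = torch.randn(32, 600, 8)          # 32 events, 600 time steps, 8 features
logits = model(window)                    # shape (32, 2)

The final hidden state summarizes the full window, so the network learns its own temporal compression instead of relying on researcher-chosen summary statistics.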

Applications

With a growing body of literature concerning the collection and structuring of data, as well as the potential utility of various models, it is important to focus research attention on the pragmatic application of AI/machine learning techniques. To date, high-fidelity simulation (e.g., virtual reality) technology has seen wide adoption in military and medical applications, where environmental realism is worth the high price of the technology. However, the last decade has seen a major expansion of virtual reality technology, including a boom in the gaming industry. As with other technologies, improvements in hardware, a shared programming knowledge base, and capital investments are making virtual reality technology less expensive and more accessible. Organizations are already using virtual reality for process planning and factory layout planning (Mujber, Szecsi, & Hashmi, 2004; Gong et al., 2019). Organizations can further use simulated environments to allow candidates to virtually tour a work facility, to simulate daily activities, and to train new employees on the use of heavy equipment like forklifts and transport container cranes without risk of injury (Yuen, Choi, & Yang, 2010; Bruzzone & Longo, 2013; Choi, Ahn, & Seo, 2020).

Meaningful translation of technical advances in computer science to meet the specific contextual needs of human factors research could have a transformative influence on the field. There is an increasing need for continued research on appropriate model and feature selection, in addition to a nuanced understanding of when a model becomes confident enough to provide meaningful and stable feedback to a user or trainee. Additionally, traditional human factors topics concerning feedback communication should see revitalized attention in the new context of computer-simulated environments. The following sections highlight some important areas for consideration when adopting AI/machine learning techniques for applied settings.

Trainee Feedback

The major benefit of using simulations and simulated environments is the opportunity to provide trainee feedback in a structured, high-fidelity, and safe environment. Feedback is a long-standing topic of research in human factors, providing an abundance of relevant literature while leaving substantial room for new insights. Traditional topics, such as the design and evaluation of warning signals (Wogalter, Conzola, & Smith-Jackson, 2002), have seen revitalized attention given these new applications and contexts. The following sections highlight a few of the upcoming and important topics that are well-suited for examination through a human factors lens.

In general, AI/machine learning techniques have been criticized for a lack of decision-making interpretability. Interpretability of model output has particular importance in providing feedback from training simulations (Mirichi et al., 2019). As we have discussed in this chapter, machine learning approaches are well-adapted for identifying meaningful patterns in data; however, deciphering the decision-making process of machine learning models remains difficult. For training feedback to be meaningful, trainees must understand why they received certain scores and how they can improve. Typical black-box approaches are not always well-suited for providing this level of feedback. Nonetheless, researchers have proposed means by which machine learning models can be used to provide meaningful feedback during and after simulations.

Early Prediction

Real-time detection has already demonstrated budding utility for static diagnosis and training simulations, but it also has applications beyond training, such as computer-enhanced surgical assistance (Thai et al., 2020). In all instances, it is important to consider when and how a system should begin providing feedback to a user or trainee; accordingly, there are two distinct avenues of consideration when deploying this technology. The first is the technical question of when a model becomes confident enough to make a stable prediction of a future event or state (i.e., early prediction and model uncertainty). The second is the psychological consideration of how feedback should be delivered to a user or trainee in real time.

Early prediction specifically concerns confidence in prediction when working with temporal data: the question of when a model's prediction confidence is high enough (or its uncertainty low enough) to make a stable prediction. For example, in medical research, early prediction concerns a model's ability to accurately detect an abnormality or disease at an early stage of diagnosis or to identify symptoms of early onset (e.g., Hsu & Holtz, 2019). Early prediction methods have shown tremendous utility for enhancing diagnostics across diverse circumstances, including predicting circulatory failure (Hyland et al., 2020), sepsis shock (Lin et al., 2018), and diabetes (Alam et al., 2019).
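A simple way to operationalize this idea, sketched below with hypothetical names and thresholds, is to stream per-step class probabilities from any model and commit to a prediction only once confidence stays above a threshold for several consecutive time steps:

import numpy as np

def early_prediction(probs, threshold=0.9, patience=5):
    """probs: (time, classes) array of per-step class probabilities."""
    run = 0
    for t, p in enumerate(probs):
        if p.max() >= threshold:
            run += 1
            if run >= patience:           # confidence is high and stable
                return t, int(p.argmax())
        else:
            run = 0                       # confidence dipped; reset the run
    return None                           # never confident enough to commit

The patience parameter trades off earliness against stability: a larger value delays the prediction but reduces the chance of alerting a trainee on a transient spike in model confidence.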

Real-Time Feedback

Advancements in early prediction pave the road for real-time feedback. Recent findings show tremendous potential for using motion data to provide meaningful feedback to trainees. For example, researchers have determined correlates of motion inference in games-based settings and provided evidence that motion information could be used for early indications of events (Hart, Vaziri-Pashkam, & Mahadevan, 2020). A natural expansion of this research is the application of advanced machine learning/deep learning models that could infer dangerous motion and provide early warning indications. Researchers have used adaptations of random forest and LSTM neural network models to create a real-time feedback tool for a temporal bone surgery simulator by identifying characteristics of drilling strokes that improved surgical performance (Ma et al., 2017a, 2017b). More abstractly, researchers have explored machine learning approaches to predict early warning signs of critical transitions within dynamic systems (Lade & Gross, 2012; Füllsack, Kapeller, Plakolb, & Jäger, 2020). The concept of identifying predictors of critical transitions using machine learning models could have widespread applications for monitoring dynamic systems, such as predicting communication breakdowns in military squadrons or oncoming disequilibrium in a patient during an operation.

Early prediction coupled with meaningful feedback would be useful across a variety of training settings, including team communication and education. However, for real-time feedback to be meaningful, it must be effectively received and processed by the user or trainee. Computerized systems should facilitate the integration of established best practices, such as the use of personalized warnings (Wogalter, Racicot, Kalsher, & Simpson, 1994), the meaningful use of alarms (Edworthy & Hellier, 2006), the integration of multisensory warning signals (Ho, Reed, & Spence, 2007; Baldwin et al., 2012), and ensuring that warnings are maximally informative (Fagerlönn & Alm, 2010). Meaningful application of AI to training simulations will require an interdisciplinary perspective that can translate powerful analytics into products with pragmatic value.

Adaptive Simulations

AI's potential to produce real-time prediction and feedback will also enable advancements in adaptive simulations. The goal of AI-enhanced adaptive simulations is to further increase the fidelity and learning opportunities within simulated training environments. Perhaps the most straightforward application is to adapt the difficulty of a simulation to create additional challenges when trainees are performing well. In addition to adaptive difficulty, simulations could include adaptive scenarios for dangerous events such as hydroplaning, losing control of an object when using machinery like cranes and forklifts, and emergency medical events during surgery. Allowing simulations to adaptively introduce these events, especially in connection with real-time data about the environmental state, would increase simulation fidelity and better prepare trainees for low-frequency but high-risk events.

A recent review of adaptive simulations found that the most common simulation adaptation was adjustment of difficulty, such as adjustment of speed or resistance in rehabilitation exercises (Zahabi & Razak, 2020). The review also found that most simulations with adaptive content did not provide adaptive real-time feedback. In our discussion, we have expressed both the need for adaptation of controlled elements in the environment and the usefulness of adaptive feedback. Achieving these goals will require that the AI gain sufficient knowledge about current states, desired states, and future states, and be able to process this information quickly and efficiently. Codifying the knowledge of when an action should have been performed, and then adaptively providing feedback to redirect toward the desired state, is not a task that can be easily programmed from a flat representation of actions, especially when moving beyond simple motion data. Furthermore, as it pertains to applying such technologies outside of training simulations, implementing AI on image (video) data is much more difficult in applied settings than in simulations (Vedula et al., 2017).
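One hedged illustration of the difficulty-adjustment idea above (the class name, thresholds, and step sizes are hypothetical, not drawn from the cited review) is a control loop that raises or lowers scenario difficulty from a rolling estimate of trainee performance:

from collections import deque

class AdaptiveDifficulty:
    def __init__(self, window=10, target=0.75, step=0.1):
        self.scores = deque(maxlen=window)    # rolling performance window
        self.target = target                  # desired success rate
        self.step = step
        self.difficulty = 0.5                 # normalized difficulty, 0..1

    def update(self, trial_score):
        self.scores.append(trial_score)
        mean = sum(self.scores) / len(self.scores)
        if mean > self.target:                # doing well: make it harder
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif mean < self.target - 0.15:       # struggling: make it easier
            self.difficulty = max(0.0, self.difficulty - self.step)
        return self.difficulty

In a richer implementation, the scalar difficulty would be replaced by scheduled scenario events (e.g., triggering a hydroplaning episode) conditioned on the simulation's real-time state.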

CONCLUSION

The implications of AI for society are immense, and such techniques are already being used to change how we think about and score simulations. This chapter has discussed domains where AI can help with two of the greatest challenges in scoring simulations: NLP models for handling unstructured text data, and various other techniques for dealing with unstructured data such as event and motion data. While we framed the techniques around real-world examples of their use, we have not tried to be exhaustive in covering the possible analytic techniques, nor have we provided enough information for a reader to jump directly to using the above analytics. Each of these techniques likely deserves an entire chapter devoted to real-world usage, but such detail would be far beyond the scope of this chapter. Our goal was to outline the implications of these technologies for how we score simulations and to provide exposure to the various analytic techniques that could be, and often already are, used in the field to score simulations. The truth is that the best techniques at any given time are rapidly changing, and any guide or cookbook for how to use a technique will become dated quickly. So rather than focus on specifics, we wanted to outline consistent themes and limitations that should be considered whenever AI is being used and to describe what may be possible.

A consistent theme across everything discussed in this chapter is that the importance of SMEs, and of an understanding of the domain at hand, will continue to be critical. While AI is capable of extraordinary things, getting the most from these tools requires collaboration between data scientists and domain-specific SMEs, as a model is only as effective as the quality of the data and the simulation being assessed.

Broadly speaking, we see two primary areas where AI will forever change how we score simulations. One is the automation of scoring unstructured text data that previously would have required highly trained human raters; the other is new sources of data too complex for humans to process. We argue both areas have promising futures in the realm of simulations, whether for training or evaluative purposes. Lastly, we believe AI will create a golden age in the use of simulations as we become able to create models that help simulations become more accurate, scalable, and comprehensive.

REFERENCES

Ahmidi, N., Poddar, P., Jones, J. D., Vedula, S. S., Ishii, L., Hager, G. D., & Ishii, M. (2015). Automated objective surgical skill assessment in the operating room from unstructured tool motion in septoplasty. International Journal of Computer Assisted Radiology and Surgery, 10(6), 981–991.
Alam, T. M., Iqbal, M. A., Ali, Y., Wahab, A., Ijaz, S., Baig, T. I., Hussain, A., Malik, M., Raza, M., Ibrar, S., & Abbas, Z. (2019). A model for early prediction of diabetes. Informatics in Medicine Unlocked, 16, 1–6.
Anh, N. X., Nataraja, R. M., & Chauhan, S. (2020). Towards near real-time assessment of surgical skills: A comparison of feature extraction techniques. Computer Methods and Programs in Biomedicine, 187, 105234.
Baldwin, C. L., Spence, C., Bliss, J., Brill, C., Wogalter, M., Mayhorn, C., & Ferris, T. (2012). Multimodal cueing: The relative benefits of the auditory, visual, and tactile channels in complex environments. Proceedings of the Human Factors and Ergonomics Society.
Beninger, J., Hamilton-Wright, A., Walker, H. E., & Trick, L. M. (2021). Machine learning techniques to identify mind-wandering and predict hazard response time in fully immersive driving simulation. Soft Computing, 25(2), 1239–1247.
Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6, 205–212.
Boyle, W. A., Murray, D. J., Beyatte, M. B., Knittel, J. G., Kerby, P. W., Woodhouse, J., & Boulet, J. R. (2018). Simulation-based assessment of critical care ‘front-line’ providers. Critical Care Medicine, 46(6), e516.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. arXiv: https://arxiv.org/pdf/2005.14165.pdf
Bruzzone, A., & Longo, F. (2013). 3D simulation as training tool in container terminals: The TRAINPORTS simulator. Journal of Manufacturing Systems, 32, 85–98.
Choi, M., Ahn, S., & Seo, J. (2020). VR-based investigation of forklift operator situation awareness for preventing collision accidents. Accident Analysis and Prevention, 136, 1–9.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 4171–4186. Minneapolis, MN.
Edworthy, J., & Hellier, E. (2006). Alarms and human behaviour: Implications for medical alarms. British Journal of Anaesthesia, 97(1), 12–17.
Fagerlönn, J., & Alm, H. (2010). Auditory signs to support traffic awareness. IET Intelligent Transport Systems, 4(4), 262–269.



Franz, L., Shrestha, Y. R., & Paudel, B. (2020). A deep learning pipeline for patient diagnosis prediction using electronic health records. BioKDD 2020: 19th International Workshop on Data Mining in Bioinformatics. San Diego, CA.
Füllsack, M., Kapeller, M., Plakolb, S., & Jäger, G. (2020). Training LSTM-neural networks on early warning signals of declining cooperation in simulated repeated public good games. MethodsX, 7, 100920.
Garla, V. N., & Brandt, C. (2012). Ontology-guided feature engineering for clinical text classification. Journal of Biomedical Informatics, 45(5), 992–998.
Geden, M., Emerson, A., Rowe, J., Azevedo, R., & Lester, J. (2020). Predictive student modeling in educational games with multi-task learning. Proceedings of the AAAI Conference on Artificial Intelligence, 34(1), 654–661.
Gibson, C., & Mumford, M. D. (2013). Evaluation, criticism, and creativity: Criticism content and effects on creative problem-solving. Psychology of Aesthetics, Creativity, and the Arts, 7, 314–331.
Gong, L., Berglund, J., Berglund, A., Johansson, B., & Borjesson, T. (2019). Development of virtual reality support to factory layout planning. International Journal of Interactive Design and Manufacturing, 13, 935–945.
Hall, S., & Brannick, M. T. (2008). Performance assessment in simulation. In D. A. Vincenzi, J. A. Wise, M. Mouloua, & P. A. Hancock (Eds.), Human factors in simulation and training (pp. 149–168). Boca Raton, FL: CRC Press.
Harik, P., Clauser, B. E., Grabovsky, I., Nungester, R. J., Swanson, D., & Nandakumar, R. (2009). An examination of rater drift within a generalizability theory framework. Journal of Educational Measurement, 46, 43–58.
Hart, Y., Vaziri-Pashkam, M., & Mahadevan, L. (2020). Early warning signals in motion inference. PLoS Computational Biology, 16(5), e1007821.
Haugeland, J. (1985). Artificial intelligence: The very idea. Cambridge, MA: The MIT Press.
Henderson, N., Kumaran, V., Min, W., Mott, B., Wu, Z., Boulden, D., … Lester, J. (2020). Enhancing student competency models for game-based learning with a hybrid stealth assessment framework. International Educational Data Mining Society, 13, 92–103.
Ho, C., Reed, N., & Spence, C. (2007). Multisensory in-car warning signals for collision avoidance. Human Factors: The Journal of the Human Factors and Ergonomics Society, 49(6), 1107–1114.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hong, L., & Wang, X. (2020). Towards drowsiness driving detection based on multi-feature fusion and LSTM networks. International Conference on Control, Automation, Robotics and Vision, 732–736.
Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H., & Aerts, H. (2018). Artificial intelligence in radiology. Nature Reviews Cancer, 18, 500–510.
Hsu, P., & Holtz, C. (2019). A comparison of machine learning tools for early prediction of sepsis from ICU data. 2019 Computing in Cardiology (CinC), 46, 1–4.
Huang, K., Altosaar, J., & Ranganath, R. (2020). ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv: https://arxiv.org/pdf/1904.05342.pdf
Hyland, S. L., Faltys, M., Hüser, M., Lyu, X., Gumbsch, T., Esteban, C., Bock, C., Horn, M., Moor, M., Rieck, B., Zimmermann, M., Bodenham, D., Borgwardt, K., Rätsch, G., & Merz, T. M. (2020). Early prediction of circulatory failure in the intensive care unit using machine learning. Nature Medicine, 26(3), 364–373.
Krajewski, J., Sommer, D., Trutschel, U., Edwards, D., & Golz, M. (2009). Steering wheel behavior based estimation of fatigue. Proceedings of the International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, 118–124.



Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. Abingdon, UK: Taylor & Francis Group.
Lade, S. J., & Gross, T. (2012). Early warning signals for critical transitions: A generalized modeling approach. PLoS Computational Biology, 8(2), e1002360.
Lin, C., Zhang, Y., Ivy, J., Capan, M., Arnold, R., Huddleston, J. M., & Chi, M. (2018). Early diagnosis and prediction of sepsis shock by combining static and dynamic information using convolutional-LSTM. Proceedings of the International Conference on Healthcare Informatics, 219–228.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv: https://arxiv.org/pdf/1907.11692.pdf
Loukas, C., & Georgiou, E. (2011). Multivariate autoregressive modeling of hand kinematics for laparoscopic skills assessment of surgical trainees. IEEE Transactions on Biomedical Engineering, 58(11), 3289–3297.
Ma, X., Wijewickrema, S., Zhou, Y., Zhou, S., O’Leary, S., & Bailey, J. (2017a). Providing effective real-time feedback in simulation-based surgical training. International Conference on Medical Image Computing and Computer-Assisted Intervention, 566–574.
Ma, X., Wijewickrema, S., Zhou, Y., Zhou, S., Mhammedi, Z., O’Leary, S., & Bailey, J. (2017b). Adversarial generation of real-time feedback with neural networks for simulation-based training. Proceedings of the International Joint Conference on Artificial Intelligence, 3763–3769.
McDonald, A. D., Lee, J. D., Schwarz, C., & Brown, T. L. (2014). Steering in a random forest: Ensemble learning for detecting drowsiness-related lane departures. Human Factors, 56(5), 986–998.
Mills, J. T., Hougen, H. Y., Bitner, D., Krupski, T. L., & Schenkman, N. S. (2017). Does robotic surgical simulator performance correlate with surgical skill? Journal of Surgical Education, 74(6), 1052–1056.
Min, W., Mott, B., Rowe, J., Taylor, R., Wiebe, E., Boyer, K., & Lester, J. (2017). Multimodal goal recognition in open-world digital games. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 13(1), 80–86.
Mirichi, N., Bissonnette, V., Yilmaz, R., Ledwos, N., Winkler-Schwartz, A., & Del Maestro, R. (2019). The virtual operative assistant: An explainable artificial intelligence tool for simulation-based training in surgery and medicine. PLoS One, 15(2), 1–15.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). Focus article: On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1(1), 3–62.
Moore, G. E. (1965). Cramming more components onto integrated circuits. Electronics Magazine, 38(8), 114–117.
Mosier, K. L., & Manzey, D. (2020). Humans and automated decision aids: A match made in heaven? In M. Mouloua & P. A. Hancock (Eds.), Human performance in automated and autonomous systems: Current theory and methods (pp. 19–42). Boca Raton: CRC Press.
Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (1990). An alternative selection procedure: The low-fidelity simulation. Journal of Applied Psychology, 75, 640–647.
Mouloua, M., & Hancock, P. (2020). Human performance in automated and autonomous systems: Current theory and methods. Boca Raton: CRC Press.
Mracek, D. L., Peterson, N., Barsa, A., & Koenig, N. (2021). DEEP*O*NET: A neural network approach to leveraging detailed text descriptions of the world of work. In K. Nei (Chair), Demonstrating natural language processing for improving job analysis. Symposium conducted at the meeting of the Society for Industrial/Organizational Psychology, New Orleans, LA.



Mujber, T., Szecsi, T., & Hashmi, M. (2004). Virtual reality applications in manufacturing process simulation. Journal of Materials Processing Technology, 155–156, 1834–1838.
Nguyen, X. A., Ljuhar, D., Pacilli, M., Nataraja, R. M., & Chauhan, S. (2019). Surgical skill levels: Classification and analysis using deep neural network model and motion signals. Computer Methods and Programs in Biomedicine, 177, 1–8.
Norman, G. R., Grierson, L. E. M., Sherbino, J., Hamstra, S. J., Schmidt, H. G., & Mamede, S. (2018). Expertise in medicine and surgery. In K. A. Ericsson, R. R. Hoffman, A. Kozbelt, & A. M. Williams (Eds.), Cambridge handbooks in psychology: The Cambridge handbook of expertise and expert performance (pp. 331–355). Cambridge: Cambridge University Press.
Oquendo, Y. A., Riddle, E. W., Hiller, D., Blinman, T. A., & Kuchenbecker, K. J. (2018). Automatically rating trainee skill at a pediatric laparoscopic suturing task. Surgical Endoscopy, 32(4), 1840–1857.
Parasuraman, R., & Mouloua, M. (1996). Automation and human performance: Theory and applications. New York: CRC Press.
Poola, I. (2017). How artificial intelligence is impacting real life every day. International Journal of Advance Research, Ideas and Innovations in Technology, 2, 96–100.
Richstone, L., Schwartz, M. J., Seideman, C., Cadeddu, J., Marshall, S., & Kavoussi, L. R. (2010). Eye metrics as an objective assessment of surgical skill. Annals of Surgery, 252(1), 177–182.
Roch, S. G., Woehr, D. J., Mishra, V., & Kieszczynska, U. (2012). Rater training revisited: An updated meta-analytic review of frame-of-reference training. Journal of Occupational and Organizational Psychology, 85, 370–395.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. La Jolla, CA: University of California, San Diego, Institute for Cognitive Science.
Rust, R. T., & Huang, M. (2014). The service revolution and transformation of marketing science. Marketing Science, 33, 206–221.
Ryman-Tubb, N. F., Krause, P., & Garn, W. (2018). How artificial intelligence and machine learning research impacts payment card fraud detection: A survey and industry benchmark. Engineering Applications of Artificial Intelligence, 76, 130–157.
Salkowski, L. R., & Russ, R. (2018). Cognitive processing differences in experts and novices when correlating anatomy and cross-sectional imaging. Journal of Medical Imaging, 5(3), 031411.
Schleicher, D. J., Day, D. V., Mayes, B. T., & Riggio, R. E. (1999). A new frame for frame-of-reference training: Enhancing the construct validity of assessment centers. Paper presented at the annual conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Schwab, K. (2017). The fourth industrial revolution. New York: World Economic Forum.
Schwarting, W., Alonso-Mora, J., & Rus, D. (2018). Planning and decision-making for autonomous vehicles. Annual Review of Control, Robotics, and Autonomous Systems, 1, 187–210.
Sun, C., Shrivastava, A., Singh, S., & Gupta, A. (2017). Revisiting unreasonable effectiveness of data in deep learning era. Proceedings of the IEEE International Conference on Computer Vision, 843–852.
Sydell, E., Ferrell, J., Carpenter, J., Frost, C., & Brodbeck, C. C. (2013). Simulation scoring. In M. Fetzer & K. Tuzinski (Eds.), Simulations for personnel selection (pp. 83–107). New York, NY: Springer.
Thai, M., Phan, P., Hoang, T., Wong, S., Lovell, N., & Do, T. (2020). Advanced intelligent systems for surgical robotics. Advanced Intelligent Systems, 2, 1–33.



Thompson, I., Koenig, N., Mracek, D. L., & Tonidandel, S. (forthcoming). Integrating deep learning and measurement science: Automating the subject matter expertise used to evaluate candidate work samples. Journal of Applied Psychology.
Thomson, D. R., Besner, D., & Smilek, D. (2015). A resource-control account of sustained attention: Evidence from mind-wandering and vigilance paradigms. Perspectives on Psychological Science, 10(1), 82–96.
Tonidandel, S., Thompson, I. B., Mracek, D. L., & Koenig, N. (2020). Automating subject matter expertise used to evaluate candidate work samples. In E. Campion & M. Campion (Chairs), The construct validity of computer-assisted text analysis (CATA). Symposium conducted at the annual conference of the Society for Industrial and Organizational Psychology, Austin, TX.
Uzuner, Ö. (2009). Recognizing obesity and comorbidities in sparse data. Journal of the American Medical Informatics Association, 16(4), 561–570.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000–6010.
Vedula, S. S., Ishii, M., & Hager, G. D. (2017). Objective assessment of surgical technical skill and competency in the operating room. Annual Review of Biomedical Engineering, 19, 301–325.
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., … Tang, X. (2017). Residual attention network for image classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3156–3164.
Wei, W. W. (2006). Time series analysis. In T. D. Little (Ed.), The Oxford handbook of quantitative methods in psychology: Vol. 2. Oxford, UK: Oxford University Press.
Wickens, C. D., Hollands, J. G., Banbury, S., & Parasuraman, R. (2016). Engineering psychology and human performance (4th ed.). New York: Routledge.
Wogalter, M., Conzola, V., & Smith-Jackson, T. (2002). Research-based guidelines for warning design and evaluation. Applied Ergonomics, 33(3), 219–230.
Wogalter, M., Racicot, B., Kalsher, M., & Simpson, S. (1994). Personalization of warning signs: The role of perceived relevance on behavioral compliance. International Journal of Industrial Ergonomics, 14(3), 233–242.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.
Yuen, K., Choi, S., & Yang, X. (2010). A full-immersive CAVE-based VR simulation system of forklift truck operations for safety training. Computer-Aided Design & Applications, 7(2), 235–245.
Zahabi, M., & Razak, A. M. A. (2020). Adaptive virtual reality-based training: A systematic literature review and framework. Virtual Reality, 24, 725–752.
Zepf, S., Stracke, T., Schmitt, A., van de Camp, F., & Beyerer, J. (2019, December). Towards real-time detection and mitigation of driver frustration using SVM. Proceedings of the International Conference on Machine Learning and Applications, 202–209.
Zhang, Y., Jin, R., & Zhou, Z. (2010). Understanding bag-of-words model: A statistical framework. International Journal of Machine Learning and Cybernetics, 1, 43–52.

12 Dissecting the Neurodynamics of the Pauses and Uncertainties of Healthcare Teams

Ronald Stevens, Trysha Galloway, and Ann Willemsen-Dunlap

CONTENTS

Introduction
The Significance of Structure in EEG Amplitudes
Neurodynamic Correlates of Uncertainty
Estimating the Frequency, Magnitude, and Duration of Uncertainty
Augmenting Debriefings with Neurodynamics
Using Neurodynamic Analyses to Train the Trainers
    Early Novices
    Later Novices
    Evolving the Technology
Summary
References

INTRODUCTION

Simulation is widely accepted as an important educational tool for healthcare professionals in training and practice, and evidence for its positive impact on patient safety is growing (Schmidt et al., 2013). High-fidelity simulations provide opportunities for students to dynamically integrate their static knowledge, for professionals to acquire and maintain their professional skills, and for high-stakes testing (Lateef, 2010). Clinical simulations of complex and evolving patient conditions often occur in high-fidelity environments using realistic mannequins and standardized participants, a degree of fidelity not always available to smaller healthcare facilities, making it difficult for them to achieve the research-reported benefits of simulation training (Goodwin et al., 2021).

Today, virtual reality (VR) and augmented reality (AR) have arrived on the simulation scene, and they are here to stay, as they provide supplements and alternatives to mannequin-based simulations (Creutzfeldt et al., 2016).
For instance, VR is providing opportunities for surgeons to visualize complex operations and surgical treatment options both preoperatively and intraoperatively, where they can make mistakes and learn from them in an environment where there is no risk to the patient. Hospitals have also experimented with VR during the informed consent process, as it allows patients and their families to virtually walk through their surgical plan, stepping into their own anatomy and diagnosis as they gain the deep understanding and trust needed to make complex medical decisions (Surgical Theater, 2021).

Simulations both support and are supported by a variety of stakeholders, each with a need to understand different facets of the simulation process. Professionals in training are generally thought of as the primary learners, but as shown above, patients can also benefit from VR simulations. Another stakeholder group is the training facilitators, who support the learning of trainees by probing and identifying deficits in knowledge, attitudes, and skills that arise during debriefings, as well as by helping trainees form new abstract concepts and generalizations (Maytin et al., 2015). In parallel, these simulation educators might draw on their debriefing experiences to train new facilitators, or use their skills to design new simulations that focus on unaddressed knowledge gaps. Finally, program directors in charge of residency programs invest a significant amount of time and resources in the recruitment process, as well as in maintaining efficiency and cost-effectiveness. Using VR facility tours as an alternative to in-person tours of affiliate training facilities during a residency interview day is a viable and innovative option that can save time and money and favorably impact the applicant's impression of the program (Zertuche et al., 2020).

The diversity of stakeholders, and their need to understand different levels of simulation processes, argues for automated systems that provide rapid, objective, and quantitative representations of performance: educationally meaningful evidence of short- and long-term learning tailored to the needs of the different stakeholders. Such systems, built around cognitive frameworks, would be applicable across content domains as well as across simulation platforms created and delivered by established methods or by the emerging AR, VR, mixed reality (MR), or generative immersive scenario testbed (GIST) technologies.

A challenge for conceptualizing and delivering this next generation of capabilities is that the dynamics of learning during simulation training, or even estimates of when that learning might be occurring, are poorly understood. Sometimes the best that can be said is that learning likely occurred during the simulation, and perhaps more so during the debriefing. At the neuronal level, however, it is increasingly apparent that learning is driven by unexpected events, i.e., those that cause uncertainty (Gillon et al., 2021).

This chapter describes the progress our lab has made in developing increasingly understandable neurodynamic models that analyze the brainwaves of individuals and teams in real time and report the frequency, magnitude, and duration of neural correlates of uncertainty. These neurodynamic measures and models quantitatively scale from neurons to teams, providing the chance to report training gaps and performance levels to stakeholders throughout the healthcare simulation community, whether they are the trainees, learning facilitators, simulation designers, or program directors.
The challenge in developing performance-based evaluations using neural measures is not with the EEG measures themselves. Since the discovery of brainwaves, many EEG measures have been developed, i.e., frequency, amplitude (power), phase, complexity, scalp topology, ERPs, etc. An equally large number of methods have been developed for collecting, preprocessing, and modeling these measures in real time (Delorme & Makeig, 2004; Mullen et al., 2015; Bigdely-Shamlo et al., 2015; Oostenveld et al., 2011). The challenge of broad-scale neural modeling is that bottom-up analytic approaches rapidly become complicated, as most low-level neural processes are not in themselves directly causal to team performance but instead result from everyday cognitive activities that support seeing, listening, decision making, etc. It is when these activities are transiently amplified or modified by the context that they assume greater importance for understanding teamwork. Higher-level representations of neural processes are needed where modifications to and amplifications of the micro-scale dynamics are allowed to change freely while still providing the "best fit" (i.e., more stable) functional approximations for higher-level activities; in other words, abstract representations that have a basis in mechanism but where many of the micro-details do not need specifying (Flack, 2017a).

Information is one such abstraction. It has been proposed that biological systems, like teams, are hierarchies of information that are functionally organized across spatial and time scales (Flack, 2017). Uncertainty is the messenger in this hierarchy, guiding information back and forth between the environment and the team (Flack et al., 2012), with ripples and islands in these information streams representing periods of changing organization (see Stevens et al., 2013 for team examples). This changing information helps the brain identify statistical regularities in the environment and use them to shape adaptations along the macroscopic and microscopic continuum of experience and learning (Daniels et al., 2017). We have proposed that the informational structure of EEG rhythms might be a candidate representation for bridging the micro–macro scale gap, as it has a basis in organization, not power or phase, and may be more likely to align with processes responsible for observable macro-scale organizations and behaviors.

Scalp brainwaves are defined by the amplitudes of EEG-determined frequencies and their phase with other brainwaves. Changes in EEG amplitude are seen for all frequencies in the 1–40 Hz EEG spectrum, and observable cognitive behaviors are increasingly being linked with these (short-lived) changes in different EEG frequencies and amplitudes. The meaning of EEG power is important. For instance, alpha band oscillations (8–12 Hz) emphasize different functional priorities depending on whether their states are synchronized (aka activated, or high power) or desynchronized (aka deactivated, or low power). Low-power (often also called suppressed) states are seen during attentive reading (Lachaux et al., 2008) and tend to favor new memory encodings. Higher alpha power states may transiently suppress gamma rhythms and help protect the contents of working memory from being disturbed, thereby enhancing retention (Klimesch et al., 2006; Klimesch, 2012; Wianda & Ross, 2019; Ossandon et al., 2011; Bonnefond & Jensen, 2015). Similar considerations might apply to gamma waves (32–40 Hz) (Sedley & Cunningham, 2013) and to delta waves (1–4 Hz), which show an increase in power during the onset of fatigue and a decrease following challenge interruptions (Bodala et al., 2018).

THE SIGNIFICANCE OF STRUCTURE IN EEG AMPLITUDES

Detecting structure in data streams involves first deconstructing continuous data into discrete symbols. From the different cognitive activities described above, EEG amplitudes might exist in activated, deactivated, or neutral states, which could be assigned any three symbols that are easy to visualize. In our studies, activated states are assigned "3," deactivated states are assigned "–1," and neutral states are assigned "1." So during a second when the EEG amplitude was high, the symbol for that second would be 3; if it was low, the symbol would be –1; and if it was in an average power state, the symbol would be 1. The result is a data stream of 3's, 1's, and –1's. Choosing how many states to divide the EEG amplitude into each second involves a trade-off between resolution and computation; dividing the amplitude each second into six or nine states might increase the resolution but would slow the modeling.

Figure 12.1 shows a team of two persons where the EEG amplitudes were separated each second into three states (Figure 12.1A), six states (Figure 12.1B), or nine states (Figure 12.1C). For Figure 12.1A, since there are two persons and three symbols in each person's data stream, the team data stream would have nine symbols. The temporal structure (not power) in this data stream can be estimated each second by measuring the mix of the nine symbols in a 60 s segment that slides over the data and is updated each second. If only one of the nine symbols was expressed in this 60 s segment, the entropy would be 0 bits; if there was an equal mix of the nine symbols, the entropy would be 3.17 bits, the maximum. So, the fewer the symbols expressed in a window of 60 s, the more organized the team and the lower the entropy. Since it could be confusing to have lower entropy mean higher organization, the data plots are made more intuitive by calculating the neurodynamic information (NI), i.e., the bits of information obtained when the entropy value is subtracted from the maximum entropy for the number of unique symbols. So, an entropy value of 2.87 would correspond to an NI level of 3.17 minus 2.87, or 0.3 bits. The resulting NI profile is shown to the right, and the average NI of the team's performance was 0.16 bits.

FIGURE 12.1 Symbolic modeling of neurodynamic data. The EEG was collected from two team members (TM1 and TM2), and each second the scalp-averaged EEG amplitude was separated into three (A), six (B), or nine (C) divisions. The NI was calculated for the three models using a 60 s moving window that was updated each second.

Similar calculations can be made when the amplitude is separated into six or nine states (Figure 12.1B and C). Although the NI values increased with additional symbols in each group, the NI profiles were similar, indicating that adding symbols had little effect on the dynamical structure of the data; for most studies we separate EEG data into three categories.

A general, and perhaps more important, point from this figure is that symbolically analyzing the structure of EEG amplitude creates a quantitative and bounded scale of EEG organization. If there is no organization to the EEG data stream, the NI will be 0. If the EEG is maximally organized, i.e., all a single symbol, the NI will be the maximum for the number of symbols in the system, i.e., 4.75 bits for a 27-symbol three-person team, 3.17 bits for a 9-symbol dyad, or 1.59 bits for an individual. These mathematical limits have implications for creating quantitative performance measures. They mean that the neurodynamic information of any team of two persons performing any task where the EEG is separated into three levels will lie between 0 and 3.17 bits. The average value of 0.16 bits for the team in Figure 12.1A can therefore be quantitatively compared with other team performances, and these values can be aggregated for a class of trainees or used to compare one training protocol with another. If the average NI for a team member is calculated, this value can be quantitatively compared with that of other team members. It also means that the neurodynamic organization of one brain region can be compared with that of another brain region and across the 1–40 Hz EEG spectrum. The same reasoning applies if the neurodynamic organization of a team is compared in the simulation scenario versus the debriefing, or across a critical healthcare event like intubation.
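The computation just described can be prototyped compactly. The sketch below (Python, with illustrative names and synthetic data) symbolizes each member's scalp-averaged amplitude into the three states used in our studies and converts windowed entropy into NI:

import numpy as np
from collections import Counter

def symbolize(amplitude):
    """Map each second's amplitude to -1 (low), 1 (average), or 3 (high)."""
    lo, hi = np.percentile(amplitude, [33.3, 66.7])
    return np.where(amplitude < lo, -1, np.where(amplitude > hi, 3, 1))

def ni_profile(team_symbols, window=60):
    """team_symbols: (members, seconds). Returns NI in bits, updated each second."""
    team = list(zip(*team_symbols))           # one joint team symbol per second
    n_symbols = 3 ** len(team_symbols)        # 9 for a dyad, 27 for a triad
    h_max = np.log2(n_symbols)                # 3.17 bits for a dyad
    ni = np.full(len(team), np.nan)
    for t in range(window, len(team) + 1):
        counts = Counter(team[t - window:t])  # symbol mix in the sliding window
        p = np.array(list(counts.values())) / window
        entropy = -(p * np.log2(p)).sum()
        ni[t - 1] = h_max - entropy           # NI = H_max - H, as in the text
    return ni

# Example: a two-person team with 300 s of synthetic amplitude data.
rng = np.random.default_rng(0)
team = [symbolize(rng.normal(size=300)) for _ in range(2)]
print(np.nanmean(ni_profile(team)))

Because NI is bounded between 0 and the maximum entropy for the symbol system, the averaged profile values it returns can be compared directly across team members, brain regions, or performances, as discussed above.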



Such neurodynamic organizations contribute properties to the system not always possessed by the amplitude or phase of brainwaves alone. For instance, neurodynamic information has been shown to link with the organization of team activities (Stevens & Galloway, 2017) and speech (Gorman et al., 2016), and with submarine (Stevens et al., 2017) and healthcare (Stevens & Galloway, 2021) team expertise. These options are explored in the next section of this chapter.

There is one other important question that needs exploring first: Why do neurodynamic organizations correlate with these team activities, while the EEG power values (EEG-PV) from which the NI was calculated do not? It is not the power levels per se in an EEG data stream that link with behavioral activities; rather, it is the information, or degree of organization, within the EEG-PV that makes this connection. The differences between the two are shown in Figure 12.2.

FIGURE 12.2 Neurodynamic information (NI) accumulates over time as teams experience periods of uncertainty. The EEG power values (EEG-PV) move through periods of high and low power but do not accumulate like the NI.

Figure 12.2A shows the NI of a neurosurgeon, with five to seven discrete peaks during the 1,700 s of a simulated operation. The information profile was developed from a frequency-averaged EEG data stream from the T4 sensor using a 30 s window where the EEG power was first divided into equal numbers of three discrete segments: –1 for below-average levels, 1 for average levels, and 3 for above-average levels, as described earlier. There were also parallel fluctuations in EEG-PV. If we treat the symbols –1, 1, and 3 numerically, then equal numbers of these symbols over the performance would average 1.0. As shown in Figure 12.2B, about half of the fluctuations were below the performance-average power level, some were above, and others were around the average. Whenever there were periods of persistent EEG-PV, either high or low, there was elevated NI. The result was that the overall EEG power level was 0.96, i.e., around the expected average, but the NI levels, when averaged over the performance, were significantly elevated, as the information measure is agnostic to the details of what is being organized, only that it is organized. A second interesting feature of NI versus EEG-PV was the strong link between periods of elevated NI and periods of uncertainty, a link that does not exist for EEG-PV (Stevens, Galloway, Halpin, & Willemsen-Dunlap, 2016; Stevens & Galloway, 2019).

NEURODYNAMIC CORRELATES OF UNCERTAINTY

Uncertainty is a fundamental property of neural computation that helps us estimate the (perceived) state of our world. The brain uses this uncertainty to continually access memories (the past) to imagine future possibilities and the actions that would give the best outcomes, outcomes that might lie far in the future. Humans maintain low levels of uncertainty by operating in familiar environments and situations where well-rehearsed sequences of cognition can be exploited. As a result, we think and act in chunks of several seconds up to a minute, which helps streamline moment-to-moment activities (Rumelhart, 1980; Schank & Abelson, 1977; Cooper & Shallice, 2000; Daw et al., 2006; Schneider & Logan, 2015). To the extent that the planning and execution of these routines meet the immediate task requirements, the future will be predictable, and we avoid being surprised.

Occasionally, unfamiliar environments or unexpected outcomes increase our uncertainty about what to do next. When this happens, the brain switches from exploiting past experiences to exploring new possible approaches (O'Reilly, 2013; Soltani & Izquierdo, 2019; Domenech et al., 2020). In professional settings, this exploratory uncertainty, and the pauses and hesitations it generates, are often early indicators of deteriorating performance (Ott et al., 2018). Currently, there is no good way to predict how long the uncertainty will last. The ability to rapidly and quantitatively measure uncertainty would have implications for education and training by supporting in-progress corrections, generating forecasts about future disruptions, or using identified periods of uncertainty to target reflective discussions about past actions.

The links between elevated NI and uncertainty were shown to be common during short periods (~1 min) of verbalized uncertainty, and during submarine navigation while the data needed to establish the submarine's position was being collected and shared among the navigation team (Stevens & Galloway, 2017). The links were extended to include medical students, hospital anesthesiologists, and operating room staff (i.e., circulating nurse, scrub nurse, and neurosurgeon) during simulation training when they experienced difficulties while ventilating a patient or deciding a course of patient management (Stevens et al., 2016). While the elevated NI was originally described in the context of spoken uncertainty, these elevations occur more generally during stressful periods whether or not someone was speaking (Stevens et al., 2017). These associations were not a product of the simulation environment; they were also seen in two neurosurgeons and an anesthesiologist during a live-patient surgery (Stevens et al., 2019). Finally, these linkages have been made more explicit by training artificial neural networks to recognize pattern variations in the NI peaks associated with verbalized uncertainty (Stevens & Galloway, 2019, 2021).

ESTIMATING THE FREQUENCY, MAGNITUDE, AND DURATION OF UNCERTAINTY

The picture emerging is that as simulations (and real-world events) evolve, neurodynamic information accumulates, and the bits accumulated are a function of the frequency, magnitude, and duration of periods of uncertainty. More experienced teams accumulate less NI during a task by virtue of having fewer, smaller, and/or shorter periods of uncertainty than less experienced teams. These features are present in healthcare, military, and pre-college teams and appear to be a general property of human performance (Stevens & Galloway, 2017, 2019, 2021).

From a training and feedback perspective, important questions are: How frequently does uncertainty occur? What is the level of uncertainty? Where in the brain do elevated NI levels come from? How long will the uncertainty last? More practically, how can these estimates be used to improve the efficiency and effectiveness of training? The team neurodynamic profiles in Figures 12.3 and 12.4 provide an analytic context for approaching these questions.

These figures show an experienced team that had previously worked together in the operating room. Brainwaves were collected from an anesthesiologist (AN), a circulating nurse (CN), a scrub nurse (SN), and a standardized participant (SP) acting in the role of the neurosurgeon. There was minimal talk during the first 11 min of the simulation, while the patient and suite were prepared for neurosurgery. There were occasional low-level peaks of NI while the AN intubated the patient, the surgeon was gloved and gowned, and the patient was draped. Just before the surgery began, signs of malignant hyperthermia (MH) were recognized, indicating the start of a life-threatening event that can occur in persons with a rare genetic condition following the inhalation of anesthetic gasses. Elevations in CO2 levels combined with an increase in heart rate, seen in Figure 12.3 at 680 s, were the first signs of this life-threatening hypermetabolic event, and they triggered the AN to issue requests to the team, such as ordering chemistries, contacting the pharmacy for Dantrolene (a muscle relaxant), and obtaining ice for cooling the patient. The AN spoke the most during the next 16 min as she coordinated these efforts (Figure 12.3A). The AN also showed the greatest NI elevation at the onset of the MH event. The CN, who was assisting the AN, showed moderate NI levels until toward the end of the simulation, when she prepared sodium bicarbonate and Dantrolene solutions for injection. The SN showed moderate NI while preparing the patient for surgery and low NI levels for the remainder of the simulation. Across the four team members, the NI levels during the scenario were not significantly correlated (all p-values > 0.16) (Table 12.1).


FIGURE 12.3  The scalp-averaged NI levels of an experienced medical team performing a simulated case of malignant hyperthermia. (A) Periods of speech are shown for the AN, CN, SN, the pharmacist, and standardized participant. (B) The segments highlight the major sections of patient management. (C, D, E) The individual NI profiles of the AN, CN, and SN are shown on the same scale.

To illustrate how the analysis can be moved to a finer-grained level, scalp-wide NI levels were calculated for the AN using the 19 channels arranged from the front of the scalp toward the back (Figure 12.4A). The AN was selected for this analysis as she was responsible for directing the subsequent management of the patient (Figure 12.4). This composite map shows that elevated NI levels were not uniform over time or across the channels but varied with the evolving situation (highlighted in Figure 12.4B). The composite begins at 680 s, when the elevated HR and CO2 levels were first mentioned. A small elevation of NI occurred rapidly, followed by near-baseline levels for ~30 s while the initial requests were made for chemistries and a cart supplied with sodium bicarbonate and the muscle relaxant Dantrolene. The NI then rapidly rose as the requests were addressed, with NI increases in the frontal (Fz), central (C3, Cz, C4), and temporal/parietal (T4, Pz, P4) sensors.


FIGURE 12.4  (A) The NI values for the AN in Figure 12.3 are plotted for the 19 EEG channels arranged from the front of the scalp toward the back according to the 10/20 labeling system. The channels in the larger bold font are the sites for default mode network activity. (B) The sensor averaged NI values are annotated with key events during the simulation.

TABLE 12.1
NI Correlations for the Team (all p-values > 0.16)

       AN      CN      SN      SP
AN    1.00
CN   –0.24    1.00
SN   –0.13   –0.12    1.00
SP   –0.14   –0.12   –0.25    1.00

As described in more detail below, brain activities located beneath these channels are part of the default mode network (DMN), which is involved in planning, and of the sensory-motor network, which is involved in imagining and executing motor activities. The heterogeneity of the brain regions involved suggests sub-goal exploration, which requires multitasking between cognitive processes as the solution path(s) for future sequences of cognition are established. This initial NI peak lasted ~100 s and included a smaller shoulder peak when the call was made to the pharmacy to request additional Dantrolene. These initial peaks would be considered prolonged, as previous studies have shown that the NI duration of uncomplicated patient management averages around 40–45 s in healthcare simulations (Stevens & Galloway, 2021).
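The channel-level view in Figure 12.4 suggests a simple way to ask which functional grouping of sensors carries an NI elevation. The sketch below is a hypothetical illustration, not the chapter's method: it assumes per-channel NI traces keyed by 10/20 electrode labels and uses the DMN-associated sites (Fz, Pz, T4) and central sensorimotor sites (C3, Cz, C4) named in the text.

```python
import numpy as np

# Channel groupings taken from the text's reading of Figure 12.4.
DMN_SITES = ("Fz", "Pz", "T4")           # default-mode (planning) proxies
SENSORIMOTOR_SITES = ("C3", "Cz", "C4")  # motor imagining/execution proxies

def regional_ni(channel_ni, sites):
    """Average per-channel NI traces across a group of 10/20 sites."""
    return np.mean([np.asarray(channel_ni[ch], dtype=float) for ch in sites],
                   axis=0)

def dominant_network(channel_ni, t0, t1, dt=1.0):
    """Compare mean NI of the two groupings within the window [t0, t1) s.

    channel_ni is a hypothetical dict mapping electrode labels ('Fz',
    'Cz', ...) to equal-length NI arrays sampled every dt seconds.
    """
    i0, i1 = int(t0 / dt), int(t1 / dt)
    dmn = float(regional_ni(channel_ni, DMN_SITES)[i0:i1].mean())
    smn = float(regional_ni(channel_ni, SENSORIMOTOR_SITES)[i0:i1].mean())
    label = "DMN (planning)" if dmn > smn else "sensorimotor (execution)"
    return label, dmn, smn
```

Applied to the performance above, such a comparison would flag the planning-dominated window just after 680 s and the shift toward execution around 1,000 s.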


A major cognitive shift occurred around 1,000 s, when the NI activity in the Fz, T4, and Pz channels decreased and NI activity in the Cz and P4 channels increased. These changes occurred when the lab results came back and when the team decided to set up an arterial line. The cognitive shift was accompanied by a slow NI decline toward baseline levels as the AN switched from planning the patient management to executing the plan.

This performance illustrates dynamical features of NI that have practical significance for training. The elevated NI at different channels suggests that different computations are being performed in these brain regions, each with its own degree of uncertainty. The first question is why these computations would be important for training. Previous studies have shown that difficulties executing motor activities (i.e., intubating a patient or controlling an instrument) are often accompanied by elevated NI levels in the central region (C3, Cz, C4), which provides an indication of the difficulties the person is having (Stevens & Galloway, 2017, 2019, 2021; Stevens et al., 2016; Stevens, Galloway & Willemsen-Dunlap, 2018). The AN in this simulation rapidly intubated the patient, with only a minor NI elevation around 550–600 s. The current simulation also involved lengthy periods of mixing Dantrolene with saline prior to injection, and while the AN was not directly involved in the mixing, she was advising the CN and SN on the best procedure for doing so. Closely watching others perform a procedure can deactivate mu rhythms in the sensorimotor brain region, and, as described previously, this would result in increased NI (Stevens & Galloway, 2021).

The parallel activations in the Fz (anterior medial prefrontal cortex), Pz (posterior cingulate cortex), and T4 (angular gyrus) sensors, sites associated with the default mode network (DMN), during the early planning stages (Figure 12.4) illustrate uncertainty in areas associated with collecting and analyzing complex data (Čeko et al., 2015). An emerging view of the brain shows that much of its activity is spent maintaining accurate models of past events and estimating the sensory results that would occur if particular actions were taken based on these models (Hawkins, 2021). The above-mentioned distributed set of brain regions, collectively termed the default mode network, has been proposed to be a network where prior intrinsic information that is continuously accumulated over seconds to minutes is melded with arriving extrinsic sensory information (Yeshurun et al., 2021). This temporally extended accumulated information is tied to the context of the stimulus, much as, when you read a story, you continuously link the current chapter with events and characters from previous chapters. In this simulation, the absence of increased NI levels in elements of the DMN might actually have been a cause for concern.

AUGMENTING DEBRIEFINGS WITH NEURODYNAMICS
If simulations provide an opportunity for honing process skills, debriefings provide the opportunity for trainees to learn what they don't know.


Debriefing is considered a key element for transforming experiences in simulation-based education into learning that can be applied to patient care. In order for this transformation to happen, experts agree that reflection, currently mediated through conversation, is necessary (Baker et al., 1997). Fanning and Gaba (2007) defined debriefing as "a facilitated or guided reflection in the cycle of experiential learning." These facilitated conversations involve the group's identification of significant teamwork and clinical events that serve as exemplars to guide learning and adapt current behaviors to future challenges.

The literature on trainee simulation and debriefing has been evolving over the past 30 years (Lederman, 1992; Rudolph et al., 2006; Eppich & Cheng, 2015; Seelandt et al., 2021). Currently, useful frameworks, guidelines, and standards exist (International Association for Clinical Simulation & Learning Board of Directors, 2013); however, little is known about debriefing from a dynamical perspective that is linked to the most fundamental neural processes of learning. Debriefing frameworks have practical value but provide few theoretical suggestions or quantitative data to accelerate reflective learning for simulation participants, or the journey to competence for debriefers. There is a sense in the simulation and research communities that more dynamic models, objective metrics, and theories of debriefing are needed (Fanning & Gaba, 2007; Bowe et al., 2017), but empirical evidence about what these models might look like is sparse. Neurodynamic modeling, as described in the first section of this chapter, may help fill this gap, although perhaps not initially at the level of trainees. Successful adoption of new educational resources takes time and requires a full understanding by, in this case, the debriefers.

USING NEURODYNAMIC ANALYSES TO TRAIN THE TRAINERS
The time required to achieve competency as a debriefer is variable and requires the guidance of an experienced mentor. Factors such as understanding key principles, practice opportunities, self-reflection, and expert feedback are all important. Just as simulation and debriefing have the potential to standardize clinician education, we believe that a combined analysis of team NI and video, in conjunction with discussion and expert feedback, has the potential to standardize the development of debriefers and decrease their time to competency.

As described earlier, NI may be thought of as streams of information, represented as bits of data, capable of highlighting the frequency, magnitude, and duration of individual and team uncertainty. Figure 12.5A shows the NI profile for a single team member during a simulation scenario. The average NI for the performance was 0.085 bits. This NI level (the dot in Figure 12.5B) is compared with 16 other prior performances and was slightly lower than the average of the other teams; i.e., this was a good performance. During the debriefing, the instructor has just clicked on a peak with 0.25 bits of information; this triggers playback of the team video starting 30 s before the peak and extending 30 s past it.


FIGURE 12.5  Sample dashboard for linking neurodynamics with team activities. (A) The NI profile of a team member participating in a simulation of malignant hyperthermia. (B) The scalp-average NI for 17 team members during simulation scenarios. (C) A video clip of the period highlighted by the dotted text box in Figure 12.4A.

While the highlighted peak is shown as an illustration, during practice the debriefer would likely want to view the final 200 s and try to surface from the participant what was happening during this period of elevated NI and little speech (Figure 12.3A). While Figure 12.5 demonstrates NI for a single individual, dashboards are also available for each member of the team in the simulation, much like Figure 12.3, and these can be generated in near real time. In the next section, we offer several use cases for NI data, combined with video review, to understand teamwork and individuals' contributions to the team in the context of debriefing.
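Before turning to those use cases, note that the click-to-clip behavior described for Figure 12.5 reduces to a simple time-window mapping. The sketch below illustrates it; the 30 s pre/post margins come from the text, while the function name and clamping logic are hypothetical framing.

```python
def clip_window_for_peak(peak_time_s, pre_s=30.0, post_s=30.0, video_len_s=None):
    """Map a selected NI peak to a debriefing video window.

    Mirrors the Figure 12.5 dashboard behavior described in the text:
    clicking a peak plays the team video from 30 s before the peak to
    30 s after it, clamped to the bounds of the recording.
    """
    start = max(0.0, peak_time_s - pre_s)
    end = peak_time_s + post_s
    if video_len_s is not None:
        end = min(end, video_len_s)
    return start, end

# A peak at 680 s (the first signs of MH) cues the clip from 650 s to 710 s.
print(clip_window_for_peak(680.0))  # -> (650.0, 710.0)
```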

Early Novices
Early novice debriefers would watch complete recordings of simulations. Debriefing mentors would ask them to note segments they would debrief, their reasons for selecting each segment, and how they would tie those segments to pre-defined learning objectives. Mentors would then review the novice debriefer's responses with them and introduce NI data to explore the correlation between those data and the actions observed in the performance video.


The purpose of this exercise is to identify portions of the simulation, based on both observational and quantitative data on team members' uncertainty, for which the novices could then practice formulating debriefing questions. Discussion and feedback between the debriefing mentor and the novice on the segments chosen for debriefing, what the novice debriefer observed, the debriefing questions, and their relationship to learning objectives are crucial to making this approach successful. A talk-out-loud review, in which the mentor describes key behaviors they observe in the video, their linkage to the NI data, and the questions they might ask in debriefing, may also be beneficial.

Later Novices
Novice debriefers with a measure of experience could potentially benefit from a more advanced form of this exercise. This group would watch recordings in which simulation participants did not demonstrate neurodynamic uncertainty during situations that either had the potential to become critical, or did become critical, due to inaction or incorrect action. These novice debriefers would be asked to formulate debriefing questions to determine whether participants' gaps were in knowledge, skills, or attitudes. A similar review and discussion, including a talk-out-loud approach, would be expected to be comparably beneficial. Similar exercises, conducted by trained peer debriefing coaches, would be used by intermediate and advanced debriefers to further hone and extend their debriefing skills.

Evolving the Technology
Currently, the NI data for training debriefers contain six simulations with videos and NI profiles, which makes them well suited as a training tool mentors can use to extend and refine the skills of debriefers at all levels. It is also reasonable to expect these static snapshots of NI to evolve into real-time debriefing aids displaying dashboards with the NI of each team member, easy time-stamping, and annotation by debriefers. Future iterations of such technology will improve the ability to provide reliable assessments capable of comparing individuals and teams across time, training programs, and team composition.

SUMMARY
In this chapter, we have illustrated a quantitative framework for parsing the neurodynamic correlates of a team's uncertainty into those of the individual team members, then into the brain regions where the uncertainty arose, and finally into the brain rhythms active at those sites during periods of uncertainty. The quantitative scale of neurodynamic information is bounded (it can never be less than 0, nor more than the maximum information of the symbol system), which makes it possible to aggregate the bits of information and to compare members within a team, different teams, or the effects of different learning experiences on team members and teams. In other words, neurodynamic information is a causal intermediate between low-level neural processes and the organizations that we recognize as important for teams, and it is one that tracks closely with the observed hesitations and pauses that we associate with uncertainty.
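Stated in the information-theoretic terms the chapter uses, the boundedness property can be written compactly. The finite symbol alphabet S and the entropy-style definition are assumptions inferred from the chapter's description of NI as bits over a symbol system, not a quoted definition.

```latex
% Assuming NI(t) is an entropy-style quantity over a finite symbol
% alphabet S, its bound at every time t is
\[
  0 \;\le\; \mathrm{NI}(t) \;\le\; \log_2 \lvert S \rvert \ \text{bits},
\]
% which is what places per-person averages, team aggregates, and
% accumulated totals on a single comparable scale.
```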


REFERENCES
Baker, A. C., Jensen, P. J., & Kolb, D. A. (1997). In conversation: Transforming experience into learning. Simulation & Gaming, 28(1), 6–12. https://doi.org/10.1177/1046878197281002
Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K. M., & Robbins, K. A. (2015). The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Frontiers in Neuroinformatics, 9, 16.
Bodala, I. P., Li, J., Thakor, N. V., & Al-Nashash, H. (2016). EEG and eye tracking demonstrate vigilance enhancement with challenge integration. Frontiers in Human Neuroscience, 10, 273. https://doi.org/10.3389/fnhum.2016.00273
Bonnefond, M., & Jensen, O. (2015). Gamma activity coupled to alpha phase as a mechanism for top-down controlled gating. PLoS One, 10(6), e0128667.
Bowe, S. N., Johnson, K., & Puscas, L. (2017). Facilitation and debriefing in simulation education. Otolaryngologic Clinics of North America, 50(5), 989–1001. https://doi.org/10.1016/j.otc.2017.05.009
Čeko, M., Gracely, J. L., Fitzcharles, M. A., Seminowicz, D. A., Schweinhardt, P., & Bushnell, M. C. (2015). Is a responsive default mode network required for successful working memory task performance? The Journal of Neuroscience, 35(33), 11595–11605.
Cooper, R., & Shallice, T. (2000). Contention scheduling and the control of routine activities. Cognitive Neuropsychology, 17(4), 297–338.
Creutzfeldt, J., Hedman, L., & Felländer-Tsai, L. (2016). Cardiopulmonary resuscitation training by avatars: A qualitative study of medical students' experiences using a multiplayer virtual world. JMIR Serious Games, 4, e22.
Daniels, B., Flack, J., & Krakauer, D. (2017). Dual coding theory explains biphasic collective computation in neural decision-making. Frontiers in Neuroscience, 11, 313. https://doi.org/10.3389/fnins.2017.00313
Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1), 9–21.
Domenech, P., Rheims, S., & Koechlin, E. (2020). Neural mechanisms resolving exploitation-exploration dilemmas in the medial prefrontal cortex. Science, 369, 1076.
Eppich, W., & Cheng, A. (2015). Promoting Excellence and Reflective Learning in Simulation (PEARLS): Development and rationale for a blended approach to health care simulation debriefing. Simulation in Healthcare, 10(2), 106–115. https://doi.org/10.1097/SIH.0000000000000072
Fanning, R. M., & Gaba, D. M. (2007). The role of debriefing in simulation-based learning. Simulation in Healthcare, 2(2), 115–125. https://doi.org/10.1097/SIH.0b013e3180315539
Flack, J. (2017). Life's information hierarchy. In S. I. Walker, P. C. W. Davies, & G. F. R. Ellis (Eds.), From matter to life: Information and causality (pp. 283–302). New York: Cambridge University Press.
Flack, J. (2017a). Coarse-graining as a downward causation mechanism. Philosophical Transactions of the Royal Society A, 375, 20160338.
Flack, J., Erwin, D., Elliot, T., & Krakauer, D. (2012). Timescales, symmetry, and uncertainty reduction in the origins of hierarchy in biological systems. In K. Sterelny, R. Joyce, B. Calcott, & B. Fraser (Eds.), Cooperation and its evolution (pp. 45–74). Cambridge, MA: MIT Press.


Gillon, C. J., Pina, J. E., Lecoq, J. A., Ahmed, R., Billeh, Y. N., Caldejon, S., ... & Zylberberg, J. (2021). Learning from unexpected events in the neocortical microcircuit. bioRxiv.
Goodwin, C., Velasquez, E., Ross, J., Kueffer, A. M., Molefe, A. C., Modali, L., Bell, G., Delisle, M., & Hannenberg, A. A. (2021). Development of a novel and scalable simulation-based teamwork training model using within-group debriefing of observed video simulation. Joint Commission Journal on Quality and Patient Safety. Advance online publication. https://doi.org/10.1016/j.jcjq.2021.02.006
Gorman, J. C., Martin, M. J., Dunbar, T. A., Stevens, R. H., Galloway, T. L., Amazeen, P. G., & Likens, A. D. (2016). Cross-level effects between neurophysiology and communication during team training. Human Factors, 58(1), 181–199.
Hawkins, J. (2021). A thousand brains: A new theory of intelligence. New York: Basic Books.
Jump Simulation. (2016, May). The potential of virtual reality in simulation. https://www.jumpsimulation.org/research-innovation/our-blog/2016/may/the-potential-of-virtual-reality-in-simulation
International Association for Clinical Simulation & Learning Board of Directors. (2013). Standards of best practice: Simulation. Clinical Simulation in Nursing, 9(6, Suppl.), 1–32. https://doi.org/10.1016/j.ecns.2013.05.008
Klimesch, W. (2012). Alpha-band oscillations, attention and controlled access to stored information. Trends in Cognitive Sciences, 16(12), 606–617.
Klimesch, W., Sauseng, P., & Hanslmayr, S. (2006). EEG alpha oscillations: The inhibition-timing hypothesis. Brain Research Reviews, 53, 63–88. https://doi.org/10.1016/j.brainresrev.2006.06.003
Lachaux, J. P., Jung, J., Dreher, J. C., Bertrand, O., Minotti, L., Hoffman, D., & Kahane, P. (2008). Silence is golden: Transient neural deactivation in the prefrontal cortex during attentive reading. Cerebral Cortex, 18, 443–450.
Lateef, F. (2010). Simulation-based learning: Just like the real thing. Journal of Emergencies, Trauma, and Shock, 3(4), 348–352. https://doi.org/10.4103/0974-2700.70743
Lederman, L. C. (1992). Debriefing: Toward a systematic assessment of theory and practice. Simulation & Gaming, 23(2), 145–160. https://doi.org/10.1177/1046878192232003
Maytin, M., Daily, T. P., & Carillo, R. G. (2015). Virtual reality lead extraction as a method for training new physicians: A pilot study. Pacing and Clinical Electrophysiology, 38, 319–325.
Mullen, T. R., Kothe, C. A., Chi, Y. M., Ojeda, A., Kerth, T., Makeig, S., Jung, T. P., & Cauwenberghs, G. (2015). Real-time neuroimaging and cognitive monitoring using wearable dry EEG. IEEE Transactions on Bio-Medical Engineering, 62(11), 2553–2567.
Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J.-M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011, Article 156869. https://doi.org/10.1155/2011/156869
O'Reilly, J. X. (2013). Making predictions in a changing world: Inference, uncertainty and learning. Frontiers in Neuroscience, 7, 105.
Ossandon, T., Jerbi, K., Vidal, J. R., Bayle, D. J., Henaff, M. A., Jung, J., Minotti, L., Bertrand, O., Kahane, P., & Lachaux, J. P. (2011). Transient suppression of broadband gamma power in the default mode network is correlated with task complexity and subject performance. Journal of Neuroscience, 31, 14521–14530.
Ott, M., Schwartz, A., Goldsmith, M., Bordage, G., & Lingard, L. (2018). Resident hesitation in the operating room: Does uncertainty equal incompetence? Medical Education, 52, 851–860.


Rudolph, J. W., Simon, R., Dufresne, R. L., & Raemer, D. B. (2006). There's no such thing as "nonjudgmental" debriefing: A theory and method for debriefing with good judgment. Simulation in Healthcare, 1(1), 49–55. https://doi.org/10.1097/01266021-200600110-00006
Rumelhart, D. E. (1980). On evaluating story grammars. Cognitive Science, 4(3), 313–316. https://doi.org/10.1207/s15516709cog0403_5
Schank, R., & Abelson, R. (Eds.). (1977). Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Hillsdale, NJ: Lawrence Erlbaum Associates.
Schmidt, E., Goldhaber-Fiebert, S. N., Ho, L. A., & McDonald, K. M. (2013). Simulation exercises as a patient safety strategy: A systematic review. Annals of Internal Medicine, 158(5 Pt 2), 426–432. https://doi.org/10.7326/0003-4819-158-5-201303051-00010
Schneider, D. W., & Logan, G. D. (2015). Chunking away task-switch costs: A test of the chunk-point hypothesis. Psychonomic Bulletin & Review, 22, 884–889.
Sedley, W., & Cunningham, M. O. (2013). Do cortical gamma oscillations promote or suppress perception? An under-asked question with an over-assumed answer. Frontiers in Human Neuroscience, 7, 595.
Seelandt, J. C., Walker, K., & Kolbe, M. (2021). "A debriefer must be neutral" and other debriefing myths: A systemic inquiry-based qualitative study of taken-for-granted beliefs about clinical post-event debriefing. Advances in Simulation, 6(1), 7. https://doi.org/10.1186/s41077-021
Soltani, A., & Izquierdo, A. (2019). Adaptive learning under expected and unexpected uncertainty. Nature Reviews Neuroscience, 20, 635–644.
Stevens, R., & Galloway, T. (2017). Are neurodynamic organizations a fundamental property of teamwork? Frontiers in Psychology, 8, 644. https://doi.org/10.3389/fpsyg.2017.00644
Stevens, R., & Galloway, T. (2019). Teaching machines to recognize neurodynamic correlates of team and team member uncertainty. Journal of Cognitive Engineering and Decision Making, 13, 310–327. https://doi.org/10.1177/1555343419874569
Stevens, R., & Galloway, T. (2021). Parsing neurodynamic information streams to estimate the frequency, magnitude and duration of team uncertainty. Frontiers in Systems Neuroscience, 15, 606823. https://doi.org/10.3389/fnsys.2021.606823
Stevens, R., Galloway, T., Halpin, D., & Willemsen-Dunlap, A. (2016). Healthcare teams neurodynamically reorganize when resolving uncertainty. Entropy, 18, 427. https://doi.org/10.3390/e18120427
Stevens, R., Galloway, T., Lamb, J., Steed, R., & Lamb, C. (2017). Linking team neurodynamic organizations with observational ratings of team performance. In A. A. Von Davier, P. C. Kyllonen, & M. Zhu (Eds.), Innovative assessment of collaboration (pp. 315–330). Cham, Switzerland: Springer International Publishing.
Stevens, R., Galloway, T., & Willemsen-Dunlap, A. (2018). Quantitative modeling of individual, shared and team neurodynamic information. Human Factors, 60, 1022–1034.
Stevens, R., Galloway, T. L., & Willemsen-Dunlap, A. (2019). Advancing our understandings of healthcare team dynamics from the simulation room to the operating room: A neurodynamic perspective. Frontiers in Psychology, 10, 1660. https://doi.org/10.3389/fpsyg.2019.01660
Stevens, R., Gorman, J. C., Amazeen, P., Likens, A., & Galloway, T. (2013). The organizational neurodynamics of teams. Nonlinear Dynamics, Psychology, and Life Sciences, 17(1), 67–86.
Surgical Theater. (2021, April). Virtual reality for surgery: Precision XR. Retrieved April 2021, from http://surgicaltheater.net/


Wianda, E., & Ross, B. (2019). The roles of alpha oscillations in working memory retention. Brain and Behavior, 9, e01263. https://doi.org/10.1002/brb3.1263
Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: Where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 22, 181–192.
Zertuche, J. P., Connors, J., Scheinman, A., Kothari, N., & Wong, K. (2020). Using virtual reality as a replacement for hospital tours during residency interviews. Medical Education Online, 25(1), 1777066.

13 The Future of Simulation

P. A. Hancock

CONTENTS
Proem
The Fundamental and Practical Reasons for Simulation
Simulations in the Past
On Predicting the Future
The Practicalities of Simulation
Simulation and Training
Discourse between Two Worlds
Hybrid Simulation Worlds
Assessing the Progress of Simulation Technologies
The Turing Test of Simulation
Supersimulation
The Moral Dimension of Simulation
A Philosophical Valediction
Summary and Conclusion
Acknowledgments
References

PROEM
I argue that, in the coming decades, the conception of simulation will undergo a metamorphosis as the fundamental assumptions about what constitutes simulation evolve under the driving force of progressive technological innovation. The primary stimulus for development will come from the need to explore all processes through which humans interact with technology. Such future interaction will find operators working on representations of task spaces, presented via diverse forms of sensory display (see Mouloua et al., 2003). As the linkage between these display representations and actual system configurations will be contingent solely upon software connections, and as the metaphor for representation will be judged by its operational effectiveness rather than by the degree to which it replicates the appearance of the actual system, the difference between what is simulation and what is actual operation will disappear. The definition of simulation in such circumstances will depend solely on whether the operator actually effects change in the real-world system or is alternatively using, evaluating, or training at the time on exactly the same display connected to an electronic surrogate.


In multiple-operator, multiple-system configurations, even this criterion will eventually fail to hold any permanent distinction, because momentary control of the action affecting the system will be passed between individuals at different times. At this juncture, simulation will have passed the Turing test for reality. In systems where the only criterion distinguishing real from surrogate worlds is visual fidelity (conceived for enclosed control-room activities such as emergency response centers), it is feasible that visual projection capabilities will soon meet or exceed what the eye encounters in the real world. It is to be anticipated that further progress in the visual realm will take us toward supersimulation, in which what can be seen in a computer-generated reality exceeds what can be seen outside such a facility. Given our knowledge of the human visual system and the technical focus on improving visual graphical representations, we are now passing quickly into this evolutionary stage. Supersimulation in vision will stimulate the desire for supersimulation in all other sensory modalities, and in the foreseeable future, we shall pass the comparable Turing tests for all of the major sensory systems. At that juncture, we shall be incapable of distinguishing between a computer-generated and a physically generated world.

Human factors will have a critical voice in contributing to and evaluating these developments because many of the barriers along this avenue of progress are composed of questions about human–system interaction. To create such supersimulated worlds, we shall want to know much more about cognitive and emotional capacities and individual variations in human abilities and attributes. Early improvements may well be seen in relation to working with handicapped individuals, who will benefit most immediately from the blurring of reality and simulation. Some of these notions have been expressed in both science fiction and film media, and their aspirations tend to anticipate scientific progress. The fundamental problems that follow will then concern the very nature of experiential reality itself. This is a philosophical issue and is addressed at the end of this chapter, where I seek to show what constraints bind us to our present reality, and what forces may emerge to divorce us from what we have previously been pleased to call the "real world."

The intentions of a tool are what it does. A hammer intends to strike, a vise intends to hold fast, a lever intends to lift. They are what they are made for. But sometimes a tool may have other uses that you don't know. Sometimes in doing what you intend, you also do what the (tool) intends without knowing (Pullman, 2000).

THE FUNDAMENTAL AND PRACTICAL REASONS FOR SIMULATION
Similar to robotics and artificial intelligence (AI), the bedrock impetus behind simulation is its capacity to support the efforts of human beings to artificially recreate themselves and to control the world around them. Unlike robotics and AI, simulation is neither necessarily anthropocentric nor fundamentally anthropomorphic. Nevertheless, simulation has, throughout its history, been focused largely on the creation and recreation of "real" environments, often under the mandate of either entertainment or more serious practical necessities.


The most obvious stimulus, and therefore the most obvious source of support for simulation, will continue to be the interest in gaming and in the training of individuals and teams for subsequent performance in real and often dangerous situations. Consequently, in the immediate future, simulation is likely to continue on its present course and follow these established trends. One growing demand, however, will emanate from the need for researchers to explore complex dynamic processes in disciplines ranging from chemistry and physics to geology, mathematics, and medicine. In fact, virtually all areas in which humans pursue understanding can and will benefit from the dynamic and malleable representations that simulation renders. This will be a burgeoning aspect of the world of applied simulation.

Even in light of this growing range of applications, the fundamental motivation behind future simulation will remain essentially involved with our never-ending quest to attain mastery over that which we can presently perceive, but over which we cannot at present exert control (Hancock, 1997a). Given such traditional and emerging drives, namely the desire for control and a pragmatic need for exploration in surrogate or transferred environments, it is an exercise in both logic and imagination to distill the future of simulation.

SIMULATIONS IN THE PAST
Those who want to see far into the future must first look well into the past. So, before we address what might be coming, a glimpse into the past can help us set our quest in motion. The first thing we need to recognize is that simulation is not a modern invention. Indeed, models and representations for practical employment have been around for many centuries. For example, the architectural model for San Petronio in Bologna, Italy, built in 1390, was 59 ft long and allowed people to walk inside to visualize what the finished building would be like. There is even evidence that Greek architects employed models in a similar manner some five centuries before the birth of Christ (see King, 2001). Clearly then, modeling and simulation are nothing new.

Dependent upon the degree to which we let our definitional boundaries of the word "simulation" dissolve, we can even include artifacts such as religious icons as representations or simulations. One particularly interesting example can be seen in some of the great European Gothic cathedrals, where a maze was inlaid on the floor of the nave, the one remaining at Chartres in northern France being an outstanding example (Figure 13.1). The primary purpose of these constructions was to act as surrogates for pilgrimages to Jerusalem. For those too old, too sick, too poor, or too pressed for time to accomplish the actual journey, the maze allowed completion of the journey symbolically. Eventually, of course, we shall recover this symbolic aspect for our more technically replete simulations.

The true origin of modern technical simulation may be attributed to Brunelleschi's demonstration of perspective. A Florentine architect, Filippo Brunelleschi (1377–1446) is best known for his construction of the dome of Santa Maria del Fiore, which remains the visual icon of Florence to the present day. Prior to Brunelleschi's interest in and demonstration of perspective, medieval painting was predominantly nonrepresentational and, to us, a strange mixture of two-dimensional and fractal depictions (see the discussion of the camera obscura technique by Hockney, 2001).


FIGURE 13.1  The maze at Chartres Cathedral, one of the few remaining ones. It was used as a surrogate representation for symbolic pilgrimages to the Holy Land. An early “simulation,” its validity depended upon the credence ascribed by the user. Our future simulations will also eventually embrace this functional symbolism.

Brunelleschi's concern was not expressed in terms of the word "simulation," but simulation was exactly what he achieved. With respect to the object of simulation, he chose one of the city's most famous sites, one which would be immediately recognizable to his fellow Florentines: the Baptistry of San Giovanni. Standing just inside the door of the Cathedral of Santa Maria del Fiore (with which his name would forever be linked), Brunelleschi painted a small panel of the baptistry outside, in correct perspective, with the cathedral doorway as a frame. Using the vanishing point as his reference location, he created a small hole in the panel. Thus, Brunelleschi could replicate exactly what individuals would see when they looked out of the door, by replacing the real scene with a static painted representation. This element of the simulation worked excellently.

However, Brunelleschi faced the problem of simulation dynamics. It was acceptable that people did not appear in his painted scene; after all, the locale was not always populated with pedestrians. But what of the sky? Although there might be no moving terrestrial objects, were not the clouds always moving? Brunelleschi came up with an ingenious solution, characteristic of his highly inventive capacities. He solved the problem by making the display a hybrid one. The top part of the display panel was covered with a mirror-like surface that reflected the actual sky above; surely one of the first examples of mixed reality (see Figure 13.2). The illusion was so convincing that it changed how people (and particularly how artists) represented their world (see also Hockney, 2001).


FIGURE 13.2  Illustration of Brunelleschi’s mixed reality simulation. This illustration of the middle 1400s shows the situation as it would appear to the observer. At the center top of the door, one can see the eye-point and the hole through which the observer views the scene. In the mirror that the right hand is holding, one can see the painted panel secured by the fingers of the left hand. The mirror, in this case, shown as a circular one, can be temporarily removed and the observer then sees the real scene, shown in the figure at the edges of the mirror, for example, where the chin appears. The painted panel can be repositioned over the real scene to provide the artist’s surrogate. The panel surface above the painted building is a reflective one so as to allow the sky to change dynamically and further complete the illusion. Arguably the origin of formal simulation, this was a magical wonder when first created. (Illustration by Lauriann Jones.)

Today, we should applaud this inventor as perhaps the spiritual godfather of simulation. With the impact of Brunelleschi's demonstration, it was little wonder that subsequent competitions awarding commissions to construct specific parts of the Cathedral of Santa Maria del Fiore required the proposer to produce a model so that the city fathers could judge between different conceptions. One has the feeling that Filippo Brunelleschi would have been very happy at today's computer-aided design (CAD) station, visualizing the innovations for which he is now recognized. But his unique simulation was for just this one fixed scene, and if any of the circumstances changed, such as the observer turning around, the illusion was immediately lost. The freedom to explore completely simulated worlds had to wait hundreds of years, until the present era. Indeed, only recently have we arrived at the point where we can prognosticate the future of simulation.


ON PREDICTING THE FUTURE
Predicting the future has many advantages. First, many of the developments that we are bound to see are direct, linear extrapolations of currently existing trends. For example, display resolution will improve, and networked computational systems will increase in capability and will be used to enhance multimodal experience. The design and facility of head-mounted units will undergo significant improvements. Indeed, for those who have ever seen the "Dayton Grasshopper," this latter line of progress will already be evident. With respect to computational capacity, although we have not reached the clear physical barriers to computer speed, it is important to understand that we shall soon have to be concerned with the impending approach of these inherent limits. Despite the continuing progress in recent decades (see Moravec, 1988), it is evident that we cannot continue to double computational capacity at the rate we have been doing. However, the point in time when fundamental physical computational constraints will start to curtail simulation enacted on a network of parallel machines is perhaps a little too far in the future to affect the expected lifespan of the present chapter. Other linear forms of progression will include much greater research efforts on the integration of visual input with input from other sensory systems to enhance the overall experience of "presence." Audition, tactile stimulation, and olfaction will each play greater roles in improving the fidelity of all simulation. As a consequence, we shall have to make significant methodological progress in understanding and measuring the sense of presence in order to gauge the evolving state of the art.

Thus, the trick in prognostication is to attach some spurious numeracy to these predictions of linear or geometric progression and then feign surprise, and publicize to the greatest possible extent, the moments when such states of development are reached. Recognition as a futurist then depends upon a fallacy of readers' memories: the author emphasizes all the successful predictions and relies on the frailty of human memory to obscure all the predictions that failed to reach fruition. I shall not engage in such legerdemain; but I am warning the interested reader that touts of all sorts, be they astrologers, financial advisors, or tabloid seers, all rely upon this fallibility of human memory (see Gardner, 2000; Schacter, 2001). However, if I am right in my predictions, be prepared to hear about it on multiple occasions; if I am wrong, I shall hope the publisher prices this book at the usual exorbitant rate!

Linear extrapolations are not the problem in predicting the future; it is the nonlinear leaps of progress, often fueled by a concatenation of unusual and often unforeseeable developments, that represent the great challenge to the seer. In the realm of simulation, this challenge is significant indeed, as simulation lies at the confluence of so many volatile sciences. Obviously, any major stride in computer science directly affects simulation and its progress. This is equally true for both hardware and software innovations. However, computation alone is not the be-all and end-all of simulation. Psychology and the cognitive and neurosciences have an enormous influence on what will develop in simulation because the laws of simulated worlds are largely psychological rather than physical.


Indeed, it could be argued that the central point of simulation is to eventually free individuals from these physical and psychological constraints, a brief discussion of which I present later. But this is not the end of the confluence of forces. Many of the future developments in simulation will depend upon artists and those in the entertainment field, whose conceptions and ideas are far removed from the groove of scientific thought. The minds of such individuals entering into the mix provide many quirks and bijouteries, and this is perhaps the major reason for their inclusion. However, all contributions need to be tempered to a scientific understanding. We can add much more to the factors influencing simulation's development. Among these, we have not yet included customers and their diverse desires, issues, and challenges, which will fuel the financial segment and so help shape the agenda of progress. The forces involved in the cauldron of simulation development are so varied and so tangential to each other that it is almost inevitable that they will produce hiccups and left-field notions that will have mainstream influence on all simulation technologies.

One of the possible nonlinearities of progress in simulation already being pursued is the way in which we get information into the brain itself. It may well be that, in the near future, we find that the eye is a relatively limited channel through which to get information to the visual cortex. The complexities, inversions, and selectivity of the retina might well prove to be the bottleneck in moving information across the cortical barrier. Where, then, is the sense in spending millions of dollars improving visual graphics systems for marginal overall gains when direct neural stimulation provides returns that are orders of magnitude better?

To illustrate the possibilities involved in all forms of short-circuiting of the simulation process, it is critical to have an underlying theoretical framework of goal-directed behavior. One such framework is illustrated in Figure 13.3, which deviates from the normal linear stimulus–response information-processing perspective to emphasize the recursive nature of the perception–action/action–perception process (see Flach, Hancock, Caird, & Vicente, 1995; Hancock, Flach, Caird, & Vicente, 1995). In dealing with a recursive (in this case, actually a spiraling) process, one can essentially start and finish anywhere in the circular representation, but for the sake of convenience and clarity let us use the environment as the initial point of departure. In contemporary, electronic simulation, we replace the proximal (immediately perceived) world with its computational surrogate. Previous forms of simulation (such as those used in movie production) used physically erected sets that looked solid but had, in actuality, little substance. The purpose of each of these forms, similar to the purpose of stage magic, is to "fool" the sensory systems. Often this is done by taking advantage of foibles and illusions well known to perceptual psychologists. This constitutes the past and present of simulation, with which we have already dealt.

Obviously, the next developmental stage of simulation is a hybrid one. Similar to the part-real, part-electronic combinations discussed later (see Figures 13.4 and 13.5), the synthesis of part-electronic and part-direct cortical stimulation will be the next breakthrough stage in this evolution. The logical step to full simulation via cortical stimulation will then be an obvious, feasible, and embraced challenge. However, notice that the goal of all these efforts is to communicate the perceived world to the user.


FIGURE 13.3  The future of simulation expressed as a function of the site of simulation and its representation. This sequence shows the fundamental purpose of simulation as breaking the barrier between the perceiver and the perceived, and between the actor and the act.

FIGURE 13.4  Illustration of a virtual forest, showing individuals experiencing a hybrid reality composed of a physical substrate with virtual overlays. (Copyright 2003, Media Convergence Laboratory, University of Central Florida, reproduced with permission.)


FIGURE 13.5  At left is the physical appearance of the environment, and at right the complete environment with the virtual elements that provide the totality of the experience. (Copyright 2003, Media Convergence Laboratory, University of Central Florida, reproduced with permission.)

The next important conceptual step is to understand that simulation needs to make a transition from perception to action. For this progression, we will need to know much more about the tasks, the goals, and, more generally, the intent of the user. At such a point in time, our concern becomes much more directly focused on why simulation is necessary. If we are controlling, or practicing to control, complex systems in dangerous situations, the need to perform overrides the imperative to represent. If we can project simulations directly to the cortex, can we not similarly extract responses directly from it? This being so, the surrogate and the experiential reality can simply be set together in a closed-loop relation (see Bush, 1945). There are movements toward creating such technologies as companions to input simulation, in which the neurological state initiates action in the real world via machine activation. This area, given the name neuroergonomics, is at present in its earliest infancy (see Hancock & Szalma, 2003; Parasuraman, 2003). In the future, it promises to complement the advances in the perceptual augmentations of simulation by creating a way for the brain to directly affect the world. As we more fully understand human response in the context of expanding technological capacities, we shall see how simulation is a crucial component in dissolving the barrier between virtually expressed intention and subsequently realized reality. At such a juncture one has, collectively, to be extremely wary of what one wishes for, as intent will almost immediately be translated into action.


Simulation will be critical in showing us the outcomes of our intentions prior to their physical expression. However, this is for the far-off future. Let us return to a somewhat nearer-term scenario.

THE PRACTICALITIES OF SIMULATION
Up to the present, I have largely engaged in speculation about the possible futures of simulation and the fundamental forces driving such progress. However, to those involved in the practicalities of simulation, such discourse may well seem almost completely superfluous. After all, those who are beset by pragmatic needs are concerned with the advantages simulation can render tomorrow, not on the far-off horizon of the decades to come. Therefore, it is incumbent upon me to consider these near-term issues, because simulation science has, for most of its existence, been a very practical concern.

Those looking to employ simulation in the near term have to ask very pointed questions about whether it can save them time, money, and/or even lives in return for their investment of resources, when compared to alternative approaches to their respective problems. Simulation is almost always appreciably cheaper than operations on the real system. For almost any technology we can think of, such as process control, aviation, and commercial vehicle operations, it is much more expensive to train on the system itself than on a valid simulation. Of course, the problem of effective skill transfer always remains, but for technologies such as the space shuttle, one simply cannot train on the system itself as there are just too few of them in existence. For technologies such as nuclear reactors that are in constant operation, training on the actual system is neither feasible nor advisable. It is here that the cost-effectiveness of simulation steps to the fore and makes its case. Being able to achieve one's goal of operator training in the cheapest possible manner actually drives most practical decisions in our cost-conscious world today.

Indeed, it was this focus on training that served historically to separate simulation from mainstream human factors. As Chapanis, Garner, Morgan, and Sanford (1947) noted, training, and thus simulation, was largely the domain of the personnel psychologist, who evolved into the industrial/organizational specialist of mainstream psychology. In contrast, human factors and human engineering scientists come from the roots of experimental psychology. This irrational division between human factors and simulation science is still evident to some degree in, for example, the comparable evolution of the Human Factors and Ergonomics Society and the American Psychological Association's Division 21 (Applied Experimental and Engineering Psychology), although the latter organization has by itself a complex history. Fortunately, this rift has been healed by a much closer association that should now serve to bring human factors and simulation applications much closer together, hopefully to the strong benefit of each (R. S. Kennedy, personal communication, 2004).

Although the simple fact of cost often dictates decisions, such deliberations are also often constrained by time. Building actual systems, or even full-scale mock-ups and models, can take an extensive amount of time, and in business worlds where time and money are considered interchangeable, the argument again comes down to the fundamental cost of achieving the goal.


Simulations can be run so as to allow consideration of an almost endless number of "what-if" scenarios. For actual systems, or even physical models, where catastrophic consequences put an end to the physical entities, the cost of running a failure scenario can be prohibitive (as those who used model boards for early flight simulation know only too well). In the software world, that cost is essentially zero.

Allied to this argument is the question of risk exposure. Whereas actual models, and even systems themselves, can be rebuilt, human beings cannot. It is often the case that the practicalities of simulation are specifically for training in high-risk circumstances where failure is not an option. In simulation, we can engage in multiple attempts. In comparable real-world systems, these become one-shot trials in which one or more lives are clearly on the line. Again, this is a protection of resources and, in business terms, falls back to cost. However, when, for example, an instructor pilot and a student are killed in a crash, more than money is lost. Thus, for simulation to continue to function healthily in the future, and even to burgeon, it needs to make its advantages in terms of time, money, and safety very explicit to customers. Quantifying these advantages through mutually inspectable and reliable assessment methods remains an important goal for the simulation scientist.

SIMULATION AND TRAINING
Simulation has always supported training. Giving individuals surrogate experiences in advance of events has always been considered an important advantage of simulation. However, for future simulations, much harder questions will be asked about the cost and effectiveness of such training. This implies that the behavioral scientist and the simulation technologist will have to work in even closer association to perpetuate simulation's advantage. In the near future, it will not be sufficient to merely present scenarios that approximate expected conditions and then expose individuals to these general circumstances. There will have to be a much more targeted approach as to what specific skills need to be trained and what component elements of such skill development can best be served by exposure to simulations. Thus, we could construct situations to familiarize the individual with general contextual information about the global environment of operations. Alternatively, we could provide support for the assimilation of specific strategic skills such as decision making. For example, we could provide simulation support specifically for purpose-directed psychomotor skills of extremely high value, such as surgical expertise. It is evident that the nature of the cues that compose such simulations varies according to the skill set that is required for transfer. As has been previously pointed out (Kantowitz, 1992), this means that future simulations need to focus on the psychological variables necessary for skills support and cease to be driven solely by appearance and technological innovation. It may well be that the best simulation for supporting specific continuous psychomotor skill transfer (e.g., driver training) is simply a dynamic line display, and that all other elements of the sensory display (such as texture) actually distract from such transfer. Thus, future researchers will have to ask very hard questions about the goal of simulation use and tailor the simulations they create to those specific goals.


In this line of evolution, we may well encounter some surprising paradoxes, such as the one described earlier, where reducing the apparent fidelity of a simulation improves its specific utility. Such developments await a much more systematic understanding of what facilitates simulation-based skill assimilation for subsequent real-world transfer (see also Morris et al., 2004).

Practical simulation is not only used for training (see Aldrich, 2004). Indeed, one can argue that the role of simulation in the process of design and systems acquisition might, in the future, be even greater than its role in training (cf. AGARD, 1980; Lane et al., 1994). It has been noted that the degree to which training is needed in human–machine systems reflects the degree of flaw in the original design. As an absolute statement, this is certainly subject to debate. However, it is true that some design improvements obviate the need for more extensive training, and so, in the search for ever-more cost-efficient systems, the wise procurer will think carefully about up-front investment in design. Here, of course, simulation again steps to the fore. The physical and cognitive operational characteristics of any system can be presented first in simulation and quickly changed to search for optimal, or at least improved, design options.

However, this is a somewhat static view of the design process. I have argued previously (Hancock, 1997b) that, in the future, design will be a much more fluid and interactive process. Because systems themselves will be much more dynamic, generative, and evolutionary, the linkage between simulation and design will also be much more flexible and interactive. Indeed, as noted earlier, operators of the future need not necessarily know whether they are working on the actual system or practicing on a perfect surrogate; the same blurring will also happen with the process of design. In other words, fluidity will be of much higher value, and the idea of the fixity of final products or finished simulations will begin to fade. The days of nomothetic, or generalized, simulations are numbered. The future will require much greater flexibility and a much greater focus on the individual as designer, operator, user, trainee, or customer. The future watchword of simulation will be customization or, perhaps more realistically, individuation (see Hancock, 2003a).

DISCOURSE BETWEEN TWO WORLDS

Whatever the trajectory of future simulation, I think it is safe to assert that there will be two fundamental considerations. Both concern the nature of the simulated worlds. I predict, without much temerity, that we will see a continued focus on improving real-world replication. In contrast, there will be a steadily growing interest in representing artificial environments. Since the very earliest forays into simulation, one of the main concerns has been with the practical issue of presenting simulacra of the real world. Obviously, the roots of simulation lie in training, in which it has been assumed that the most effective course is to expose individuals to a complete representation of an environment before they have to face the real situation. The less forgiving the real-world situation, the greater the utility of simulation. Little wonder, then, that the military has always been at the forefront of simulation technology, as it is the quintessential institution engaged in exposing individuals to real-world threats. One of the mantras of this line of development has been "the better the simulation, the more effective the transfer of developed skills to the real world." The issue here, as noted previously, lies in the nature of the necessary skills. Simulation developers have been computer scientists and engineers, not psychologists. Engineers use a face-validity, "existence-proof" measure of simulation quality. In seeking to control issues such as frame rate, polygon frequency, texture mapping, stair stepping, and the like (each of which is largely an issue of computational speed), they focus on the appearance of the world rather than on its psychological characteristics. Their traditional yardstick for improvement is to enter the simulation, and if it looks better and more like the real world, that is progress. The problem focus is on the technical barriers and glitches that prevent this metric of appearance from reaching better levels. Unfortunately, simulations are built for people and built for a purpose, and often such purpose and subject interaction are subtly at odds with the appearance metric. Let me give one illustration, although the reader will find many other examples in this text. In constructing wraparound ground-vehicle simulations, attempts are often made to provide a high-fidelity visual appearance across the whole field, which can extend to a 360° field of view. Problematically, the human visual system does not register such levels of detail throughout its range, and indeed devotes detailed processing only to a very narrow 2° region, the foveal visual field. Providing highly realistic, detailed information in the person's visual periphery is therefore not only computationally wasteful, it can also induce simulation sickness and thereby negate the very reason for the simulation in the first place (see Stanney, 2002; and see Stanney, Kennedy, Hale, and Champney in Chapter 6 of the original volume). Few people can acquire important skills to be transferred to real-world operations while they are nauseous. Fortunately, as simulation research has begun to foster much greater interdisciplinary contributions and is now often encountered as a team effort, such issues as nominal realism versus psychological composition are receiving significant and deserved attention. It is a simple prediction that such interactive teams will make significant strides not merely in the improvement of simulation effectiveness but also in distilling composite measures of progress that include both computational and psychological parameters.
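
To make the foveal argument concrete, consider how a rendering budget might be allocated by angular eccentricity. The following is a minimal, hypothetical sketch (the constants and function names are illustrative assumptions, not any particular simulator's implementation): full detail is spent only near the line of gaze, and detail decays rapidly toward the periphery, which is precisely where the computational waste described above occurs.

    import math

    def detail_level(eccentricity_deg, foveal_radius_deg=1.0, falloff_deg=20.0):
        """Fraction of full rendering detail to spend at a given angular
        distance from the gaze point. Within roughly the 2-degree foveal
        field (radius ~1 degree), render at full detail; beyond it, let
        detail decay exponentially. All constants are illustrative."""
        if eccentricity_deg <= foveal_radius_deg:
            return 1.0
        return math.exp(-(eccentricity_deg - foveal_radius_deg) / falloff_deg)

    # Rough cost comparison across one hemifield of a wraparound display:
    # uniform full detail versus gaze-contingent allocation.
    uniform = sum(1.0 for e in range(180))
    foveated = sum(detail_level(e) for e in range(180))
    print(f"foveated cost is {foveated / uniform:.1%} of uniform cost")

Under these assumed constants, the gaze-contingent budget is roughly a tenth of the uniform one, which is the quantitative sense in which peripheral high fidelity is wasteful.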

HYBRID SIMULATION WORLDS

Before passing on to the issues of assessment, it is important here to comment directly on the most interesting initiative that concerns simulations of mixed or hybrid forms. It is easy to see that simulation has made great strides in replicating certain forms of sensory stimulation, with vision being the outstanding one. It is equally easy to see that simulation has made relatively little progress in some comparable areas, such as tactile stimulation. Here, I do not go into the reasons why this is so, but it does reflect, in part, the state of knowledge and research progress, as well as the result of practical resource allocation, in the immediate past. Be that as it may, we are left with a conundrum: simulation is close to fooling some sensory systems, whereas others are hardly touched at all. The interim solution to this problem is the use of mixed or hybrid worlds in which the visual and auditory cues are computer-generated, whereas the tactile cues remain firmly rooted in our present reality. Such a compromise results in the use of many overlay technologies, such as blue screens, which have been pioneered and long used in the entertainment industry. There is great near-term promise for these technologies, which are now beginning to have numerous practical uses. The illustrations shown in Figures 13.4 and 13.5 are from the University of Central Florida's Media Convergence Laboratory and are taken from its mixed reality innovation test bed. As noted earlier, not only are these important steps in simulation evolution, they also mark the way to other hybrid forms once further technical developments have been accomplished. Consequently, the way they integrate information sources is a most instructive development.
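
Compositing is the computational heart of such hybrid worlds: wherever the camera sees the key color, the frame is filled from the virtual world; everywhere else, the real world passes through untouched. A minimal sketch of the idea follows (the threshold rule and values are assumptions for illustration, not the Media Convergence Laboratory's actual pipeline):

    import numpy as np

    def chroma_key(real_frame, virtual_frame, threshold=80):
        """Composite a virtual frame into the blue-screen regions of a
        real camera frame. Both frames are HxWx3 uint8 RGB arrays of the
        same shape. A pixel counts as 'key' when its blue channel exceeds
        the larger of red and green by more than `threshold`."""
        r = real_frame[..., 0].astype(int)
        g = real_frame[..., 1].astype(int)
        b = real_frame[..., 2].astype(int)
        mask = (b - np.maximum(r, g)) > threshold  # True where the screen shows
        out = real_frame.copy()
        out[mask] = virtual_frame[mask]            # virtual content fills the key
        return out

    # A solid-blue 'real' frame is replaced entirely by virtual content.
    real = np.zeros((4, 4, 3), np.uint8); real[..., 2] = 255
    virtual = np.full((4, 4, 3), 128, np.uint8)
    assert (chroma_key(real, virtual) == 128).all()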

ASSESSING THE PROGRESS OF SIMULATION TECHNOLOGIES

To comprehend the problem of measuring the progress of simulation technologies, let us begin in a very roundabout manner; namely, by considering why (I think) ventriloquists are so awful. As a child and later as an adult, I have had to squirm in embarrassment when watching a "nominal" (amateur or, unbelievably, even professional) ventriloquist go through the shtick of "throwing" their voice to an unconvincing dummy. First, the voice is rarely thrown very far. In fact, in most cases, there are only 4-5 inches between the performer and the clacking mouth of a wooden puppet or, even more horrendously, a decorated sock! Fortunately, this pursuit seems to have disappeared from our major media, but it still persists in what I prefer to call distributed pockets of psychological illness. The failure of the ventriloquists' illusion is evident in the decreasing age of their audience. Nowadays, unless restrained, even 5-year-olds hoot such performances off the stage. Perhaps it can now quietly and reverently return whence it came: the adult-to-baby game of peek-a-boo.

The problem is quite evident. The so-called artists seek to suspend reality by relocating causation from themselves to alternate entities. But the illusion simply does not work. Ventriloquists can never throw their voice sufficiently far that a facing audience can "see" the voice as if it were coming from a different location (i.e., the dummy). Hence, the dummy is always designed to be visually attractive, with a large moving mouth. The putative entertainers augment this nominal suspension of disbelief by adopting the stereotypical funny voice and by minimizing the movement of their own mouths. As with a bad magician whose rabbits and pigeons leap and fly from the dress-coat, we see through and deride the failing illusion of the sad performer. In a general sense, ventriloquists are trying, albeit slightly, to pervert the laws of physics and produce the appearance of sentience and causation in an obviously inanimate object. Note that the conversational and comedic aspects of ventriloquism exactly parallel the two-man stand-up comedy act, but in the latter, we do not experience any dissonance, as the two individuals are both clearly seen as sources of causation.

In essence, the sad ventriloquist is the equivalent of the failed simulation software and hardware engineer. Each has tried to produce an illusion and manifestly failed. However, how and why the illusion fails are crucial questions for the future of simulation.


THE TURING TEST OF SIMULATION

To a degree, all simulations fail. Perhaps this failure is what we actually mean by simulation, that is, a degraded version of reality. However, soon this degree of failure or degradation will become so vanishingly small that those exposed to simulation will be hard put to tell the difference. All human sensory systems have a limit to their resolution capacities, and simulation capabilities will sequentially approach these individual limits for each specific sensory system. Paradoxically, because vision is a distal sense, in that it decodes information that is largely remote from the actual retinal site of activation, the visual capacities of simulation are rapidly approaching the level of reality. I have labeled this the "Turing test" of simulation because, analogous to the way that Alan Turing constructed a test for machine intelligence, the inability to tell the real from the simulated world constitutes a watershed threshold (Turing, 1950). For audition, this threshold is a little further removed. Not only are auditory displays serial representations, they are omnidirectional and very sensitive to change in locality. Also, quite bluntly, we know much more about vision, having awarded many Nobel Prizes for vision research. Compared with the funding for vision, audition research is the poor cousin. Unfortunately, the most proximal sense, that of touch, receives virtually no funding in comparison. The few brave souls who labor in the field of tactile kinesthesis are made aware of this lack on a daily basis but, again paradoxically, it is the barriers to the simulation of touch that are likely to be the last to fall in the search for the final passage of simulation's Turing test. We attempt to assess this overall experience through a subjective sense of "presence." This construct represents the degree to which we are willing to suspend our disbelief of the simulation shortfall. As presence improves, so we approach closer to passing the Turing test.

One method of achieving this passage early is by finessing the problem of touch. Instead of artificial actuators attempting some form of direct stimulation, one can merely substitute the actual real world for this part, as in the hybrid simulations discussed in the previous section (see Figures 13.4 and 13.5). Transportation simulation is already well along the way to crossing the reality threshold. Using the strong advances in visual and auditory projection, the problem of touch is circumvented by having the individual sit in a cab of some sort, often taken from an actual vehicle. Providing stationary tactile cues is thus taken care of, and the issue devolves to a question of providing appropriate dynamic motion cues to match events represented in the visual and auditory displays. In current high-end, single-seat aircraft simulators, this experience gets close enough to reality to make pilots sweat in hazardous situations. Of course, one can finesse the problems of vision and audition as well by simulating naturally degraded worlds, such as those shrouded in fog, in which both visual and auditory information are naturally reduced. Depending on what it is we intend to simulate, the feasibility of passing the Turing test increases daily.
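
This watershed can be made operational. One plausible protocol (a sketch of one possible design, not an established standard) is a forced-choice discrimination experiment: observers view paired real and simulated scenes and judge which is real; the simulation approaches its Turing test when accuracy is statistically indistinguishable from the 50% chance level. The numbers below are invented for illustration.

    from math import comb

    def binomial_p_two_sided(correct, trials, chance=0.5):
        """Exact two-sided binomial test of discrimination accuracy
        against chance. Sums the probability of every outcome at least
        as far from the expected value as the one observed."""
        mean = trials * chance
        def prob(k):
            return comb(trials, k) * chance**k * (1 - chance)**(trials - k)
        return sum(prob(k) for k in range(trials + 1)
                   if abs(k - mean) >= abs(correct - mean))

    # 112 correct choices in 200 trials: observers cannot yet reliably
    # tell the real scene from the simulated one (p is about 0.10).
    print(binomial_p_two_sided(112, 200))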


SUPERSIMULATION

Human sensory systems have finite resolution capabilities. They address only a restricted portion of the electromagnetic range and have intrinsic limitations in both spatial and temporal acuity. Thus, what we experience as reality is a highly restricted "window" into a much richer world. Simulation is not bound in the same way. Indeed, it has been noted that whereas the laws of physics bound real worlds, it is the laws of psychology that bound virtual ones. So, if we pass the simulation Turing test, what then? Because many physical constraints can be fractured in simulation, it is both feasible and practicable to generate representations beyond real-world fidelity; these capacities, which I have labeled "supersimulation," are the next evolutionary step. Of course, we already have several forms of supersimulation. We can present the individual with displays of information taken from the infrared and ultraviolet ranges and scale them to be displayed via intrinsic visual capacities. We could also render the same isomorphisms in the auditory and tactile-kinesthetic worlds, in which stimulation beyond the normal range can be mapped into the spectrum of normal capacities. This would, for example, allow us to hear what dogs experience, and via teleoperation we could "touch" worlds where no human has yet been. Variations in temporal presentation are also very familiar to us through entertainment media, in which slow-motion or time-lapse photography either slows down or speeds up real-time events, respectively. Variation in visuospatial resolution is rather akin to the use of dynamic binoculars that provide radically increased magnification in some specified part of the visual field (where, of course, auditory and tactile-kinesthetic analogs are also feasible). In some sense, the latter manipulation replicates the structure and function of the fovea and periphery in the retina, where an area of especially high sensitivity is bounded by a surrounding region of lower resolution. In supersimulation, the boundaries of space, time, and the electromagnetic range of stimulation are no longer immutable, and I fully expect to see the exploitation of these dimensions continue in the future to an even greater degree. Finally, as with earlier observations, the actual content of such worlds is bounded only by imagination, as we have already seen in many entertainment "worlds."
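
The spectral scaling described above is, at root, a simple remapping. As a hedged sketch (the band limits are assumed for illustration), an out-of-band wavelength can be compressed linearly into the visible range so that, say, near-infrared structure is rendered as visible color:

    def remap_wavelength(wl_nm, source_band=(750.0, 1400.0),
                         visible_band=(380.0, 750.0)):
        """Linearly map a wavelength from an out-of-band source range
        (here, an assumed near-infrared band) into the visible spectrum:
        scaling stimulation beyond the normal range into intrinsic
        visual capacities."""
        s_lo, s_hi = source_band
        v_lo, v_hi = visible_band
        frac = (wl_nm - s_lo) / (s_hi - s_lo)  # position within source band
        return v_lo + frac * (v_hi - v_lo)

    print(remap_wavelength(1000.0))  # 1000 nm infrared renders near 522 nm (green)

The same one-line transform, applied to time stamps rather than wavelengths, yields the slow-motion and time-lapse manipulations mentioned above.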

THE MORAL DIMENSION OF SIMULATION

As well as the technical future of simulation, there is a growing concern for its moral dimension, which it is important to consider here. In recent decades, we have seen the growth of the World Wide Web and the massive impact it has exerted in a number of fields. However, even the most cursory survey of the Web shows that one of the dominant themes is pornography or, more properly, explicit sexual content, since pornography is very much in the eye of the beholder (Lawrence, 1929). In public, many individuals deplore this tendency, and yet such images and activities must find a widespread and ready market; otherwise, they would not be so plentiful. Our attitude toward this issue often reflects our own individual moral grounding, and questions of usage and censorship vary according to the different regions of the world and the cultural diversity in which such use is embedded. Many individuals agree that limitations on some aspects (for example, on child pornography) are needed, and this consensus allows groups to designate certain activities as illegal. However, should public condemnation extend to purely private circumstances, and what role does enforcement play in such situations? Such questions represent the horns of the moral dilemmas with which future simulation scientists must inevitably wrestle (see Hancock, 1998, 2003b).

What is clear is that sex is a fundamental human drive and, like other forms of physical activity, sexual activity, too, can be represented in simulation. Given the fundamental nature of this drive, it can be anticipated that much effort will go into producing surrogates. However, what does this say about reality? If simulation were to reach a level of sufficient viability in reproducing tactile cues as well as visual and auditory stimulation, would this represent a breakdown in real-world sexual activity, which would pale by comparison with its unlimited and unconstrained electronic substitute? By extension, would this mean the curtailment of other social relationships? Teenagers already spend enormous amounts of time playing in their own rooms in electronic game worlds. Simply imagine the alternatives that could be rendered by advanced simulation possibilities. Indeed, how many of us would return to a mundane real-world existence given the choice of ultimate fantasy worlds?

It is not solely this "virtual isolation" that is of concern. What do we do about those individuals who wish to conduct illegal, illicit, or threatening activities and seek to use the facility of simulation to advance these goals? Indeed, the terrorists of September 11, 2001, took extensive advantage of simulation to achieve their ends (Hancock & Hart, 2002). Do we take the scientists' traditional dissociation excuse and indicate that the fruits of science are morally neutral until they are put to specific ends through social and political implementation (see Hancock, 2003b; Parasuraman, Hancock, Radwin, & Marras, 2003)? It was this very dilemma that Oppenheimer had to face in the creation of the atomic bomb, which was certainly never a neutral technology. Thus, the future of simulation is not merely a technical challenge but promises to be one of radical social import. The creators of technology can no longer legitimately claim moral neutrality, and perhaps simulation science is an area that can address this thorny but vital issue. Can we, indeed should we, find and impose limits?

A PHILOSOPHICAL VALEDICTION

I cannot leave a discourse on the future of simulation without explicit speculation as to these wider personal and social implications. Such projections are, of necessity, eventually founded in philosophical discussion. I know that such deliberations can be anathema to the segment of readers who are pragmatic and practice-oriented, who are hereby excused, without prejudice, to proceed directly to the final, summary section. However, the pillars of philosophy are the foundations of society, and as the future of simulation promises to shake these very pillars, this brief excursion is more than justified.

Perhaps the most relevant place to begin is with considerations of the nature of reality. The British empiricists Locke (1690), Berkeley (1710), and Hume (1739) were among the vanguard of modern philosophers to question the fundamental nature of experience itself. It is evident that our moment-to-moment experience is derived overwhelmingly from our immediate sensory stimulation, and the basic question is whether all experience is contingent upon this ongoing stream, as other potential contributions are themselves contingent upon remembered experiences extracted from the same source. Locke's original characterization of newborn children as tabulae rasae, upon which nature writes as if on an empty page, is one radical position in the ongoing nature-nurture contention. Contemporary biogenetics tells us that the situation is much less absolutist than this, and that newborns are equipped with many innate capacities to help them deal with the challenges of the environment. In essence, Locke considered individual memory an important player in the totality of experience, but did not, at that time, comprehend the notion of inherited genetic capacities (essentially a genetic memory) that would help frame the very earliest forms of experience. Our contemporary knowledge helps us understand that reality is indeed more than the stimulation of the moment.

Following Descartes, the question of reality as illusion became an important philosophical issue. Could it be, independent of the nature of all previous experience, that what was perceived as reality was actually illusion? Descartes could imagine an all-powerful entity (which he expressed as a devil, but which in simulation terms might merely be an exceptionally capable computer) that could present to his senses a sufficiency of stimulation such that he was fooled. His only comfort was the fact that such a computer could not deal with the pure intuition that made it possible for him to doubt that illusory reality. His famous aphorism, "cogito ergo sum," assumes that the very essence of self is thought, not perception, although in this he was also mistaken to some degree.

Berkeley identified that all-powerful being as God and asked the most sensible of questions: does matter actually exist? His argument is most instructive, especially for future simulation scientists. Berkeley, himself a bishop of the Church, was a believer in the omnipotence and infallibility of God: omnipotence meaning God could do anything; infallibility meaning God was perfect. Since God was perfect, God would not make mistakes. Thus, God would not engage in any action that was not itself the most economic and efficient method of achieving a specific aim. From this, Berkeley argued that God could put directly into the mind of each individual the experience of reality, omnipotence allowing this difficult but conceivable action to take place. This being so, creating matter, that is, creating an intermediary mechanism for the perception of reality, is not really needed. We do not need to go through the step of "perceiving" external objects because God can project such an image directly to the brain. Therefore, God being both infallible and omnipotent, matter is unnecessary: quod erat demonstrandum (QED). This wonderful solipsist conundrum has never been resolved and, indeed, there are good grounds to believe that empirical resolution may be impossible. Today, we do not believe in this solipsist assertion, not because of any proof to the contrary but rather because of much greater doubts about the presence, actions, and capacities of any particular deity. Berkeley's argument, however, remains unassailed.
The question for the future of simulation in this context is: can we play the role of God? In actuality, this is logically equivalent to the passage of the Turing test for simulation reality. At present, when we are in a simulation, we remain very much aware that we are in that simulation, not least because we remember entering that environment and agreeing, at least to some extent, to suspend our disbelief. The sense of reality would be much enhanced if, for example, we woke up inside such a simulation, where the power of the continuous stream of memory would play a diminished role. Similarly, developing simulations that slowly and surreptitiously introduce artificially mediated parts of the environment into naturally occurring situations may help fully suspend our disbelief. Contemporary psychologists and neuroscientists are very aware that the content of consciousness is a dynamic interplay between the stream of incoming information and centralized, largely memory-mediated processes. To fabricate a true reality, then, simulation will have to embrace much more than simple surrogate sensory displays. It will have to dig deeply into the nature of memory and the facets of individual differences that connote personal identity and idiographic experience. Thus, the creation of "constrained" realities, that is, those in which the individual personally, voluntarily, and knowingly "buys" into the premises and constraints of the surrogate world, is not far off, and for many gaming situations, it is already here. A convincing replacement world for a non-cooperative individual is further away, and designs will continue to rely on support from actual real-world surroundings for some time to come. Although the barriers are daunting, such problems are not insurmountable, and as we understand more about problematic issues such as tactile-kinesthetic stimulus replacement and the integrated experience of consciousness, we will eventually achieve other realities; but what then?

Simply creating persuasive other worlds is only the first step. Since our present concern is with the philosophical issues, let us cast aside, albeit temporarily, the pragmatic drivers that will power the future of simulation and ask the greater social questions. Let us suppose we can now create infinite alternative universes (and by this, I mean that users will be able to dynamically construct and control any object or agent in their "worlds" and that some method has been found to port the material essentials, for example, food, oxygen, and water, into these worlds). In such worlds, the user will be able to instantly satisfy any physical or cognitive desire. Who would wish to occupy such a world? Having occupied it, who would wish to return to this one? Coming back to "reality" would mean encountering individuals who are odd, unpleasant, incompliant, uncaring, polemic, sadistic, and even murderous. Who would take these characters over a collective of purpose-built electronic substitutes having infinite empathy, pity, love, and caring? In essence, what would happen to the fabric of human society when the necessity of social cohesion is fractured? The philosopher Rousseau asked much the same questions some centuries ago, arguing that any social bond is one in which one accepts a restriction in autonomy in exchange for a degree of security (Rousseau, 1762). Further, he had a recommendation for those who wished to exercise untrammeled autonomy because they could not accept the restrictions of civilized society: he recommended they go to America!
There, he argued, they would find a "new" world where the pressure of population on land was sufficiently small that people could "do their own thing." This transmigration of the discontented was not, of course, the preferred immigration policy of the Native Americans of the time. In today's world, Rousseau's contract is no longer voluntary and, therefore, no longer a viable contract. An individual born today cannot effectively decide to opt out, find unclaimed but productive land, work that land, and remain socially isolated. There are individuals and small groups who try to sustain such isolation, and some (such as the Amish) have a degree of success. Other individuals and groups are, for differing reasons, not successful, especially when they directly encounter the unsympathetic power of society at large. In this sense, the cases of the Unabomber and the Branch Davidian group are particularly instructive and recent examples (Reavis, 1998). Given the prior claim on all the lands of the world, today's social contract is imposed on individuals by force, variously disguised, in order to maintain a relatively stable collective. In this inherent tension, is there an opportunity for simulation science? I suggest there is. Given the current population, it is evident that we will not reach any planetary system sufficient to support human sustenance and expansion in the time we have available (although growth is actually slowing in some regions). Neither, despite all the optimistic prognostications, are we liable to find a voluntary balance in the global population. Given that there is little effective real estate left for pioneers to explore and that the pressures on land continually increase, can the simulation sciences represent the "new" world?

In the film The Matrix, we are shown warehoused individuals stockpiled for their power-generation capabilities (a very doubtful premise). However, it may be that such warehousing is much more aligned with excess population. Indeed, we already warehouse those elements of society who are termed criminal. Two hundred years ago, we ported such individuals to "new" worlds, as in the forced migration of "criminals" to Australia by the English (see Hughes, 1987; Rees, 2002). Today, we essentially have no such remote lands to be exploited. Could we port such individuals to simulated worlds? Immediately, questions of cruelty and purposelessness come to the fore but, like the colonies of old, simulation worlds need not be inherently unproductive, and much of value may be brought back from these electronic potentialities to the one that will remain "mother reality." Sufficiently advanced, future simulations can therefore act to change society once more by presenting choice to individuals. The problem with this is that, as human beings are inherently structured to accept options involving the least effort, who would be left to tend the machinery, advance the technology, and frame mother reality? My answer would be that it will, as it always has been, be those who embrace challenge, see opportunity, and turn adversity into progress. As America once was, as the West once represented, as the Apollo program once exemplified, future simulation can represent the new frontier.

SUMMARY AND CONCLUSION

The future of simulation is bright. There continue to be many circumstances in which we wish to train individuals without immediately exposing them to dangerous situations. The traditional custom of the military forces will continue, and the ever-burgeoning demand for entertainment will also drive simulation technology to improve. However, as well as progressing along these traditional lines, simulation will begin to expand and, in some ways, dissolve. I expect to see dissolution in the process itself. Since advanced system operators already act on representations of systems, and not directly on the systems themselves, they already (to a degree) act on simulations. That this representation might alternate between the actual system and a computer surrogate (and back again) could easily be achieved, with the substitution remaining opaque to the operator. Such technological progress is feasible, and I expect to see it engaged in various forms in the very near future. However, the flexibility and change in simulation will not stop there. Simulation has often been used in the design process. The future will see a much more interactive role for simulation here, and in the same way that we will find it difficult to parse simulation for training from simulation for operations, we will find it similarly difficult to parse simulation for design from simulation in operation. If the future continues to emphasize speed and flexibility, these dissolutions of definition will be accelerated. Further, we will see much greater customization. In general, human factors has gone from its earliest forms, in which individuals built and customized their own tools, through eras of mass production to adaptive systems, and is now finally returning to individualization (see Hancock, 2003a). In the near future, everyone will expect their respective simulation(s) to adapt to them and their own personal settings. Simulation will have to follow this trend and also show ever-increasing cost-effectiveness through a much greater focus on what the goal of the simulation is and how to achieve that goal at the least cost. The future looks bright; but don't worry, it is coming whatever you do!

ACKNOWLEDGMENTS

I would very much like to thank Robert Tyler, Brian Goldiez, John Wise, and Robert Kennedy for their comments on an earlier version of this chapter. Preparation of this chapter was facilitated by grants from the U.S. Army. The first was the Multiple University Research Initiative-Operator Performance Under Stress (MURI-OPUS) Grant (#DAAD19-01-0621). The second was from the Advanced Decision Architecture Consortium (#DAAD19-01-0009). The views expressed in this chapter are those of the author and do not necessarily reflect the official policy or position of the Department of the Army, the Department of Defense, or the U.S. government. I would also like to thank Elmar Schmeisser, Sherry Tove, and Mike Drillings for providing administrative and technical direction for the first grant, and Mike Strub for the second.

REFERENCES

AGARD. 1980. Fidelity of simulation for flight training. AGARD Advisory Report No. 159. Harford House: London.
Aldrich, C. 2004. Simulations and the future of learning. Wiley: San Francisco.
Berkeley, G. 1710. A treatise concerning the principles of human knowledge. Tonson: London.
Bush, V. 1945. As we may think. The Atlantic Monthly, 176(1), 101–108.
Chapanis, A., Garner, W. R., Morgan, C. T., & Sanford, F. H. 1947. Lectures on men and machines: An introduction to human engineering. Systems Research Laboratory: Baltimore, MD.
Flach, J., Hancock, P. A., Caird, J. K., & Vicente, K. (Eds.). 1995. Global perspectives on the ecology of human-machine systems. Lawrence Erlbaum: Mahwah, NJ.
Gardner, M. 2000. Did Adam and Eve have navels? W. W. Norton: New York.
Hancock, P. A. 1997a. Essays on the future of human-machine systems. Banta: Eden Prairie, MN.
Hancock, P. A. 1997b. On the future of work. Ergonomics in Design, 5(4), 25–29.
Hancock, P. A. 1998. Should human factors prevent or impede access? Ergonomics in Design, 6(1), 4.
Hancock, P. A. 2003a. Individuation: Not merely human-centered but person-specific design. Paper presented at the 47th Annual Meeting of the Human Factors and Ergonomics Society, Denver, CO.
Hancock, P. A. 2003b. The ergonomics of torture: The moral dimension of evolving human-machine technology. Proceedings of the Human Factors and Ergonomics Society, 47, 1009–1011.
Hancock, P. A., Flach, J., Caird, J. K., & Vicente, K. (Eds.). 1995. Local applications in the ecology of human-machine systems. Lawrence Erlbaum: Mahwah, NJ.
Hancock, P. A., & Hart, S. G. 2002. Defeating terrorism: What can human factors/ergonomics offer? Ergonomics in Design, 10(1), 6–16.
Hancock, P. A., & Szalma, J. L. 2003. The future of neuroergonomics. Theoretical Issues in Ergonomics Science, 4(1–2), 238–249.
Hockney, D. 2001. Secret knowledge. Viking Studio, Penguin: New York.
Hughes, R. 1987. The fatal shore. Knopf: New York.
Hume, D. 1739. A treatise of human nature. Noon: Cheapside, London.
Kantowitz, B. H. 1992. Selecting measures for human factors research. Human Factors, 34(4), 387–398.
King, R. 2001. Brunelleschi's dome. Penguin Books: New York.
Lane, N. E., Kennedy, R. S., & Jones, M. B. 1994. Determination of design criteria for flight simulators and other virtual reality systems. Proceedings of the IMAGE Conference, Tucson, AZ, 12–17 June.
Lawrence, D. H. 1929. Pornography and obscenity. Faber and Faber: London.
Locke, J. 1690. An essay concerning human understanding. Basset: London.
Moravec, H. 1988. Mind children: The future of robot and human intelligence. Harvard University Press: Boston, MA.
Morris, C. S., Hancock, P. A., & Shirkey, E. C. 2004. Motivational effects of adding context relevant stress in PC-based games training. Military Psychology, 16(2), 135–147.
Mouloua, M., Gilson, R., & Hancock, P. A. 2003. Designing controls for future unmanned aerial vehicles. Ergonomics in Design, 11(4), 6–11.
Parasuraman, R. 2003. Neuroergonomics: Research and practice. Theoretical Issues in Ergonomics Science, 4(1–2), 5–20.
Parasuraman, R., Hancock, P. A., Radwin, R. A., & Marras, W. 2003. Defending the independence of human factors/ergonomics science. Human Factors and Ergonomics Society Bulletin, 46(11), 1, 5.
Pullman, P. 2000. The amber spyglass. Random House: New York.
Reavis, R. J. 1998. The ashes of Waco: An investigation. Syracuse University Press: New York.
Rees, S. 2002. The floating brothel. Hyperion: New York.
Rousseau, J. J. 1762. The social contract: On principles of political right. Translated by G. D. H. Cole. www.constitution.org/jjr/socon.htm
Schacter, D. L. 2001. The seven sins of memory. Houghton-Mifflin: Boston.
Stanney, K. M. (Ed.). 2002. Handbook of virtual environments: Design, implementation, and applications. Lawrence Erlbaum: Mahwah, NJ.
Turing, A. M. 1950. Computing machinery and intelligence. Mind, 59, 433–460.

Appendix A: Glossary of Modeling Terms

Compiled by Michael G. Lilienthal, Ph.D., CPE, CTEP, Senior Analyst, Electronic Warfare Associates, Government Systems, Inc., and William F. Moroney, Ph.D., CPE, Professor Emeritus, Human Factors Program, University of Dayton (Ohio).

Activity models: A process model that describes the functional activity under examination in terms of inputs, transforms, outputs, and controls.

Analytical model: A model consisting of a set of mathematical equations, e.g., a system of solvable equations that represents the laws of thermodynamics or fluid mechanics.

Black box model: A model whose inputs, outputs, and functional performance are known, but whose internal implementation is unknown or irrelevant. For example, a model of a computerized change-return mechanism in a vending machine, in the form of a table indicating the amount of change to be returned for each amount deposited.

Computational model: A model consisting of well-defined procedures that can be executed on a computer. For example, a model of the stock market in the form of a set of equations and logic rules.

Conceptual data model: A model that documents the business information requirements and structural business process rules of the architecture and describes the information that is associated with the architecture. Included are information items, their attributes or characteristics, and their inter-relationships.

Conceptual model: The description of what the model or simulation will represent, the assumptions limiting those representations, and other capabilities needed to satisfy the user's requirements. Also, a collection of assumptions, algorithms, relationships, and data that describe a developer's concept about the simulation.

Continuous model: A mathematical or computational model whose output variables change in a continuous manner.

Data model: In a database, the user's logical view of the data, in contrast to the physically stored data or storage structures. Also, a description of the organization of data in a manner that reflects the information structure of an enterprise.

Descriptive model: A model used to depict the behavior or properties of an existing system or type of system. For example, a scale model or written specification used to convey to potential buyers the physical and performance characteristics of a computer.

Deterministic model: A model in which the results are determined through known relationships among the states and events, and in which a given input will always produce the same output. For example, a model depicting a known chemical reaction.

Digital elevation model: A numerical model of the elevations of points on the earth's surface; digital records of terrain elevations for ground positions at regularly spaced horizontal intervals.

Discrete model: A mathematical or computational model whose output variables take on only discrete values; that is, in changing from one value to another, they do not take on the intermediate values. For example, a model that predicts an organization's inventory levels based on varying shipments and receipts.

Dynamic model: A model of a system in which there is change, such as the occurrence of events over time or the movement of objects through space. For example, a model of a bridge that is subjected to a moving load to determine characteristics of the bridge under changing stress.

Emulation: A model that accepts the same inputs and produces the same outputs as a given system.

Enterprise model: Information model(s) that present an integrated top-level representation of processes, information flows, and data.

Environment effect model: A model representing the impact or effect that an environmental feature has on a simulation entity, component, or process.

Error model: A model used to estimate or predict the extent of deviation of the behavior of an actual system from the desired behavior of the system. For example, a model of a communications channel used to estimate the number of transmission errors that can be expected in the channel.

Executable model: A model that instantiates the conceptual model of a system as its design specification.

Federation Object Model (FOM): A specification defining the information exchanged at runtime to achieve a given set of federation objectives. This information includes object classes, object class attributes, interaction classes, interaction parameters, and other relevant information. The FOM is specified to the runtime infrastructure (RTI) using one or more FOM modules. The RTI assembles a FOM using these FOM modules and one Management Object Model (MOM) and Initialization Module (MIM), which is provided automatically by the RTI or, optionally, provided to the RTI when the federation execution is created.

Graphical model: A symbolic model whose properties are expressed in diagrams. For example, a decision tree used to express a complex procedure. Cf. mathematical model, narrative model, software model, and tabular model.

Hierarchical model: A model in which superior/subordinate relationships are represented, often as trees of records connected by pointers.

Human behavioral model: A model of a human activity in which individual or group behaviors are derived from the psychological or social aspects of humans. Behavioral models include a diversity of approaches; however, the computational approaches to human behavior modeling that are most prevalent are social network models and multi-agent systems.

Iconic model: A physical model or graphical display that resembles the system being modeled. For example, a nonfunctional replica of a computer tape drive used for display purposes.

Human, social, cultural, and behavioral representation: A model of the structure, interconnections, dependencies, behavior, and trends associated with any collection of individuals, ranging from the small unit level (e.g., tribes, militias, small military units, terrorist cells) to the macro level (e.g., nations, religions, cultures, ethnic groups, and international organizations), and the integrated relationships between and among them.

Information model: A model that represents the processes, entities, information flows, and elements of an organization, and all relationships between these factors.

Logical data model: A model that provides a common dictionary of data definitions to consistently express models wherever logical-level data elements are included in the descriptions.

Markov chain model: A discrete, stochastic model in which the probability that the model is in each state at a certain time depends only on the value of the immediately preceding state. (A brief illustrative sketch follows this glossary.)

Mathematical model: A symbolic model whose properties are expressed in mathematical symbols and relationships. For example, a model of a nation's economy expressed as a set of equations. Cf. graphical model, narrative model, software model, and tabular model.

Metamodel: A model of a model or simulation. Metamodels are abstractions of the M&S being developed that use functional decomposition to show relationships, paths of data and algorithms, ordering, and interactions between model components and subcomponents. Metamodels allow the developer to abstract details to a level that subject matter experts can validate.

Mock-up: A full-sized model, but not necessarily functional, built accurately to scale, used chiefly for study, testing, or display.

Model: A physical, mathematical, or otherwise logical representation of a system, entity, phenomenon, or process.

Modeling and simulation (M&S): The use of models, including emulators, prototypes, simulators, and stimulators, either statically or over time, to develop data as a basis for making managerial or technical decisions.

Natural model: A model that represents a system by using another system that already exists in the real world. For example, a model that uses one body of water to represent another.

Numerical model: (a) A mathematical model in which a set of mathematical operations is reduced to a form that is suitable for solution by simpler methods such as numerical analysis or automation. For example, a model in which a single equation representing a nation's economy is replaced by a large set of simple averages based on empirical observations of inflation rate, unemployment rate, gross national product, and other indicators. (b) A model whose properties are expressed by numbers.

Parametric model: A model using parametric equations that may be based on numerical model outputs or fits to semiempirical data.

Petri net model: An abstract, formal model of information flow, showing static and dynamic properties of a system defined by places, transitions, input function, and output function. It graphically depicts the structure of a distributed system as a directed bipartite graph with annotations.

Physical data model: A model that defines the structure of the various kinds of system or service data that are utilized by the systems or services in the architecture.

Physical model: A model whose physical characteristics resemble those of the system being modeled. For example, a plastic or wooden replica of an airplane; a mock-up.

Physics-based model: A mathematical model in which the equations that constitute the model are those used in physics to describe or define the physical phenomenon being modeled.

Predictive model: A model in which the values of future states can be predicted or are hypothesized. For example, a model that predicts weather patterns based on the current value of temperature, humidity, wind speed, and so on, at various locations.

Prescriptive model: A model used to convey information regarding the behavior or properties of a proposed system. For example, a scale model or written specification used to convey to a computer supplier the physical and performance characteristics of a required computer.

Probabilistic model: See stochastic model.

Process model: A model that defines the functional decomposition and the flow of inputs and outputs for a system.

Qualitative model: A model that provides results expressed as a non-numeric description of a person, place, thing, event, activity, or concept.

Queuing model: A model consisting of service facilities, entities to be served, and entity queues, e.g., a model depicting teller windows and customers at a bank waiting to be served, or a network of shipping routes and docking facilities at which ships must form queues to unload their cargo.

Reliability model: A model used to estimate, measure, or predict the reliability of a system. For example, a model of a computer system used to estimate the total down time that will be experienced.

Representation: A model of an entity or phenomenon and its associated effects. Representations using algorithms and data that have been developed or approved by a source having accurate technical knowledge are often considered authoritative.

Scale model: A physical model that resembles a given system, with only a change in scale. For example, a replica of an airplane one-tenth the size of the actual airplane.

Simulation object model (SOM): A specification of the types of information that an individual federate could provide to High Level Architecture (HLA) federations, as well as the information that an individual federate can receive from other federates in HLA federations. The SOM is specified using one or more SOM modules. The standard format in which SOMs are expressed facilitates determination of the suitability of federates for participation in a federation.

Static model: A model of an entity or system in which there is no change. For example, a scale model of a bridge that is provided for its appearance rather than for its performance under varying loads.

Stochastic model: A model in which the results are determined by using one or more random variables to represent uncertainty about a process, or in which a given input will produce an output according to some statistical distribution. For example, a model that estimates the total dollars spent at each of the checkout stations in a supermarket, based on the probable number of customers and the probable purchase amount of each customer. Syn. probabilistic model. See also Markov chain model. Cf. deterministic model.

Source: Department of Defense Modeling and Simulation Glossary, https://www.msco.mil/MSReferences/Glossary/MSGlossary.aspx
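
Two of the entries above lend themselves to a brief illustration. The following sketch (toy states and probabilities, invented for illustration and not drawn from the DoD glossary) shows the defining property of a Markov chain model: the next state is drawn using only the current state, which also makes the model stochastic rather than deterministic.

    import random

    # Toy two-state weather chain with assumed transition probabilities.
    TRANSITIONS = {
        "sunny": [("sunny", 0.8), ("rainy", 0.2)],
        "rainy": [("sunny", 0.4), ("rainy", 0.6)],
    }

    def step(state):
        """Draw the next state from the current state's transition row;
        no earlier history is consulted (the Markov property)."""
        r, cumulative = random.random(), 0.0
        for next_state, p in TRANSITIONS[state]:
            cumulative += p
            if r < cumulative:
                return next_state
        return next_state  # guard against floating-point rounding

    random.seed(1)
    state, history = "sunny", []
    for _ in range(10):
        state = step(state)
        history.append(state)
    print(history)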

Appendix B: Glossary of Simulation Terms Compiled by T. Chris Foster, Ph.D., MSC, USN. Military Director, Human Systems Engineering, Naval Air Warfare Center Aircraft Division, Patuxent River, MD and Michael G. Lilienthal, Ph.D., CPE, CTEP, Senior Cybersecurity Analyst, Electronic Warfare Associates. Term

Definition

Activity-based simulation

A discrete simulation that represents the components of a system as they proceed from activity to activity. For example, a simulation in which a manufactured product moves from station to station in an assembly line.

Agent-based simulation

A simulation that focuses on the implementation of agents and the sequence of actions and interactions of the agents over periods of time. A family of simulation interface protocols and supporting infrastructure software that permit the integration of distinct simulations and war games. Combined, the interface protocols and software enable large-scale distributed simulations and war games of different domains to interact at the combat object and event level. A class of technology that enables the user to interact with the real environment while overlaying or otherwise adding information from a virtual environment to enhance the user’s experience with the real environment. Drascic and Milgram (1996) state that ‘AR describes a class of displays that consists primarily of a real environment, with graphic enhancements or augmentations.’ A class of technology that enables the user to interact with the virtual environment while overlaying or otherwise adding information from the real environment to enhance the user’s experience with the virtual environment. A simulation that is executed on a computer, with some combination of executing code, control/display interface hardware, and, in some cases, interfaces to real-world equipment. A simulation where time advances are paced to have a specific relationship to wall clock time. These are commonly referred to as real-time or scaled real-time simulations. Human-in-the-loop (e.g., training exercises) and hardware-in-the loop (e.g., test and evaluation simulations) are examples of constrained simulations. Simulations involving simulated people operating simulated systems. Real people can be allowed to stimulate (make inputs) to such simulations. See: live, virtual, and constructive simulation.

Aggregate Level Simulation Protocol

Augmented reality (AR)

Augmented virtuality (AV)

Computer simulation

Constrained simulation

Constructive simulation

(Continued)

327

328

Term Discrete event simulation Distributed interactive simulation (DIS)

Event-driven simulation

Extended reality (XR) Hardware in the loop simulation Human in the loop simulation Instructional simulation Interval-oriented simulation Live simulation Live, virtual, and constructive simulation Mixed reality (MR)

Modeling and simulation (M&S) Monte Carlo simulation

Real environment

Glossary of Simulation Terms

Definition A simulation where the dependent variables (i.e., state indicators) change at discrete points in time referred to as events. A time and space coherent synthetic representation of world environments designed for linking the interactive, free-play activities of people in operational exercises. The synthetic environment is created through real-time exchange of data units between distributed, computationally autonomous simulation applications in the form of simulations, simulators, and instrumented equipment interconnected through standard computer communicative services. The computational simulation entities may be present in one location or may be distributed geographically. A simulation in which attention is focused on the occurrence of events and the times at which those events occur. For example, a simulation of a digital circuit that focuses on the time of state transition. A blanket term encompassing virtual reality (VR), augmented reality (AR), and mixed reality (MR). Simulation and simulators that employ one or more pieces of operational equipment (to include computer hardware) within the simulation/ simulator system. Simulation and simulators that employ one or more human operators in direct control of the simulation/simulator or in some key support function. A simulation that provides stimuli in the synthetic environment, for the purpose of training. A continuous simulation in which simulated time is advanced in increments of a size suitable to make implementation possible on a digital system. A simulation involving real people operating real systems. See: live, virtual, and constructive simulation. A broadly used taxonomy describing a mixture of live simulation, virtual simulation, and constructive simulation. MR can be conceptualized as a balanced mix of the real environment and a virtual environment in which neither predominates and purposeful interaction between the two is enabled. Note: While this definition intentionally excludes AR and AV from the definition of MR, there are researchers that include AR and AV on the MR continuum. The use of models, including emulators, prototypes, simulators, and stimulators, either statically or over time, to develop data as a basis for making managerial or technical decisions. A simulation in which random statistical sampling techniques are employed to determine estimates for unknown values (i.e., making a random draw). The real world with which individuals interact using their own body and senses without augmentation. (Continued)

329

Glossary of Simulation Terms

Term Real-time simulation

Simuland Simulation environment Simulation game

Simulation object model (SOM)

Simulation time

Simulator Stimulate Stimulation

Stimulator Time-step simulation or time- interval simulation

Virtual environment (VE)

Definition Simulated time advances at the same rate as actual time. Faster than real time is when simulated time advances at a rate greater than actual time. Slower than real time is when simulated time advances at a rate less than actual time. The system being simulated by a simulation. The operational hardware, software including databases, communications, and infrastructure in which a simulation operates. A simulation in which the participants seek to achieve some agreed upon objective within an established set of rules. For example, a management game, a war game. Note: The objective may not be to compete, but to evaluate the participants, increase their knowledge concerning the simulated scenario, or achieve other goals. A specification of the types of information that an individual federate could provide to High Level Architecture (HLA) federations as well as the information that an individual federate can receive from other federates in HLA federations. The SOM is specified using one or more SOM modules. The standard format in which SOMs are expressed facilitates determination of the suitability of federates for participation in a federation. (a) A simulation’s internal representation of time. Simulation time may accumulate faster, slower, or at the same pace as sidereal time. (b) The reference time (e.g., Universal Coordinated Time) within a simulation exercise. This time is established by the simulation management function before the start of the simulation and is common to all participants in a particular exercise. A hardware or software device that provides input into an operational system or subsystem. To provide input to a real system or subsystem to observe or evaluate the response. The use of simulations to provide an external stimulus to a real system or subsystem. An example is the use of a simulation representing the radar return from a target to drive (stimulate) the radar of a missile system within a hardware/software-in- the-loop simulation. A hardware or software device that provides input injects into system or subsystem platforms and environment that are not physically present. Simulations in which simulation time is advanced by a fixed or independently determined amount to a new point in time, and the states or status of some or all resources are updated as of that new point in time. Typically, these time steps are of constant size, but they need not be. For example, a model depicting the year- by-year forces affecting a volcanic eruption over a period of 100,000 years. A virtual environment requires (1) a computer model, (2) a representation of that model which stimulators the user’s senses (e.g., visual, auditory, haptic), (3) a user or users, and (4) a way in which the user(s) can interact with the computer model. (Continued)


Virtual reality (VR): A class of technology that seeks to immerse the user(s) in a virtual environment and detach the user(s) from the real environment. Heim (1998) defines VR as a 'technology that convinces the participant that he or she is actually in another place by substituting the primary sensory input with data produced by a computer.'

Virtual simulation: A simulation involving real people operating simulated systems.

Source: https://www​.msco​.mil​/MSReferences​/Glossary​/TermsDefinitionsN​-R​.aspx
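Several of the time-related entries above (event-oriented simulation, simulation time, and real-time versus faster-than-real-time execution) can be made concrete with a short sketch. The following Python fragment is our own illustration of an event-oriented loop, not code from any system discussed in this book; the event names and times are invented.

import heapq

# Minimal event-oriented simulation loop: simulation time jumps
# directly from one scheduled event to the next rather than
# advancing in fixed steps (contrast with time-step simulation).
events = []  # priority queue ordered by simulation time

def schedule(time_s, name):
    """Schedule an event at the given simulation time (seconds)."""
    heapq.heappush(events, (time_s, name))

# Invented scenario events, for illustration only.
schedule(0.0, "scenario start")
schedule(2.5, "radar contact")
schedule(7.0, "weapon release")

sim_time = 0.0
while events:
    sim_time, name = heapq.heappop(events)  # advance to next event
    print(f"t = {sim_time:4.1f} s  {name}")

Because sim_time is decoupled from wall-clock time, the same loop can run faster than real time (as fast as the processor allows) or be paced to real time by waiting for each event's wall-clock deadline, which is the distinction drawn in the real-time simulation entry above.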

Appendix C: Glossary of Verification, Validation, and Accreditation Terms

Compiled by Michael G. Lilienthal, Ph.D., CPE, CTEP, Senior Cybersecurity Analyst, Electronic Warfare Associates, and William F. Moroney, Ph.D., CPE, Professor Emeritus, Human Factors Program, University of Dayton (Ohio).


Accreditation: The official certification that a model or simulation and its associated data are acceptable for use for a specific purpose.

Accreditation agent: The organization or individual designated to conduct an accreditation assessment for a model, simulation, and their associated data for a particular application.

Accreditation authority: The organization or individual responsible to approve the use of models, simulations, and their associated data for a particular application.

Accreditation criteria: A set of standards that a particular model, simulation, or federation must meet to be accredited for a specific purpose.

Accreditation plan: The plan of action for certifying a model, simulation, or federation of models and simulations and its associated data as acceptable for specific purposes. The accreditation plan specifies the reviews, testing, and other accreditation assessment processes.

Data certification: The determination that data have been verified and validated. Data user certification is the determination by the application sponsor or designated agent that data have been verified and validated as appropriate for the specific M&S usage. Data producer certification is the determination by the data producer that data have been verified and validated against documented standards or criteria.

Data verification and validation (V&V): The process of verifying the internal consistency and correctness of data and validating that it represents real-world entities appropriate for its intended purpose or an expected range of purposes.

Face validation: The process of determining whether a model or simulation seems reasonable to people who are knowledgeable about the system under study, based on the model's performance. This process does not review the software code or logic, but rather reviews the inputs and outputs to ensure they appear realistic or representative.


Independent verification and validation: The conduct of verification and validation of a model, simulation, and associated data by an individual, group, or organization that did not participate in the development and is not in the same chain of command or organization as the developer.

Validation: The process of determining the degree to which a model or simulation and its associated data are an accurate representation of the real world, from the perspective of the intended uses of the model.

Verification: The process of determining that a model or simulation implementation and its associated data accurately represent the developer's conceptual description and specifications.

Verification and validation agent: The individual, group, or organization designated to verify and validate a model, simulation, and associated data.

Source: DoD Modeling and Simulation (M&S) Glossary, https://www​.msco​.mil​/MSReferences​/ Glossary​/TermsDefinitionsN​-R​.aspx
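The verification/validation distinction above is easy to blur, and a toy check makes it concrete. In the sketch below, everything is invented for illustration (the model, the specification value, the field measurements, and the tolerance); none of it comes from the DoD glossary source.

def model_distance(t, g=9.81):
    """Hypothetical point-mass model: distance fallen after t seconds."""
    return 0.5 * g * t ** 2

# Verification: does the implementation match the developer's
# conceptual description and specifications? The (invented) spec
# here requires d(2 s) = 19.62 m.
assert abs(model_distance(2.0) - 19.62) < 1e-6

# Validation: does the model represent the real world accurately
# enough for the intended use? Compare against (invented) field
# measurements, with a tolerance taken from the accreditation criteria.
field_data = [(1.0, 4.8), (2.0, 19.1), (3.0, 42.7)]  # (t in s, measured m)
for t, measured in field_data:
    rel_error = abs(model_distance(t) - measured) / measured
    assert rel_error <= 0.05, f"model not valid at t = {t} s"

Accreditation would then be the separate, official judgment that checks like these are sufficient evidence for a specific intended use.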

Index

A
Adaptive simulation-based training, 229
Artificial intelligence, 44, 52, 58, 66, 203–204, 251, 253–254, 256–257, 259–261, 263, 265, 267, 269, 271, 273, 275–277, 298

B
Benefits, 5, 34, 36, 47, 59, 66, 74, 79, 82, 85, 91, 97, 105, 107, 110–111, 121, 129, 137, 220, 274, 279

C
Cognitive fidelity, 81, 85, 91, 93, 98, 114, 133, 138, 142, 146–147

D
Data collection, 17, 68, 78, 153, 185–186, 193, 205, 208, 220–221, 225
Debriefing, 17, 89, 106, 170–171, 181–204, 280, 283, 289–295
Deep learning, 260–261, 263–264, 268, 272, 275, 277–278
Disadvantages, 1, 13, 18–19, 66, 74, 76–77, 81–82, 135, 145–146, 153, 207, 219

E
EEG, 269, 279, 281–285, 288, 293–294
Enjoyment, 65–67, 69–70, 86, 176
Evaluation, 6, 9–10, 17–18, 28, 32, 36, 38, 47, 52, 58, 60, 64–68, 74, 77–78, 83, 90–91, 102–103, 106, 108, 118, 120, 124, 137, 145–146, 155–156, 168, 174, 176–177, 198, 207–208, 213, 216–219, 224, 226, 230–231, 234, 236, 247, 251–253, 255, 262, 264–266, 271, 275, 278, 327

F
Flight technical error (FTE), 206

H
Healthcare, 9, 14–15, 55–56, 58, 68–69, 83–84, 88–89, 109–110, 138, 183, 189–190, 201–203, 255, 276, 279–281, 283–289, 291, 293, 295
High-fidelity simulator, 149, 153
Human perception, 146, 153

I
Information, 5, 8–9, 19, 28, 30–31, 35, 37–40, 42, 44, 46, 48–49, 56, 60–61, 69, 73, 78, 81, 83–84, 87–88, 90, 100, 103–105, 108, 123, 129–130, 133–135, 138, 142–143, 152, 154, 156, 159–160, 169, 175, 184–185, 187–188, 190–192, 194–196, 199, 201, 203–204, 206–208, 219–221, 223, 225, 231–234, 241, 243, 246, 249, 251–252, 255–257, 259–260, 262, 265–266, 270, 272–273, 276, 278, 280–286, 289–290, 292–295, 303, 307, 309–312, 315, 321–325, 327, 329
Intelligent tutors, 4

L
Learning assessments, 230–231, 233
Limitation, 38, 145, 147–148, 151–152, 160–161, 236, 269

M
Machine learning, 197, 251, 256–257, 260, 263, 265–268, 270–272, 274–275, 277–278
Motion, 3, 6–9, 16, 25, 28–29, 38, 42, 55, 67, 69, 71–74, 76, 80–81, 83, 91, 95–97, 106, 141–147, 153–157, 160, 162–167, 169–179, 265, 267, 272–275, 277, 299, 311–312
Motion algorithms, 145

O
Observation, 134, 182, 213, 237, 241–242, 245–246, 267

P
Performance measurement, 17, 19, 59, 79, 109, 116, 118, 139, 205–206, 219–220, 223, 229–231, 233–237, 239–243, 245–255
Performance standards, 151, 219, 223, 235

R
Reliability, 15, 74, 87, 132, 135–136, 142–143, 156, 183, 205–208, 211, 213, 215–220, 226–229, 232, 236, 239–240, 243, 245–246, 262, 325
Research, 5, 7–8, 10–11, 15, 19, 23–24, 28, 31, 36, 38, 43–44, 46, 53, 55, 57–59, 61–69, 71–72, 74–75, 77–78, 81–90, 92–93, 95, 97–100, 102, 104–110, 112–116, 121–124, 126–128, 130–131, 134, 137–139, 142–143, 145–147, 152–157, 160–161, 164, 166, 171–176, 178, 187–188, 190, 199, 202–203, 205–207, 213, 220, 226, 228–231, 234, 236, 238, 241, 247–254, 256, 258, 264, 266, 270–272, 276–279, 290, 294, 302, 309, 311, 317–318

S
Scoring simulations, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277
Simulation, 1–116, 118–139, 141–157, 159–160, 162, 164, 166, 168, 170, 172–176, 178, 181–234, 236, 238–240, 242–244, 246, 248–250, 252–258, 260, 262, 264–266, 268–270, 272–286, 288–319, 321–322, 324–332

T
Trainee performance, 147, 182, 185, 189, 196–197, 232, 235, 238
Training, 1–92, 94–100, 102–139, 141–157, 159–179, 181–208, 210, 212–214, 216–218, 220, 222–266, 268, 270–280, 282–286, 288–290, 292–294, 296–300, 302, 304, 306–308, 310, 312, 314, 316–318, 327–328
Transfer of training, 9, 16, 27, 59, 62, 66, 74, 76, 80, 89, 91–92, 96–97, 99–100, 103–106, 109–119, 121–124, 142–143, 145–146, 154, 157, 239

U
Uncertainty, 58, 68, 93, 100, 102, 126, 130, 139, 259, 271, 279–281, 284–286, 289–290, 292–295, 326

V
Vestibular system, 144–145

Human Factors in Simulation and Training

Human Factors in Simulation and Training: Application and Practice covers the latest applications and practical implementations of advanced technologies in the field of simulation and training. The text focuses on descriptions and discussions of current applications and the use of the latest technological advances in simulation and training. It covers topics including space adaptation syndrome and perceptual training, simulation for battle-ready command and control, healthcare simulation and training, human factors aspects of cybersecurity training and testing, design and development of algorithms for gesture-based control of semi-autonomous vehicles, and advances in the after-action review process for defence training. The text is an ideal read for professionals and graduate students in the fields of ergonomics, human factors, computer engineering, aerospace engineering, occupational health, and safety.

Human Factors in Simulation and Training
Application and Practice
Second Edition

Edited by

Dennis Vincenzi, Mustapha Mouloua, P. A. Hancock, James A. Pharmer, and James C. Ferraro

Front cover image: Nadezda Murmakova/Shutterstock

Second edition published 2024 by CRC Press, 2385 NW Executive Center Drive, Suite 320, Boca Raton, FL 33431, and by CRC Press, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 selection and editorial matter, Dennis Vincenzi, Mustapha Mouloua, Peter A. Hancock, James A. Pharmer, and James C. Ferraro; individual chapters, the contributors

First edition published by CRC Press 2019

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-51249-5 (hbk)
ISBN: 978-1-032-51250-1 (pbk)
ISBN: 978-1-003-40135-3 (ebk)

DOI: 10.1201/9781003401353

Typeset in Times by Deanta Global Publishing Services, Chennai, India

Contents

Preface
Editors
Contributors
Chapter 1 Controls and Displays for Aviation Research Simulation: A Historical Review
  Gloria L. Calhoun and Kristen K. Liggett
Chapter 2 Augmented Reality as a Means of Job Task Training in Aviation
  Dan Macchiarella, Jiahao Yu, Dahai Liu, and Dennis A. Vincenzi
Chapter 3 Civil Aviation: Flight Simulators and Training
  Ronald J. Lofaro and Kevin M. Smith
Chapter 4 Integrating Effective Training and Research Objectives: Lessons from the Black Skies Series of Exercises
  Christopher Best, Gregory Funke, Winston Bennett, Michael Tolston, Simon Hosking, and Robert Bolia
Chapter 5 Extended Reality in Training Environments: A Human Factors Trend Analysis
  Salim A. Mouloua, Gerald Matthews, John French, and Mustapha Mouloua
Chapter 6 Mitigation of Motion Sickness Symptoms by Adaptive Perceptual Learning: Implications for Space and Cyber Environments
  Mustapha Mouloua, John French, Janan A. Smither, and Robert S. Kennedy
Chapter 7 Decision-Making under Crisis Conditions: A Training and Simulation Perspective
  Jiahao Yu, Tiffany Nickens, Dahai Liu, and Dennis A. Vincenzi




Chapter 8 Healthcare Simulation and Training
  Sarah A. Powers and Mark W. Scerbo
Chapter 9 Best Practices in Surgical Simulation
  Dominique Doster, Christopher Thomas, and Dimitrios Stefanidis
Chapter 10 Healthcare Simulation Methods: A Multifaceted Approach
  Amy L. Hanson and Aaron W. Calhoun
Chapter 11 Design and Development of Algorithms for Gesture-Based Control of Semi-Autonomous Vehicles
  Brian Sanders, Yuzhong Shen, and Dennis Vincenzi
Chapter 12 The Influence of New Realities: How Virtual, Augmented, and Mixed Reality Advance Training Methods in Aviation
  Graham King, Kendall Carmody, and John Deaton
Chapter 13 Training, Stress, Time Pressure, and Surprise: An Accident Case Study
  Julianne M. Fox and Mustapha Mouloua
Index

Preface

As we look toward the future, we find that it is neither totally random nor is it totally predictable. This is a truth that persists despite the years that have passed since the publication of the previous edition of this book. We maintain that, if the future were completely predictable, there would be no point looking forward because we would already know what was to come. If it were completely random, we would not bother because we could not know anything systematic about forthcoming events. That life lies between these two polar extremes gives us both the motivation to try to understand the future and the belief that we can do so, at least to a useful degree. Indeed, the triumphs of science encourage us to believe that we are making "progress," insofar as our predictions of the future, at least in relation to many physical processes, are growing more accurate as the years progress. And, of course, the more we can know about the future, the more we can generate rational courses of action based upon this understanding. This confluence of ideas encourages us to develop theories, models, methodologies, and other such instruments to continue to improve our predictive capabilities.

However, although certain forms of prediction work well for some of the simpler physical processes, there are many forms of complex interaction in which our predictive capacities are at present rudimentary at best. Unfortunately, many of these complex processes—global warming, for example—may prove so dangerous to our species that we cannot afford to assert predictions that are radically incorrect. Flawed prediction here can spell our end. As a consequence, we are in ever greater need of technologies that allow us to generate and refine predictions, as well as to explore alternative potentialities, the countertheses and antitheses to these various propositions. One such technology is simulation.

As a tool, simulation is an aid to the imagination. It allows us to create, populate, and activate possible futures and explore the ramifications of these developed scenarios. However, in common with all tools, it performs its task only to the degree that it is open to facile interaction with the user. One can imagine that on many occasions, a poor simulation, with its impoverished or wildly inaccurate outcomes, might do even more harm than good. Thus, as with all tools and technologies, we certainly need the application of the branch of science that turns user–machine antagonism into user–machine synergy. That branch of science is human factors.

Hence, the focus of this present work is on human factors issues as they pertain to simulation in support of training humans to do certain tasks and predicting their performance. In general, these issues revolve around two central themes, represented uniquely across two books. The theme of this book is pragmatic and utilitarian in nature because it reflexively asks how simulation itself can help address human factors issues. These include training, design, evaluation, testing, certification, and visualization. In this way, human factors seeks both to refine and improve the technology of simulation and then, in turn, to benefit from those very improvements. The chapters presented in the following text reflect these general concerns, as well as other emerging and critical issues. The chapters in this book pertaining to application and practice will expand on concepts of interest to the simulation and training communities regarding simulator usage, particularly with respect to the validity and functionality of simulators as training devices. These chapters provide context to the theory surrounding the use of simulation for the purposes of training or evaluating performance. Topics include controls in aviation research, simulation in surgery, and guidance for applications supporting semi-autonomous vehicle operation. Enveloping both theory and application, this book series will address in detail numerous issues and concepts pertaining to human factors in simulation, gathering this important information into two comprehensive volumes.

Dennis A. Vincenzi
Mustapha Mouloua
P. A. Hancock
James A. Pharmer
James C. Ferraro

Editors

Dennis A. Vincenzi earned his doctoral degree in 1998 from the University of Central Florida in Human Factors and Applied Experimental Psychology and has over 25 years of experience as a Human Factors researcher. He was employed by Embry-Riddle Aeronautical University from 1999 to 2004, where he held the position of Assistant Professor in the Department of Human Factors and Systems in Daytona Beach. In 2004, Dr. Vincenzi left Embry-Riddle to work for the United States Navy as a Senior Human Factors Engineer at the Naval Air Warfare Center Training Systems Division (NAWCTSD) in Orlando, FL. His duties included performing Human Factors research involving simulation and training system development for a variety of Navy sea and air platforms, including the F/A-18 Hornet and Super Hornet, F-35 JSF, Los Angeles, Ohio, and Virginia class submarines, and the Littoral Combat Ship (LCS). He was also heavily involved in research involving pilot selection, human performance, and ground control station design for a number of Navy, Marine Corps, and Special Operations Command Unmanned Aerial Systems (UAS). Since returning to Embry-Riddle Aeronautical University in 2012, Dr. Vincenzi has been involved in research related to UAS and regulatory requirements within the NAS and has been heavily involved in the development of an experimental gesture-based interface used for investigating user preference, usability, and functionality issues related to interface design in virtual environments. Dr. Vincenzi is currently the Program Chair for the Master of Science in Human Factors program at Embry-Riddle Aeronautical University.

Mustapha Mouloua is Professor of Psychology and Director of the Transportation Research Group at the University of Central Florida (UCF), Orlando, FL. He earned his Ph.D. (1992) and M.A. (1986) degrees in Applied/Experimental Psychology from the Catholic University of America, Washington, DC. Before joining the faculty at UCF in 1994, he was a Postdoctoral Fellow at the Cognitive Sciences Laboratory of the Catholic University of America from 1992 to 1994, where he studied and researched several aspects of human–automation interaction topics sponsored by NASA and the Office of Naval Research (ONR). He has over 30 years of experience in teaching and research related to complex human–machine systems. His research interests include vigilance and sustained attention, cognitive aging, human performance assessment, human–automation interaction, pilot–alerting systems interaction, automation and workload in aviation systems, simulation, and training in transportation systems. Dr. Mouloua made over 300 conference presentations with his undergraduate and graduate students, as well as his professional colleagues. He also has about 200 research publications and scientific reports published in journals and proceedings, such as Experimental Aging Research, Human Factors, Ergonomics, Perception and Psychophysics, Journal of Experimental Psychology: Human Perception and Performance, International Journal of Aviation Psychology, Journal of Cognitive Engineering and Decision Making, Proceedings of the Human Factors and Ergonomics Society, Applied Ergonomics, Transportation Research Part F: Traffic Psychology and Behaviour, Ergonomics in Design, Transportation Research Record, and International Journal of Occupational Safety and Ergonomics. Together with his colleagues Raja Parasuraman and Robert Molloy, he was the winner of the Jerome Hirsch Ely Award of the Human Factors and Ergonomics Society in 1997. He was previously Director of the Applied/Experimental and Human Factors Psychology doctoral program (2008–2017). At UCF, Dr. Mouloua earned eight prestigious Teaching and Research Awards and was inducted into the UCF College of Sciences Millionaire Club for procuring over $1 million in research funds. He was awarded a UCF "Twenty Years' Service" award in 2014, was awarded the UCF International Golden Key and Honorary member status in 2011, and his research was selected to be among the top 30 best published research articles in the last 50 years by the Human Factors and Ergonomics Society in 2008.

P. A. Hancock, D.Sc., Ph.D., is Provost Distinguished Research Professor in the Department of Psychology and the Institute for Simulation and Training, as well as at the Department of Civil and Environmental Engineering and the Department of Industrial Engineering and Management Systems at the University of Central Florida (UCF). At UCF in 2009, he was created the 16th ever University Pegasus Professor (the Institution's highest honor) and in 2012 was named 6th ever University Trustee Chair. He directs the MIT2 Research Laboratories. He is the author of over 1,100 refereed scientific articles, chapters, and reports, as well as writing and editing more than 25 books. He has been continuously funded by extramural sources for every one of the forty years of his professional career. This includes support from NASA, NSF, NIH, NIA, FAA, FHWA, NRC, NHTSA, DARPA, NIMH, and all of the branches of the US Armed Forces. He has presented or been an author on over 1,200 scientific presentations. In association with his colleagues Raja Parasuraman and Anthony Masalonis, he was the winner of the Jerome Hirsch Ely Award of the Human Factors and Ergonomics Society for 2001, the same year in which he was elected a Fellow of the International Ergonomics Association. In 2006, he won the Norbert Wiener Award of the Systems, Man and Cybernetics Society of the Institute of Electrical and Electronics Engineers (IEEE), the highest award that Society gives for scientific attainment. He is a Fellow and past President of the Human Factors and Ergonomics Society and a Fellow and twice past President of the Society of Engineering Psychologists, as well as being a former Chair of the Board of the Society for Human Performance in Extreme Environments. Most recently he has been elected a Fellow of the Royal Aeronautical Society (RAeS) and in 2016 was named the 30th Honorary Member of the Institute of Industrial and Systems Engineers (IISE). He currently serves as a member of the United States Air Force Scientific Advisory Board (SAB) and has also served on the US Army Science Board (ASB). He is also a Fellow of AAAS and IEEE.

James Pharmer is the Chief Scientist for the Research, Development, Test, and Evaluation (RDT&E) Department and the Head of the Experimental and Applied Human Performance Research and Development (R&D) Division at the Naval Air Warfare Center Training Systems Division (NAWCTSD) in Orlando, Florida. He is a Naval Air Warfare Center Aviation Division (NAWCAD) Fellow and has over 20 years of experience in training and human performance R&D for advanced military systems across a variety of warfare domains. His work includes conducting R&D and direct participation on systems acquisition teams to support human systems integration (HSI) implementation for Navy ships, aircraft, and systems. He chairs multiple working groups to develop HSI policy, processes, and education. He holds a doctoral degree in Applied Experimental Human Factors Psychology from the University of Central Florida and a master's degree in Engineering Psychology from the Florida Institute of Technology.

James C. Ferraro is a human factors research scientist specializing in simulation and game-based assessment of human performance in complex systems. He earned his Ph.D. in Human Factors and Cognitive Psychology from the University of Central Florida (UCF) in 2022 and his M.A. in Applied Experimental and Human Factors Psychology from UCF in 2019. Dr. Ferraro has led and contributed to a number of research efforts in support of government-sponsored (NAVAIR, USAF) projects to improve training and selection of personnel in various occupations. Areas include air traffic control, tactical urban warfare, explosive ordnance disposal, special forces rotary wing operations, and unmanned aircraft operations. His research on topics such as pilot/operator attentional strategies, trust in automated systems, and predictors of individual performance has been presented at local, regional, and international conferences (Human Factors and Ergonomics Society Annual Meeting, International Symposium on Aviation Psychology, Conference on Applied Human Factors and Ergonomics) and published in multiple academic journals (Ergonomics, Applied Ergonomics). He is the technical editor of the two-volume book set Human Performance in Automated and Autonomous Systems (2019) and the co-author of published book chapters pertaining to human monitoring of automated systems and the role of trust in unmanned vehicle operations. Dr. Ferraro is currently a Senior Research Scientist with Adaptive Immersion Technologies, based in Tampa, FL.

Contributors

Winston Bennett, Air Force Research Laboratory, Warfighter Interactions and Readiness Division, Wright-Patterson Air Force Base, OH
Christopher Best, Human and Decision Sciences Division, Defence Science and Technology Group, Melbourne, Australia
Robert Bolia, Air and Space Division, Defence Science and Technology Group, Melbourne, Australia
Aaron W. Calhoun, University of Louisville School of Medicine, Louisville, KY
Gloria L. Calhoun, Air Force Research Laboratory (retired), Wright-Patterson Air Force Base, OH
Kendall Carmody, Florida Institute of Technology, Melbourne, FL
John Deaton, Florida Institute of Technology, Melbourne, FL
Dominique Doster, Indiana University School of Medicine, Indianapolis, IN
Julianne M. Fox, Decision Speed, Inc., Larkspur, CA
John French, Embry-Riddle Aeronautical University, Daytona Beach, FL
Gregory Funke, Air Force Research Laboratory, Wright-Patterson Air Force Base, OH
Amy L. Hanson, University of Louisville School of Medicine, Louisville, KY
Simon Hosking, Human and Decision Sciences Division, Defence Science and Technology Group, Melbourne, Australia
Robert S. Kennedy*
Graham King, Florida Institute of Technology, Melbourne, FL
Kristen K. Liggett, Air Force Research Laboratory (retired), Wright-Patterson Air Force Base, OH
Dahai Liu, Embry-Riddle Aeronautical University, Daytona Beach, FL
Ronald J. Lofaro*, Federal Aviation Administration (retired)
Nickolas D. Macchiarella, Embry-Riddle Aeronautical University, Daytona Beach, FL
Gerald Matthews, George Mason University, Fairfax, VA
Mustapha Mouloua, University of Central Florida, Orlando, FL
Salim A. Mouloua, George Mason University, Fairfax, VA
Tiffany Nickens, National Aeronautics and Space Administration (NASA), Washington, DC
Sarah A. Powers, Old Dominion University, Norfolk, VA
Brian Sanders, Embry-Riddle Aeronautical University, Daytona Beach, FL
Mark W. Scerbo, Old Dominion University, Norfolk, VA
Yuzhong Shen, Old Dominion University, Norfolk, VA
Kevin M. Smith, United States Navy (retired), Mesquite, NV
Janan A. Smither, University of Central Florida, Orlando, FL
Dimitrios Stefanidis, Indiana University School of Medicine, Indianapolis, IN
Christopher Thomas, Indiana University School of Medicine, Indianapolis, IN
Michael Tolston, Air Force Research Laboratory, Wright-Patterson Air Force Base, OH
Dennis A. Vincenzi, Embry-Riddle Aeronautical University, Daytona Beach, FL
Jiahao Yu, Embry-Riddle Aeronautical University, Daytona Beach, FL

* The editors would like to pay their respects to Dr Robert S. Kennedy and Dr Ronald J. Lofaro, who sadly passed away prior to publication of this book project. We are very grateful for their contributions and dedication to the field of human factors in simulation and training.

1 Controls and Displays for Aviation Research Simulation: A Historical Review

Gloria L. Calhoun and Kristen K. Liggett

CONTENTS
Disclaimer
Introduction
Fixed-Based Simulators
  Integrated Information Presentation and Control System Study (IIPACSS) Simulator, Boeing, Seattle, WA
    Display Technology
    Control Technology
    Representative Research
    Impact
  Digital Synthesis (DIGISYN) Simulator
    Display Technology
    Control Technology
    Representative Research
    Impact
  Microprocessor Applications for Graphics and Interactive Communication (MAGIC) Simulator
    Display Technology
    Control Technology
    Representative Research
    Impact
  Panoramic Cockpit Control and Display System (PCCADS) Simulator
    Display Technology
    Control Technology
    Representative Research
    Impact
  Helmet-Mounted Oculometer Facility (HMOF) Simulator
    Display Technology
    Control Technology
    Eye and Head Monitor
    Representative Research
    Impact
  Synthetic Interface Research for UAV Systems (SIRUS) Simulator
    Display Technology
    Control Technology
    Representative Research
    Impact
  Vigilant Spirit Control Station (VSCS) Simulator
    Display Technology
    Controls Technology
    Representative Research
      Sense and Avoid (SAA) Display Symbology Evaluation
      Cyber Threat Information Requirements Investigation for UAV Crews
    Impact
  Intelligent Multi-Unmanned Vehicle Planner with Adaptive Collaborative/Control Technologies (IMPACT) Simulator
    Display Technology
    Control Technology
    Representative Research
    Impact
Motion-Based Simulators
  Dynamic Environmental Simulator (DES)
    Display Technology
    Control Technology
    Representative Research
    Impact
  Disorientation Research Device
    Display Technology
    Control Technology
    Representative Research
    Impact
In-Flight Simulators
  NASA's OV-10
    Representative Research
    Impact
  Total In-Flight Simulator (TIFS) NC-131H Transport Aircraft
    Representative Research
    Impact
  Variable In-Flight Stability Test Aircraft (VISTA) Lockheed NF-16D Fighter Aircraft
    Representative Research
    Impact
  University of Iowa Operator Performance Laboratory Aero L-29 Delfin Jet
    Representative Research
    Impact
Multisensory Displays and Controls
Displays and Controls to Support Human–Machine Teaming
Summary
Acknowledgment
References
Acronyms/Abbreviations

DOI: 10.1201/9781003401353-1

DISCLAIMER

The views expressed are those of the authors and do not reflect the official guidance or position of the United States Government, the Department of Defense, or of the United States Air Force.

Statement from DoD: The appearance of external hyperlinks does not constitute endorsement by the United States Department of Defense (DoD) of the linked websites, or the information, products, or services contained therein. The DoD does not exercise any editorial, security, or other control over the information you may find at these locations.

INTRODUCTION

This chapter will trace how controls and displays used in research simulators changed from 1970 through the present to effectively evaluate new crew station technologies for Air Force combat aircraft systems. The early 1970s marked the dawn of the electro-optical (E-O) era in aviation simulators. Actually, there were investigations utilizing E-O instruments as early as the 1930s. For example, in 1937, a cathode ray tube (CRT) based E-O display called the Sperry Flightray was evaluated on a United Airlines flight research Boeing (Bassett & Lyman, 1940). Over the next several decades, E-O displays were slowly integrated into predominantly electromechanical (E-M) designs, such that pilots (private, commercial, and military) were flying cockpits that incorporated a mix of E-M and E-O instruments. Thus, the time boundaries between the E-M and E-O approaches are very vague, even though the design boundaries are clear (Nicklas, 1958). By the early 1970s, although the majority of operational aircraft contained cockpits based on E-M instruments, cockpit designers were seriously considering the design of cockpits based primarily on E-O displays. Their research during this time frame had a definite influence on aircraft. For instance, the US Navy's F-18 aircraft introduced in 1983 made extensive use of multifunction CRT displays. As part of this chapter, several research simulators will be described to illustrate the evolution of control and display technology; also, some lessons learned from the experiments carried out in these simulators will be cited. Unless otherwise stated, all of these simulators were (or are currently) at Wright-Patterson Air Force Base, Ohio.


Finally, the chapter will present changes that are anticipated in control and display technology for future simulations.

FIXED-BASED SIMULATORS

Integrated Information Presentation and Control System Study (IIPACSS) Simulator, Boeing, Seattle, WA

At the start of the 1970s, the state-of-the-art in cockpit instrumentation was exemplified by the F-4 Phantom. The Phantom's two crew stations were composed of all E-M instruments, with the exception of a few single-function CRTs. However, advances in avionics were enabling the inclusion of an increasing number of computer-based functions within aircraft. If all of these additional computer-based functions had to be accessed through single-function E-M instruments, there would not be enough room in the crew station to accommodate all of the required controls and displays. Moreover, it was likely that locations outside of the pilot's primary reach and vision envelope would have to be used. Thus, a new approach to the design of fighter crew stations was clearly needed.

One means of preventing the pilot from becoming overloaded was to restrict the information by "time sharing" controls and displays so that only the information relevant to the pilot's current task was available. This restriction led to a change in design requirements. "The requirement exists to develop an integrated control and display system that will present only essential information in a format that can be translated easily by its user into direct control inputs" (Zipoy et al., 1970, p. 1). The answer was to substitute multifunction E-O displays for single-function E-M instruments. However, the question was, "How well can the operators use these new types of displays and their associated controls?" Research in the IIPACSS simulator (Figure 1.1) arose from a need to verify that the new approach (utilizing E-O displays that combined many of the functions of separate E-M instruments) would not degrade operator performance.

FIGURE 1.1  Integrated Information Presentation and Control System Study (IIPACSS) simulator, circa 1970. (Boeing photo produced under US Air Force Contract # F33615-73-C-1201.)


Display Technology

This simulator contained one color and six monochrome CRTs that were multifunctional in the sense that numerous menus and pictorial formats could be presented on the same device at different phases of the mission. Although an out-the-window scene was available during experiments, it was not the computer projection that we know today. It was a film taken from a camera in an aircraft that flew a particular route. The film was run backward to give the illusion that the aircraft was indeed flying the route that was preprogrammed. However, about three years later, a terrain board coupled with a camera that flew over the board provided the out-the-window scene for the pilot.

Control Technology

Besides the required throttle and flight controls, the simulator contained a great number of switches. If you examine Figure 1.1 carefully, you will see that there are 141 push-button switches in this simulator. In the early 1970s, multifunction switch technology was not yet available.

Representative Research

Introduction: Based on an initial evaluation of the cockpit seen in Figure 1.1, it was clear that work was needed on the design of multifunction keyboards, as well as other aspects of the cockpit, such as intuitive display formats. This study (Willich & Edwards, 1975) addressed these issues. This study also incorporated the functions of the A-7D operational crew station because it had the more sophisticated avionics systems at that time. After a detailed functional analysis was performed on the A-7D, whose crew station contained primarily E-M instruments, those functions were then assigned to various multifunction displays and multifunction keyboards. The cockpit in Figure 1.1 was modified to incorporate the functions of the A-7D as well as lessons learned from the initial pilot evaluations conducted with this older, unmodified simulator. An outline of the front instrument panel of the modified cockpit appears in Figure 1.2.

The objective of this experiment was to evaluate a head-up display (HUD) format, different display formats for the horizontal situation display, and a newly designed multifunction keyboard. In this section, we will discuss only the results of the multifunction keyboard evaluation. (For other aspects of the study, see Willich and Edwards, 1975.) The detailed objectives relative to the keyboard were to "evaluate the utility of the multifunction keyboard in terms of matrix size, number of integrated functions, logic indenture levels, and operational suitability in accomplishing the mini mission scenarios" (Willich & Edwards, 1975, p. 67). A mini mission is a flight phase, for example, air-to-ground. To understand the detailed objectives, some explanation of how the multifunction keyboard was constructed is required.

Multifunction Keyboard: At the time of this study, bezel-mounted switches with legends that appeared on the display surface (as in most Automatic Teller Machine [ATM] designs) had not yet been envisioned. However, the mechanism ATMs use for tasks such as obtaining a cash advance, in which one proceeds through multiple levels of menu logic,


FIGURE 1.2  Modified Integrated Information Presentation and Control System Study (IIPACSS) simulator. (Boeing photo produced under US Air Force Contract # F33615-73-C-1201.)

was employed in the following manner. To organize the over 100 switches in Figure 1.1, switches were created that had a limited number of legends – 12 legends per button in this case. A 4 × 6 matrix of these switches was created, thereby allowing a total of 288 switch legends. Two identical matrix-style keyboards were placed to the left and to the right of the horizontal situation display (bottom-large CRT) to allow operation by either hand (Figure 1.2).

The keyboard worked in the following manner: As power was applied to the multifunction keyboard, the button legends on the top row showed the names of the major systems onboard the aircraft, such as communication, navigation, sensors, etc. When one of the top row buttons was pushed, for example the communication (COMM) button, the various types of radios (ultrahigh frequency [UHF], very high frequency [VHF], identify friend or foe [IFF], etc.) would then appear as legends on the keyboard, and the previous legends would disappear. The pilot would then be at the second logic level. If the pilot then pushed the UHF button, the sub-functions of the UHF would be shown (third logic level), and, as before, the previous legends would disappear. Successive changes of legends on the buttons allowed the pilot to proceed through various keyboard logic levels.
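The level-by-level relabeling described above is, in modern terms, a tree traversal. The sketch below illustrates the idea in Python; the system and sub-function names follow the narrative, but the data structure and function are our own illustration, not the actual IIPACSS software.

# Illustrative sketch (not the actual IIPACSS implementation) of the
# multifunction keyboard's menu logic: each button press descends one
# logic level, and all switch legends are rewritten from that level.
MENU = {
    "COMM": {                                # second logic level
        "UHF": ["TUNE", "PRESET", "GUARD"],  # third logic level
        "VHF": ["TUNE", "PRESET"],
        "IFF": ["MODE", "CODE"],
    },
    "NAV": {},      # remaining aircraft systems elided in this sketch
    "SENSORS": {},
}

def legends(node):
    """Legends currently shown on the 4 x 6 switch matrix."""
    return list(node)

level = MENU                   # power-up: top row lists major systems
print(legends(level))          # ['COMM', 'NAV', 'SENSORS']
level = level["COMM"]          # pilot presses COMM: second logic level
print(legends(level))          # ['UHF', 'VHF', 'IFF']
print(legends(level["UHF"]))   # pilot presses UHF: third logic level

Under this structure, the tradeoff the pilots reported in the results below is explicit: a deeper tree means fewer physical switches, but up to four presses to reach a leaf function.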


However, the status of the radios (e.g., the current tuned frequency of the radio) did not appear on the keyboard; the radio status was shown on the small CRTs located at the top portion of the instrument panel.

Test Procedures: All eight pilot participants were experienced in fighter or attack aircraft. Each flew three mini missions: air-to-air, air-to-ground, and instrument landing. During these mini missions, the pilots used the multifunction keyboard in normal and degraded modes (e.g., a CRT failure) to perform tasks involving the communications, navigation, sensors, and aircraft subsystem functions.

Results: An examination of the three mini missions found that there was no significant difference between normal and degraded mode performance in the air-to-air and air-to-ground mini missions. In the landing mini mission, it took significantly longer to enter a radio channel change in the degraded mode. The pilots also filled out a questionnaire to obtain opinions on the utility of the multifunction keyboard and associated multifunction displays. Pilots felt the multifunction keyboard was very easy to operate, failures were easy to correct, and it was equally suitable for day and night use. However, they were evenly split as to the efficiency of the keyboard. Those who liked the keyboard were especially fond of its compact nature (combining several functions into a fewer number of switches). However, those pilots who did not like the keyboard felt they could access single-function switches more quickly than going through the four levels of menu logic required with the multifunction keyboard to access some functions.

Conclusions: There were two basic conclusions from the research performed in this experiment: (1) the functions of a state-of-the-art aircraft (at the time) could be successfully incorporated into a multifunction crew station, and (2) the multifunction keyboard, coupled with its corresponding CRT status displays, was a viable means of performing tasks needed to accomplish the functions. However, the optimization of the relationships between the keyboard and the corresponding CRT status displays had not yet been achieved. The pilots manipulated the switch legends on the keyboard matrix (see Figure 1.2), but the changed functions appeared on the CRTs located at the top portion of the instrument panel. As the CRTs were a considerable distance from the multifunction keyboard, increased scanning time was required to verify that the correct task had been performed.

Impact

This simulation was conducted as part of the US Air Force's Digital Avionics Information System Program. The US Navy had a similar program called the Advanced Integrated Display System Program. The research conducted by these two programs served as the basis for the E-O crew stations we see today in modern aircraft.

Digital Synthesis (DIGISYN) Simulator

Display Technology

This simulator contained a HUD (but no external visual scene) and from four to six multifunction head-down CRTs, depending on the evaluation. The two CRTs in the center front panel (vertical situation display and horizontal situation display) were color, as well as the left upper CRT in some studies. The other head-down CRTs were monochrome. For some experiments, a cluster of E-M engine instruments on the upper right front panel was employed. Besides these few E-M displays, the majority of displays were E-O, which offered a great deal of flexibility in crew station design. First, a particular format could be presented on any of the CRTs, and one evaluation focused on this advantage. Pilot performance was examined with eight arrangements of display formats depicting vertical situation, horizontal situation, and status information. The results failed to show a performance decrement across arrangements, demonstrating that this is a viable option, should one of the E-O displays fail during flight (Calhoun et al., 1980).

With the flexibility afforded by computer-driven displays, the available graphics capability could also be exploited, rather than just transferring dedicated E-M display formats onto E-O surfaces. Moreover, the formats could be designed to integrate information from several dedicated E-M displays onto a single E-O display. However, to ensure that the resultant format provided information in a manner that the pilot could quickly assimilate and respond to, extensive research was required to determine which type of format and level of abstraction (e.g., alphanumeric, graphic, schematic, or pictorial) was best for the pilot's specific task. Research was also required to examine whether color should be employed in computer-generated imagery, beyond the conventional sky and ground coding of the attitude director indicator (ADI) sphere and colors (green/amber/red) used in the aircraft advisory system. For several years, DIGISYN supported such display format evaluations. The section "Representative Research" provides a summary of a few studies examining the use of color coding.

Control Technology

Besides a joystick and throttle for flight control, the simulator utilized a combination of single-function controls (e.g., a telephone-style keyboard, forward of throttle on left console) as well as the multifunction control (lower-left front panel). Each switch of a multifunction control addressed logic that both determined the function of the switches and initiated the execution of those functions when the switches were selected. Obviously, if the function of a switch changed, it was important that its current function be displayed. To reflect what operation they controlled, multifunction switch legends changed, using one of two technologies available at that time: projection switches (Figure 1.3a) and CRT-based bezel-mounted switches (Figure 1.3b).

Projection switches contained a filmstrip with a series of light bulbs behind the strip. Based on the legend desired, the computer sent a signal to the appropriate light bulb, thereby lighting up the correct legend. A limitation of this technology was that only ten legends would fit on the filmstrip that was below the switch surface. Further, if a different legend was desired other than the current ten, a new filmstrip had to be created.

In CRT-based multifunction controls, the switches are adjacent to the bezel of a CRT. Thus, the switches could have as many legends as the CRT could generate, and changing a legend only involved a software modification.


FIGURE 1.3  (a) Digital synthesis (DIGISYN) simulator with projection switches, circa 1976. (US Air Force photo.) (b) Digital synthesis (DIGISYN) simulator with CRT-based bezel-mounted switches, circa 1976. (U.S. Air Force photo.)

However, this technology also had limitations. Because of the switch depth, switches could not be mounted on the bezel itself, but rather had to be mounted outboard. The distance between the switch and its corresponding legend could result in parallax problems at certain viewing angles or seat adjustments, making the association of a switch to a displayed legend ambiguous and not immediately apparent.

Several investigations examining how best to implement a multifunction control were conducted with this simulator. More specifically, this research:

• Compared the projection switch-type multifunction control to CRT bezel-mounted switches, and evaluated their location in the cockpit (Reising, 1977);
• Compared two logic design implementations, that is, branching logic for each individual aircraft system versus tailored logic that presents the options most likely to be needed for the current flight phase (Herron, 1978); and
• Generated design criteria for multifunction controls (e.g., how to label switches, implement switching logic, maximize the accessibility of frequently used functions, optimize switch and function assignment, and minimize hand motion; Calhoun, 1978; Calhoun & Herron, 1982).

Representative Research

Introduction: Prior to the availability of the DIGISYN simulator, the majority of research examining the utility of color coding used participants who devoted their full attention to the color display and performed single, relatively simple tasks (Christ, 1976). This research also showed that the impact of color coding is highly situation specific and depends on a number of diverse factors such as operator task, display medium, and display environment (Krebs et al., 1978). DIGISYN, with color E-O displays, was an ideal platform to examine the utility of color coding on formats that were used in a somewhat peripheral manner as the highly loaded pilot also performed multiple complex tasks.


Test Procedures: Similar procedures were used in three separate experiments to examine the utility of color coding. At least 16 A-7D pilots participated in each experiment. After training, pilots flew one or more flights with each of the conditions being examined in the respective experiment. The mission tasking was designed to represent the workload present in operational flights. Pilots were required to maintain flight parameters (using the HUD as the primary flight display) as well as complete communications, navigation, and weapons tasks using a multifunction control and keypad. Also, pilots had to respond to information retrieval questions that required them to utilize the display format under evaluation. With the number of ongoing and intermittent tasks, pilots only had time to quickly glance at the format under evaluation to retrieve requested information. Performance on all tasks was recorded. Subjective comments were also obtained with questionnaires. Color Formats: Three different display formats were evaluated in separate experiments examining the utility of color coding: threat format (Kopala, 1979), engine format (Calhoun & Herron, 1981), and weapons format (Aretz & Calhoun, 1982). Threat Format: This format appeared on the color CRT directly below the HUD. Besides navigation information, symbology was presented to denote locations of aircraft (symbol “.”), surface-to-air missiles (“S”) and anti-aircraft artillery (“A”). Each symbol was augmented with a state designator, one of three shapes to denote friendly, unknown, or hostile. These states were color coded in one condition: green, yellow, and red respectively. The two coding conditions (shape-coded symbology versus shape- and color-coded symbology) were tested under three different symbol density levels: 10, 20, and 30 symbols. Engine Format: On the upper-left CRT, each of eight engine parameters was represented by a box that contained the current parameter value. Vertical rectangular bars extended from the top or bottom of the boxes, as the corresponding parameter deviated from the normal operating range midpoint. All parameters were normalized to the same range for easier interpretation. Normal, cautionary, and emergency states were indicated by shade and flash codes on the monochrome format (unfilled bar/ white bar/flashing white bar) and by color codes on the color format (green/yellow/ red). Performance on retrieval of engine information was recorded for the two CRT engine formats (monochrome and color), as well as a cluster of conventional E-M instruments (fuel flow, turbine outlet temperature, RPM, oil pressure, oil quantity, and three hydraulic pressure indicators) on the right front instrument panel. These E-M instruments operated as in conventional cockpits, with colored tape to denote operating ranges. For all three-format conditions, the simulation included implementation of the master caution indicator and corresponding messages on failed parameters. Weapons Format: In three of four experimental conditions evaluated, the upperleft CRT presented information pertaining to all the weapons onboard the aircraft, as well as information pertaining to the weapon option selected. The format consisted of a white planform against a darker background. Shapes on the planform presented the weapons onboard, and a different shape was used for each type of weapons, one shape for each of six different types. 
Weapons Format: In three of the four experimental conditions evaluated, the upper-left CRT presented information pertaining to all the weapons onboard the aircraft, as well as information pertaining to the weapon option selected. The format consisted of a white planform against a darker background. Shapes on the planform represented the weapons onboard, with a different shape for each of the six weapon types. The station from which the selected weapons would be delivered was indicated by the location of the symbols on the planform.


Line/flash (monochrome) or shade (color) coding was used to code the status of each selected weapon. This included weapons selection status, master arm switch activation, drop mode, interval, weapon fuzing, release status, and presence of a hung bomb. Besides the monochrome-coded and the color-coded pictorial formats, an alphanumeric format was also evaluated that presented information on the CRT used in the multifunction control. In a fourth condition, both the alphanumeric and color pictorial formats were presented.

Results: In the experiment that utilized the threat format, the results showed a 40% increase in the time to identify friendly, unknown, and hostile symbols when monochrome shape coding was used, compared to redundant color coding. The effectiveness of redundant color coding became more pronounced as the symbol density of the threat format increased. Performance with the monochrome-coded pictorial weapons format was also found to be significantly worse than with the color pictorial format. Moreover, the monochrome format was also worse than the alphanumeric format and the combined alphanumeric and color pictorial format. The subjective data showed pilot preference for the combined format. One pilot commented that the pictorial format helped one acquire situation awareness with a quick glance, with the alphanumeric information as a backup if there was any confusion.

Different results were obtained with the engine format. There were no significant performance differences between the monochrome and color CRT formats. With regard to having the engine information integrated onto a single format versus the conventional array of E-M instruments, both CRT formats were superior as measured by pilots' speed and accuracy in identifying failed engine parameters.

Conclusions: The results from these experiments concur with the literature review provided by Reising and Calhoun (1982). Color coding resulted in performance improvements when the display was dense and unformatted, involved a search for relevant information, and had a logical relationship between color and the task. Both the threat and weapons formats can be viewed as dense and unformatted (e.g., up to 30 threat symbols at some density levels, and weapons information that changed depending on the weapon option). Both formats also involved an active search for information, either to find a particular threat or to determine a parameter of a weapon store. The color coding also had a relationship to the task (e.g., red for a hostile threat and a hung bomb). The engine format, in contrast, was a simple display with the information clearly shown in histograms. The location of a specific parameter was constant: the corresponding box at the center of the display. Additionally, the master caution alerting system served as an additional cue of abnormal states. Thus, monochrome codes were sufficient for the E-O format presenting engine information; color coding did not show a payoff.

Impact

The DIGISYN can be viewed as one of the earliest test-beds primarily based on multifunction technology; the cockpit featured a multifunction control, and the majority of its displays were E-O. Thus, this simulator was ideal for research focused on exploiting the advantages of computer-based controls and displays. As a result of the numerous experiments that were conducted over many years, the utility of multifunction


controls and integrated formats on multifunction displays was demonstrated. Many design guidelines were identified in the process as well. Without question, the research conducted with the DIGISYN was a strong contributor to the glass-cockpit crew station designs operational today.

Microprocessor Applications for Graphics and Interactive Communication (MAGIC) Simulator

The MAGIC simulator was employed to conduct part-task pilot-in-the-loop research studies investigating pilot–vehicle interfaces for cockpit applications (Figure 1.4). The simulator was a single-seat fighter shell. Six computers were used to support the simulation: three personal computers (PCs) and three graphics workstations. The fact that MAGIC relied on PCs for its operation demonstrates the low-cost aspect of this type of simulator. The cockpit was outfitted with various off-the-shelf products over the years for the purpose of comparing different controls and displays. Studies included the use of various HUD symbology sets to recover from unusual attitudes (Reising et al., 1988) and pathway-in-the-sky HUD symbology for complex, curved approaches and landings (Reising et al., 1995). Additional studies (see the section "Representative Research") compared the use of three-dimensional (3-D) joysticks, touch screens, and speech recognition to designate targets.

FIGURE 1.4  Microprocessor Applications for Graphics and Interactive Communication (MAGIC) simulator, circa 1985. (US Air Force photo.)

Display Technology

MAGIC contained five color CRTs that provided dynamic graphics capability. Typical displays for the head-down CRTs were system status formats, computerized checklists, radar sensor displays, and digital images from laser disks. There was no



out-the-window visual scene, so subjects used the topmost monitor to view HUD symbology. The center CRT, typically containing a moving map display, could be exchanged with a 3-D display that pilots could use to view images with liquid crystal display (LCD) shutter glasses.

Control Technology

An F-16 side-mounted limited-displacement control stick was used to fly the F-16 aeromodel for the flight tasks. The stick also contained a weapon release button, a trigger, and a pitch-trim switch. An A-7 aircraft throttle was also employed and included speed brake and communications switches. The cockpit also contained three banks of four programmable display push buttons each. These were pixel-addressable light-emitting diode displays capable of displaying alphanumeric or pictorial information. There was also a bank of four multicolor switches below the topmost monitor. Three of the CRTs housed touch screen overlays that were used as control interfaces to change the graphics on the screen. MAGIC also contained various speech recognition systems, again used as a control interface to change displays for the pilot. Other control devices included a magnetic tracker and an ultrasonic tracker attached to the pilot's glove to manipulate cursor control in 3-D space.

Voice Systems: Over the years, MAGIC hosted different speech recognition systems, each with its own strengths and weaknesses. It was this observation of several systems that led to a unique study aimed at increasing the recognition accuracy of the state-of-the-art speech systems of that time (Barry et al., 1992). The idea was to combine the strengths of three individual systems (two working in isolated mode and one in connected-speech mode) in the following manner. When a person spoke a word, all three systems reported a best-guess word, a second-choice word, and a distance score for each of the two words reported. A "majority rules" algorithm was implemented that determined the word recognized as a best guess by the majority of the three systems. If there was no majority, the second-choice words were added to the set of words, and a majority was looked for again. Finally, if there was still no majority, the response with the lowest distance score was reported. Using this algorithm, word recognition accuracy increased from 92.99% (the average of the three systems' individual accuracies) to 99.43%.
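The voting scheme is simple enough to sketch in a few lines of Python. This is a minimal reconstruction of the published logic, assuming a convenient report format for each recognizer; the data structures are illustrative, not those of the original systems.

```python
# Minimal reconstruction of the "majority rules" voting scheme (Barry
# et al., 1992). Each recognizer's report is assumed, for illustration,
# to be (best_word, second_word, {word: distance_score}); a lower
# distance score means a more confident match.
from collections import Counter

def majority_rules(reports):
    firsts = [best for best, _, _ in reports]
    # Round 1: a word reported as the best guess by a majority wins.
    word, count = Counter(firsts).most_common(1)[0]
    if count >= 2:
        return word
    # Round 2: add the second-choice words and look for a majority again.
    pool = firsts + [second for _, second, _ in reports]
    word, count = Counter(pool).most_common(1)[0]
    if count >= 2:
        return word
    # Round 3: fall back to the best guess with the lowest distance score.
    return min(reports, key=lambda r: r[2][r[0]])[0]

reports = [
    ("alpha", "delta", {"alpha": 0.21, "delta": 0.40}),
    ("bravo", "alpha", {"bravo": 0.35, "alpha": 0.38}),
    ("delta", "alpha", {"delta": 0.30, "alpha": 0.33}),
]
print(majority_rules(reports))  # "alpha" emerges from the second round
```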


Representative Research

Introduction: One series of experiments conducted in MAGIC evaluated methods for designating targets residing in a stereoscopic 3-D volume. This research was published in several articles (Barthelemy et al., 1991; Liggett et al., 1993; Reising et al., 1992; Solz et al., 1994). Three different cursor-control devices were used to designate targets: a three-axis joystick, an ultrasonic tracking device, and a voice control system. The ultrasonic tracking device was attached to a glove, and participants moved the cursor using this device by pointing to the target of interest. Because this type of task requires both gross and precise positioning, two aiding techniques were implemented. One was simple: the target changed color when the cursor penetrated the target area, informing the participants that the cursor was indeed in the same physical space as the target. The other aiding algorithm, referred to as enhanced aiding, used a mathematical algorithm to compute the distance from the cursor to the target closest to it as the cursor traversed the 3-D space. Once this distance was computed, the closest target was highlighted, thus eliminating the need for precise positioning (Osga, 1991); a minimal sketch of this computation appears at the end of this subsection. The algorithm continuously computed the distance between the cursor and all targets in the depth volume. The 3-D volume within which targets and the cursor interacted extended from 7 in. in front of the physical display surface to 15 in. behind it. Participant performance differences based on two target densities were also investigated.

Results: Results showed that the hand tracker provided the best performance with respect to total target designation time. Both target designation time and accuracy were improved with the enhanced aiding approach. A speed–accuracy trade-off was observed when the density variable was analyzed; the low-density condition provided faster total target designation times, but the high-density condition had fewer errors.

Impact

This cockpit simulation provided a consistent, uniform experimental environment for conducting a number of part-task evaluations. Consistency is especially important in stereographic evaluations, as the distance between a participant's eyes and the display affects image disparity and, therefore, the perceived stereographic effect. Also, the versatility of the cockpit supported the easy integration of the various control devices. The evaluations supported by this simulation also enabled the investigation of alternatives to traditional control and display devices for cockpit tasks that were becoming more challenging as the information presented on traditional two-dimensional (2-D) displays increased. As such, pilots' visual processing capabilities were being overloaded, and 3-D displays offered a potential solution to this problem. However, introducing this type of display also introduces control challenges. This simulator facilitated the evaluation of numerous control techniques that may compensate for the new control issues associated with the incorporation of 3-D displays in future cockpits.
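The enhanced aiding computation referenced above reduces, at its core, to a continuous nearest-neighbor search in three dimensions. A minimal sketch follows; the coordinates and Euclidean metric are assumptions for illustration, not the original implementation.

```python
# Sketch of the enhanced aiding computation (after Osga, 1991): as the
# cursor moves through the stereoscopic volume, the nearest target is
# found and highlighted, so precise 3-D positioning is unnecessary.
import math

def nearest_target(cursor, targets):
    """cursor: (x, y, z); targets: dict of name -> (x, y, z).
    Returns the name of the target closest to the cursor."""
    return min(targets, key=lambda name: math.dist(cursor, targets[name]))

targets = {"T1": (3.0, 1.0, -4.5), "T2": (-2.0, 0.5, 7.0)}
print(nearest_target((2.0, 1.0, -3.0), targets))  # T1 would be highlighted
```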

Panoramic Cockpit Control and Display System (PCCADS) Simulator

During the second half of the 1980s, the continued maturation of larger flat-panel displays (e.g., LCDs) started researchers thinking about the design of a cockpit in which the whole front panel would be a single display. As with any new technology, many questions arose, such as: "What is the best way to optimize the design of the crew station when operators can place display formats wherever they wish?" In addition, new display formats could extend across either the entire display or only part of it,



such as a half or a third. Also, improvements in helmet-mounted displays (HMDs) warranted investigation of how they would interact with the new displays made possible by the single instrument panel. The PCCADS research simulator was developed to evaluate the potential improvements in mission effectiveness provided by a large color display area, as well as the effectiveness of including an HMD and a helmet-mounted sight (HMS) in the cockpit (Figure 1.5).

FIGURE 1.5  Panoramic Cockpit Control and Display System (PCCADS) simulator, circa 1988. (US Air Force photo.)

Display Technology

The HMD provided airspeed, altitude, attitude, heading, and weapon status cues. It also portrayed a line of sight (LOS) for the radar and one for a weapon seeker. This was accomplished with the use of a magnetic head-tracker; the tracked head-position data and status information were used to point the simulated radar antenna or the simulated weapon seeker.

The head-down portion of the simulator was unique in that it was one large display (18 × 24 in.) containing an integrated picture of mission-essential information. This also allowed rapid reconfiguration among numerous head-down configurations. For example, via software, the head-down display could be configured to look like an F-15E or an F-16 cockpit instrument panel. PCCADS employed a projection system to show a realistic out-the-window scene (37 degrees horizontal by 27 degrees vertical field-of-view [FOV]), driven by a state-of-the-art graphics generator.

Control Technology

The simulator employed a touch-sensitive overlay to manipulate displayed switches, as well as to present formats showing aircraft attitude and flight status in


various locations on the head-down display. Speech recognition and control were other options for interaction. An F-15E stick and throttle, along with their additional switches and buttons, provided "HOTAS" functionality, that is, function selection while keeping the pilot's "hands on throttle and stick."

Representative Research

Introduction: The research discussed in this section dealt with one of the basic aspects of flying – maintaining flight safety when there is no dedicated head-down primary attitude indicator (AI). At the time this research was conducted, there was a definite desire to provide the pilot with as much mission-related information as possible. There was a second idea that the HUD could be used as the primary flight reference display and be substituted for a head-down primary AI. However, there was a concern that loss of attitude awareness (a potential flight safety problem) could result.

The Evolution of the Background Attitude Indicator: With limited panel space, one design solution was to decrease the size of the ADI and move it out of the primary viewing area, with pilots employing the HUD as the primary flight display. Researchers at Lockheed, Ft. Worth (Spengler, 1988) designed an alternate approach. They created a background attitude indicator (BAI) format designed with the goal of replacing the conventional dedicated head-down ADI while maintaining flight safety. The BAI uses a ¾ in. electronic border around the outer edge of a head-down display format. The evolution of this concept is illustrated in Figure 1.6 and its implementation is shown in Figure 1.7.

FIGURE 1.6  Evolution from attitude director indicator (ADI) to background attitude indicator (BAI). (Panels show a typical attitude indicator – fuselage dot, horizon line, aircraft wings – with digital readouts; the horizon line and wings extended; and an overlaid tactical format.)

FIGURE 1.7  Spengler background attitude indicator. (Adapted from Spengler, R.P. 1988. Advanced Fighter Cockpit (Tech. Rep. ERR-FW-2936), Fort Worth, TX: General Dynamics.)

In Figure 1.7, three display formats are shown on a front panel, the central rectangular portion of each presenting mission-related information. The background border extended across all three displays and presented a single attitude format. The attitude information, in essence, framed the mission-essential display formats and acted as one large AI. The BAI consisted of a white horizon line with blue above it to represent positive pitch and brown below it to represent negative pitch. This display worked very well for detecting deviations in roll, but was less successful in showing deviations in pitch because, once the horizon line left the pilot's field-of-view (FOV), the only attitude information present in the BAI was solid blue (sky) or brown (ground). Because the concept was effective in showing roll deviations but lacking in the pitch axis, enhancing the pitch axis became the focus of the research using this simulator.

PCCADS BAI Research – Part 1: The initial work began by enhancing the pitch cues for a BAI that framed one display format only (as opposed to framing three display formats as in the original Lockheed work; Liggett et al., 1992). Eight variations of the BAI were evaluated, and each contained the following common elements (Figure 1.8):

1. Digital readouts of airspeed, altitude, and heading
2. Wing reference lines to provide an attitude reference (extensions of the normal miniature aircraft wings)




3. Ghost horizon (a dashed white line that appeared when the true horizon left the pilot's FOV, and that indicated the direction of the true horizon)

FIGURE 1.8  Digital readouts, wings, and ghost horizon (plane is in a 45-degree roll, negative pitch).

This configuration was tested alone, as well as with the additions of color shading (the lightest shade of blue or brown appeared at the horizon and became gradually darker as positive or negative pitch increased to 90 degrees), color patterns (a vertical wedge with its thinnest portion at the horizon and its thickest portion at the zenith or nadir), and pitch lines with numbers. These design features were compared individually, in combinations of two, and with all three present. To determine if effective pitch information was being portrayed, the PCCADS study simulated the task of recovering from unusual attitudes. This task is often used to determine if adequate pitch information is present, as it is a key factor in a successful recovery.

Results of BAI – Part 1: Results showed that the combination of color shading and color patterns was the format that produced the quickest initial stick input time. When using this format, the pilots moved the control stick to begin their recoveries more quickly than when using any other format. This measure of initial stick input time related to the interpretability of the format because the pilots looked at the format, determined their attitude via the cues on the BAI, and began their recovery as quickly as possible.
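The color-shading cue evaluated here lends itself to a compact sketch: shade is a linear function of pitch magnitude, blue above the horizon and brown below. The RGB endpoints below are illustrative assumptions; the study's actual palette is not specified here.

```python
# Minimal sketch of the color-shading pitch cue: lightest at the
# horizon, darkening linearly toward 90 degrees of pitch; blue encodes
# sky (positive pitch) and brown encodes ground (negative pitch).

def bai_shade(pitch_deg):
    """Return an (r, g, b) border shade for a given pitch in degrees."""
    t = min(abs(pitch_deg), 90.0) / 90.0   # 0 at the horizon, 1 at +/-90
    if pitch_deg >= 0:
        light, dark = (150, 200, 255), (0, 40, 120)    # sky blues (assumed)
    else:
        light, dark = (210, 180, 140), (80, 50, 20)    # ground browns (assumed)
    return tuple(round(l + t * (d - l)) for l, d in zip(light, dark))

print(bai_shade(0))    # lightest sky blue, right at the horizon
print(bai_shade(-60))  # a darker brown, two-thirds of the way to the nadir
```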


PCCADS BAI Research – Part 2: Follow-on research (Reising et al., 1995) was conducted to evaluate the use of color shading and patterns to portray pitch information when the BAI extended across three horizontally adjacent head-down formats (the display configuration employed by Lockheed). The procedures and pilot tasking were similar to those of the first study. There were, however, two different mechanizations of the BAI: Triplets and Global.

The Triplets format consisted of each of the three displays presenting individual, identical attitude information. Each display acted as a single, independent AI. Because the pilot could be focusing on the information from any of the three display formats at a given time, it was thought that being able to interpret the aircraft's attitude from just that specific BAI might be beneficial. The Global format consisted of all three horizontally adjacent BAIs acting as one large AI, as in the original Lockheed study. It was anticipated that using the global BAI would be similar to seeing the outside world in its entirety and would thus benefit the pilot. The Triplets and the Global formats had the same common elements of digital readouts, wing reference lines, a ghost horizon, and sky pointers. The pitch cues used were of two styles: (1) color shading and color patterns (the best format from the previous research) and (2) color shading, color patterns, and pitch lines with numbers. Although the second format was not considered the most beneficial in the previous research, the pilots had expressed a unanimous preference for the BAI format that included pitch lines and numbers.

Results of BAI – Part 2: Objective results were inconclusive; however, subjective results revealed that the pilots highly favored the Global format that provided color shading, color patterns, and pitch lines with number references. Thirteen of 16 subjects ranked this type of format highest. The global aspect tended to give the pilots excellent peripheral bank cues; in addition, the combination of shaded patterns and pitch lines with numbers gave both qualitative and quantitative pitch reference, as well as pitch rate information. The Triplets format was rated low because the individual formats tended to distract the pilot, with each BAI moving separately and displaying identical yet independent attitude information. The pilots were inclined to use only the center display for attitude information and completely ignore the two outboard displays.

Conclusions: Based on the results of these simulation studies, BAIs appear to be a viable means of enabling the pilot to recover from unusual attitudes. Single BAIs work best with visual cues (such as color shading and color patterns) that create a flow pattern to facilitate pilot detection of motion while not requiring the pilot to focus on a specific readout. When using multiple BAIs, pilots preferred having a global BAI that uses a combination of shaded patterns and pitch lines. The pitch lines with numbers allowed the pilot to make an exact, quantitative assessment of attitude, and the color shades and wedge width gave the pilot "quick glance" qualitative orientation information.

Impact

This research demonstrated the feasibility of a new and innovative display format that would not be possible without the inclusion of a large CRT in the cockpit. Because of the CRT's large surface area, the display formats can be configured in non-traditional ways. The trend of duplicating the E-M instrumentation with an E-O


display format may finally disappear as a paradigm shift takes place, and the full potential of large E-O displays becomes apparent.

Helmet-Mounted Oculometer Facility (HMOF) Simulator

The HMOF simulator was established to capitalize on the unique capability for unobtrusive and accurate monitoring of eye and helmet positions using Honeywell's oculometer. This oculometer system was incorporated into a single-seat simulator (A-7 geometry) that contained various controls and displays to support research of a more basic nature. Test participants were not pilots, and the tasks they performed were designed to represent cockpit workload demands rather than simulate actual piloting tasks.

One line of research in this simulator focused on determining whether eye and head measures were valuable objective indicators of the effectiveness of attention cues and control and display design. Parameters of eye and head movements (e.g., sequence and latencies) were examined in comparison to the conventional performance index, manual reaction time, as a function of several factors: attention cue modality, tasks, attention allocation between tasks, and information location (Calhoun et al., 1985). One of the cue modalities evaluated was the application of 3-D auditory signals (Calhoun & Janson, 1990). Results suggested that these relatively unobtrusive measures may be valuable indices for evaluating candidate crew station designs by detecting a pilot's awareness of cues and changes in the information presented.

Another line of research evaluated the application of the operator's LOS as an alternative control. With such control, the computer initiated a predefined action once it received an input based on the operator's point-of-gaze. Use of eye control eliminated the need for a selective manual response by substituting the natural movement of the eye that was inherent to the visual task. Thus, in cockpit applications, pilots would be afforded a useful hands-free, head-up control mechanism. Research in this facility examined the spatial and temporal parameters for implementing the eye-control algorithm and quantified the efficiency of eye control compared to other control mechanisms (Calhoun et al., 1986).

Display Technology

In its basic configuration, the simulator contained two monitors. The upper, centrally located monochrome monitor (approximately 10 × 12 cm) presented symbology for a pursuit tracking task that could be varied in difficulty level. A color monitor (approximately 20 × 30 cm) was located below the front switch panel. This simulator had no external visual scene. During testing, the cockpit was darkened by a light-tight curtain that surrounded the simulator.

Control Technology

The right-console joystick was used for the participants' inputs to the tracking task. The stick was fitted with four switches, two of which were thumb-actuated


pushbuttons. A pressure-sensitive 12.5 × 12.5 mm switch plate was mounted on the left console. The front switch panel contained seven dedicated switches. These momentary switches measured 14 × 20 mm; the middle switch subtended a visual angle of 1.2 × 1.7 degrees. The switches were labeled with black numerals. For some experiments, control based on eye LOS was activated.

Eye and Head Monitor

The participant's eye was illuminated by a halogen lamp filtered to pass near-infrared light. This light was collimated and reflected from a small coating on a parabolic helmet visor into the right eye. Some light was reflected from the cornea, and a portion of the light that entered the pupil was reflected by the retina, passed out of the eye through the pupil, and was scanned by a miniature charge-coupled device (CCD) video camera. As the eye rotated about its center of rotation to look around the visual field, the corneal reflection moved differentially with respect to the pupil. Thus, eye direction could be determined from the relative positions of the center of the pupil and the center of the corneal reflection. At extreme angles of fixation, eye direction was determined from the shape of the pupil.

A magnetic HMS provided accurate helmet position and attitude determination in six degrees of freedom with respect to a fixed coordinate system. It utilized a transmitter mounted behind and above the helmet to create a magnetic field around the cockpit and a helmet-mounted receiver that responded to movement through the field with varying output voltages. A computer calculated helmet position and rotation based on these voltages. These data were combined with the eye-angle data to determine eye LOS with respect to a fixed coordinate system.
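The pupil/corneal-reflection principle can be reduced to a small sketch: gaze angle is estimated from the offset between the two image centers. This is a minimal illustration under an assumed linear calibration; the oculometer's actual computation was more sophisticated.

```python
# Minimal sketch of pupil/corneal-reflection gaze estimation: eye
# rotation shifts the pupil center relative to the corneal reflection,
# so the pixel offset between the two maps (here, linearly) to a gaze
# angle. The calibration gain is an assumed placeholder value.

def gaze_angles(pupil_center_px, cr_center_px, gain_deg_per_px=0.15):
    """Return (azimuth, elevation) of gaze, in degrees, from the two
    image centers measured by the eye camera."""
    dx = pupil_center_px[0] - cr_center_px[0]
    dy = pupil_center_px[1] - cr_center_px[1]
    return dx * gain_deg_per_px, dy * gain_deg_per_px

az, el = gaze_angles((322.0, 240.5), (310.0, 238.0))
print(f"gaze offset: {az:.1f} deg azimuth, {el:.1f} deg elevation")
```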


FIGURE 1.9  Illustration of cockpit application of eye control. (US Air Force graphic.)

Representative Research

Introduction: In that the visual system is the primary channel through which pilots acquire information, and eye muscles are extremely fast, it is advantageous to have the direction of eye gaze also serve as a control input. In other words, if the pilot is looking at a target or button, it is more efficient to use the pilot's gaze to aim a weapon or select a switch, as shown in Figure 1.9.

One approach to implementing gaze-based control is to combine LOS data with LOS dwell-time criteria. The operator selects an item on a display simply by looking at it for the criterion time. Using dwell time to initiate the control action is particularly useful if the operator's gaze is only being utilized to call up additional data. In this manner, the operator's sequential review of a series of icons can be made more rapidly, with detailed information popping up as the gaze briefly pauses on each icon. Typically, required dwell times ranged from 30 ms to 250 ms. Longer dwell times tend to reduce the speed advantage of gaze-based control, whereas shorter dwell times increase the likelihood of a "Midas touch," with commands activating wherever the operator gazes. One solution was to require a consent response, such that gaze-based control was similar to the operation of a computer mouse and button press: the gaze (or mouse) indicated the response option on a display, and the consent (or button press) triggered the control action.

This mechanism was evaluated in an experiment in which the participants selected discrete switches on the simulator's front panel while manually tracking a target (Calhoun et al., 1986). In two of the three control methods, participants directed their gaze at the switch indicated by an auditory cue and then made a consent input (either a manual response via a joystick button or a verbal response). In the third condition, participants selected the switches with their left hand.

Procedures: Six participants were randomly assigned to a sequence of the three switching methods. The order of the switching methods was such that, across participants, each method was preceded equally often by each of the other methods. In the conventional manual method, participants selected the cued switch with their left hand. The switch was illuminated during switch closure. Between switch selections, participants were required to keep their left hand on the left console switch plate, and the position of this switch was recorded continuously throughout the run.

In the two eye-control methods, participants directed their gaze at the cued switch. The participants' resulting eye LOS was computed at a rate of 60 Hz. When the system detected an eye LOS within 2.54 cm of the center of a switch for two of three consecutive samples (at least 33.4 ms), that switch was illuminated as feedback to the participant. The switch remained illuminated until (1) another switch was selected, (2) a five-second time-out interval had expired, or (3) a consent response was made. Thus, the operator could make the consent response while not looking at the switch (e.g., after returning attention to the tracking task). In one eye-control method, participants manually closed a push-button on the joystick for the consent response. In the second eye-control method, the consent consisted of uttering the word "Go" into the microphone. The participant then heard either the word "Go" or a beep through the intercom, as feedback on whether the speech system successfully recognized the utterance.
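The selection criterion just described is easy to express in code. The sketch below uses the published parameters (60 Hz sampling, 2.54 cm capture radius, two of three consecutive samples); the flat 2-D panel geometry and switch layout are illustrative assumptions.

```python
# Sketch of the eye-LOS switch-selection criterion described above: a
# switch is designated when the gaze point falls within 2.54 cm of its
# center on at least two of three consecutive 60 Hz samples.
import math

CAPTURE_RADIUS_CM = 2.54

def designated_switch(last_three_samples, switches):
    """last_three_samples: three (x, y) gaze points on the panel, in cm.
    switches: dict of name -> (x, y) switch center.
    Returns the switch hit on >= 2 of the 3 samples, else None."""
    for name, (cx, cy) in switches.items():
        hits = sum(
            1 for (gx, gy) in last_three_samples
            if math.hypot(gx - cx, gy - cy) <= CAPTURE_RADIUS_CM
        )
        if hits >= 2:
            return name   # illuminate as feedback; await the consent input
    return None

switches = {"SW1": (0.0, 0.0), "SW2": (10.0, 0.0)}
samples = [(0.5, 0.3), (4.0, 0.0), (1.0, -0.8)]   # two of three inside SW1
print(designated_switch(samples, switches))        # SW1
```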


In each five-minute run, an auditory cue ("one", "two", …, "six") corresponding to the switches numbered 1 through 6 was presented 42 times while the participant was completing a tracking task (manual inputs on a joystick to overlay a dot on a continuously moving cursor). Eight five-minute runs constituted a session. Sessions were conducted with each method until tracking error and switching time performance met training criteria. Switching time and accuracy data from the final four runs for each of the six participants per switching method were analyzed (over 3,000 switching trials).

Results: Switch activations that were not completed (i.e., switch not selected or consent not made) or completed incorrectly were dropped. The remaining data for accurate switch selections (96% of the trials) showed that it took less than a tenth of a second longer for the participants to select these switches with their eye LOS and a joystick button press (manual consent) than with their left hand. This lack of a significant difference in average selection time indicates that eye control is a practical method for activating switches mounted in the central FOV, and that eye-controlled switching is a feasible alternative to manual switching, especially when it is desirable to keep the hands on the left- and right-console controls.

The results also showed that average switching time was significantly longer with the eye and voice consent method (2.83 seconds) than with the eye and manual consent method (1.78 seconds) or the manual method (1.72 seconds). It is important to note that, for the eye and voice consent method, the time required for the voice system to recognize an utterance and transmit the results to the computer made up a component of the total switching time (0.92 seconds). When this equipment-induced response lag was subtracted from each eye and voice consent switching time, the differences in mean times for the three switching methods were not significant.

Conclusions: The very small difference in selection time between the eye and manual consent method and the manual method indicated that eye-controlled switching is a feasible alternative to manual switching. The longer switching time with eye control and voice consent illustrated the importance of examining the total switching time, from the beginning of the switching task to the closing of the consent switch, when comparing control mechanisms. The delay introduced by the equipment components and by the duration of the utterance resulted in a corresponding inflation in overall switching time.

Impact

The HMOF was unique, as all the research conducted on this simulator was geared toward exploiting the capabilities afforded by a single technology – LOS tracking. Additionally, the research conducted using this simulator illustrates how a representative scenario and task environment can be utilized iteratively to identify optimal settings for the numerous parameters involved in a new concept. Ideally, the implementation of any new control and display approach should be fine-tuned with such a research test bed before evaluation in a higher-fidelity simulation. Research with this simulator also marked a significant change in control technology, switching


from comparing alternative candidate approaches (e.g., manual versus speech) to an approach that integrates two or more technologies such that they are used together to perform a task. In this instance, the controls were mapped to different subcomponents of a task. The operator used eye gaze to designate a desired function and either a generic button or voice command for a consent response, commanding the system to execute the designated function. The use of both technologies capitalizes on the ability of eye gaze to rapidly designate a position on 2-D surfaces and a button press or voice command to quickly initiate an action.

Synthetic Interface Research for UAV Systems (SIRUS) Simulator

Unmanned aerial vehicles (UAVs) have become key to aerospace intelligence, surveillance, and reconnaissance operations. More recently, their role has expanded into search and rescue, chemical and biological detection, communication relays, and various combat operations. Many UAVs are remotely operated as multiple-task teleoperated control systems via stick-and-throttle manipulations. The physical separation of the crew from the aircraft makes this control challenging, as ground-based operators do not receive the rich stream of multisensory information that onboard pilots receive regarding the surrounding environment, and the information that is received is often delayed or degraded by communication limitations. The SIRUS ground control station simulator (Figure 1.10) was established in the late 1990s to support research that evaluated the potential value of multisensory interfaces for improving control station operations where the UAV is under direct telerobotic control.

FIGURE 1.10  Synthetic Interface Research for UAV Systems (SIRUS) simulator, circa 1998. (US Air Force photo.)


This simulator consisted of two operator stations. The Air Vehicle Operator (AVO) sat at the left workstation to control UAV flight, manage subsystems, and handle external communications. From the right workstation, the sensor operator (SO) was responsible for locating and identifying targets by controlling cameras mounted on the UAV. Using this simulator, the validity of novel concepts was tested by having participants employ the technology while completing representative control operations. For example, research addressed the utility of head-coupled, head-mounted display applications (Draper et al., 2002), joystick haptic vibration alerts (Draper et al., 2000), wrist tactile alerts (Calhoun et al., 2004), and voice-based control (Draper et al., 2003). A series of experiments specifically addressed visual display enhancements, ranging from adding a simple symbology element (Draper et al., 2000) to overlaying more detailed symbology from synthetic vision systems that highlight, in real time, key information elements of interest directly on the camera video image. The latter technology proved especially useful for improving sensor operator performance on several types of tasks and for virtually expanding the field-of-view to increase operator situation awareness and improve task performance (e.g., Calhoun & Draper, 2010; Calhoun et al., 2006; Draper et al., 2006).

Display Technology

Each operator station had an upper and a head-level 17 in. color CRT display, as well as two 10 in. head-down color displays. The upper CRT of both stations generally displayed a bird's-eye area map (fixed, north up) with overlaid symbology identifying such things as current UAV location, mission waypoints, and current sensor footprint. The head-level CRTs (i.e., camera displays) presented simulated video imagery from the cameras mounted on the UAV. HUD symbology was overlaid on the AVO's camera display, whereas sensor-specific data were overlaid on the SO's camera display. The four smaller head-down displays presented detailed subsystem and communication information. This simulator had no external visual scene.

Control Technology

Both stations had right-hand and left-hand joysticks, as well as two left-hand levers. At the AVO's station, the joystick and throttle were used to control the UAV's flight path and speed. At the SO's station, the right-hand joystick controlled the gimbaled camera position and the left-hand joystick controlled the camera zoom factor. Each station also had a trackball and a QWERTY-type alphanumeric keyboard with a horizontal row of function keys on top.

Representative Research

Introduction: Current UAV missions require a high degree of crew coordination to successfully locate and identify ground targets. For example, the AVO's camera display can be configured to show large-FOV imagery from the gimbaled camera controlled by the SO, whereas the SO views higher-resolution (smaller-FOV) imagery from the same camera to facilitate individual target identification. Thus, while the SO is zoomed in on a particular area, the AVO can spot potential targets that lie outside the SO's instantaneous FOV. Typically, this target information is communicated


verbally between the operators, but this is complicated because each operator uses a different frame of reference (earth- versus sensor-referenced). The AVO views the target in cardinal directions: north, south, east, and west. The SO then has to map these directions to where the camera is currently pointing with respect to the direction the UAV is flying. A common frame of reference would help communicate target information. This study (Draper et al., 2000) evaluated the following four display concepts (Figure 1.11) superimposed on the SO's camera-view display:

• Baseline: No additional symbology provided.
• Floating Compass Rose: This provided a constant reference to real-world cardinal headings (N, S, E, W), regardless of air vehicle or camera orientation.
• Locator Line/Telestrator: Via a cursor/trackball, the AVO designated a target location on the AVO's (10 degrees FOV) display, resulting in a locator line being presented on the SO's (1 degree FOV) display, indicating the direction and angular distance the camera's LOS should traverse in order to overlay it on the target (see the sketch following this list).
• Combined Locator Line/Telestrator and Floating Compass Rose (N, S, E, W).
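The locator-line geometry can be sketched compactly: given the camera's current line of sight and the designated target direction, compute the direction and angular distance to slew. A small-angle flat approximation and the axis conventions are assumptions made for illustration.

```python
# Sketch of a locator-line computation: the line points from the
# camera's current line of sight toward the designated target, with a
# length proportional to the angular distance to traverse.
import math

def locator_line(camera_az, camera_el, target_az, target_el):
    """All inputs in degrees. Returns (direction_deg, angular_distance_deg);
    direction is measured clockwise from straight up on the SO's display."""
    d_az = target_az - camera_az
    d_el = target_el - camera_el
    distance = math.hypot(d_az, d_el)
    direction = math.degrees(math.atan2(d_az, d_el)) % 360.0
    return direction, distance

direction, distance = locator_line(45.0, -10.0, 52.0, -6.0)
print(f"slew {distance:.1f} deg toward {direction:.0f} deg")  # 8.1 deg at 60 deg
```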

FIGURE 1.11  Three display concepts for target search and localization by the SO of a UAV ground control station.


Procedures: Twelve participants acted as SOs, and four rated pilots were trained to serve as AVOs. The AVO directed the SO to a ground-target area. The SO's task was to maneuver the camera aimsight reticle onto the target and designate it. Targets initially appeared either within the AVO head-level display (near condition: 5 degrees radial distance from center) or outside the display (far condition: 20 degrees radial distance). The far condition required the AVO to initially utilize the upper (map) display to instruct the SO to maneuver the camera to the local area.

Results: The results indicated that target designation time was significantly reduced for conditions that utilized the locator line (alone or with the Compass Rose) for both near and far targets. Time to designate targets was reduced by an average of almost 50%. There was also less verbal communication when the locator line was used, freeing the audio channel for other tasks.

Conclusions: The locator line expedited the transfer of target location information between the UAV operators (AVO and SO). The locator line concept was based on the effective use of similar symbology on aircraft HUDs and HMDs. Thus, an interface concept that was found useful for manned crew stations was found to be useful also for unmanned aircraft control. For both applications, the locator line concept may have additional utility for potential targets identified by sources external to the crew.

Impact

Research with this simulator demonstrated the potential for reducing workload and improving operator situation awareness and task performance. However, it also showed that technology proven useful for other complex control applications may not be useful for UAV control. For instance, it was thought that HMD technology would enhance the SO's wide-area searches and spatial orientation, as it has for some manned aircraft applications. However, the results of a series of studies suggested a fundamental limitation of head-coupled control for performing teleoperated search tasks: use of the joystick and workstation display resulted in better performance on all measures compared to the several HMD configurations evaluated (Calhoun et al., 2003). These findings illustrate that it is critical to test candidate interfaces for UAV control in representative ground control station simulators.

Vigilant Spirit Control Station (VSCS) Simulator

As the role of UAVs increases in military and civilian operations, there is a growing need for research simulators to address the unique challenges these vehicles pose to their operators. Examples include how to identify approaches that improve the operator's presence in remote environments; how to design controls and displays that facilitate the transition of the operator's role from directly flying a vehicle to managing multiple UAVs; and how to inform the UAV operator of detected cyber activity on their vehicles and provide recommended actions. The VSCS (see Figure 1.12) is a mature, open-architecture, Windows PC-based multi-UAV testbed that has been widely used for over 20 years to develop and evaluate new autonomous capabilities as well as associated operator interfaces, in both high-fidelity simulations and DoD and NASA flight tests (Feitshans et al., 2008; Rowe et al., 2009). Furthermore, it has been used to examine prototype controls and displays for future applications; two of these experiments will be summarized here.

FIGURE 1.12  Vigilant Spirit Control Station (VSCS) Simulator. (US Air Force photo.)

Display Technology

Typically, two large monitors present multiple display panels. A Tactical Situation Display (TSD) provides ownship/route information on a moving map. The health and status panel includes telemetry data, a chat room, a communications panel, electronic



checklists, and a subsystem annunciator panel. Other display panels provide information pertinent to the objective of the specific research.

Control Technology

The TSD serves as the pilot's primary control interface via UAV symbol selection and inputs on pull-down menus and text boxes via the mouse and keyboard.

Representative Research

Two specific experiments will be summarized to illustrate the flexibility with which VSCS can support research. The first experiment addresses display symbology to aid in the operation of UAVs in the National Airspace System (NAS). The other examines how information regarding the cyber security of the vehicle should be presented to UAV operators.

Sense and Avoid (SAA) Display Symbology Evaluation

Introduction: This experiment evaluated how information from a Sense and Avoid (SAA) maneuver decision aid (the Jointly Optimal Conflict Avoidance [JOCA] algorithm developed by Bihrle Applied Research, Inc.; Graham et al., 2011) should be presented to aid UAV pilots. Three stand-alone SAA Displays were evaluated, differing in how they portrayed the ranges of potential heading and altitude maneuver changes to avoid collisions (Bartik et al., 2017). Color coding was used in all three displays, but the symbology varied: arcs in the "Banding Display," dots and a scale with interactive functionality in the "Probing Display," and a square indicating combinations of maneuvers in the "Dual Perspective Display." Two automation thresholds were also evaluated that differed in the degree of separation maintained between aircraft ("Well Clear" being larger than "Near Mid-Air Collision" [NMAC]). These thresholds, in turn, determined when the automation would take over and initiate a collision avoidance maneuver.
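In essence, an automation threshold is a separation floor below which the autonomy assumes control. A minimal sketch follows; the numeric threshold values are illustrative assumptions, not the JOCA algorithm's actual parameters.

```python
# Sketch of the automation-threshold idea: the autonomy takes over and
# maneuvers when the predicted miss distance to an intruder falls below
# the active separation threshold. Values are placeholder assumptions.

THRESHOLDS_FT = {"well_clear": 4000.0, "nmac": 500.0}

def autonomy_takes_over(predicted_miss_ft, threshold_mode):
    """True if the SAA automation should initiate an avoidance maneuver."""
    return predicted_miss_ft < THRESHOLDS_FT[threshold_mode]

print(autonomy_takes_over(3000.0, "well_clear"))  # True: maneuver now
print(autonomy_takes_over(3000.0, "nmac"))        # False: pilot still has time
```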


Procedure: Each of 22 pilots (16 unmanned and 8 manned) completed six trials: two trials with each automation threshold (Well Clear and NMAC) with each of the three stand-alone SAA Displays (Banding, Probing, and Dual Perspective). Pilots in each trial responded to scripted health and status tasks while operating a simulated UAV along a flight path, maintaining safe separation as if they were operating in the NAS. This involved monitoring an SAA display populated with traffic information in proximity to ownship, driven by the VSCS generation of six unique simulated traffic encounters per trial.

Results: Pilot performance with the baseline Banding Display was just as good as with the two novel display concepts. These results suggest the additional algorithm transparency features included in the Probing and Dual Perspective Displays are not critical for UAV operations in the NAS. SAA violations were also consistent across both automation thresholds, but the Well Clear threshold paired with the algorithm resulted in less time spent in violation of Well Clear. Lastly, despite the algorithm maneuvering far later than the operators preferred, the maneuvers performed by the algorithm were quite similar to the participants' maneuvers in terms of their directions and magnitudes.

Conclusion: This study demonstrated the display and performance implications of integrating an advanced SAA algorithm. Further research is needed, though, to evaluate whether increased transparency may be of use for more complex engagements and to help tease apart any differences across SAA Displays. Incorporation of adjustable automation thresholds, rather than one fixed threshold, may prove to be the best solution for more complex environments.

Cyber Threat Information Requirements Investigation for UAV Crews

Introduction: This experiment was designed to determine the level of awareness UAV crews had of cyber threats to their vehicles, and to determine the information requirements for designing displays that help them understand and resolve threatening cyber activity. In addition to traditional threats to their aircraft, UAV crews will have to deal with future threats, such as cyberattacks on their vehicles. To combat this threat, a Cyber Security Module (CSM) consisting of technologies to detect and defend UAVs from cyberattacks is being developed. In order to determine the best way to integrate this new technology and the information it could provide to UAV crews, a study was conducted using the VSCS (Liggett et al., 2017).

The VSCS simulation component allows researchers to create ecologically valid and repeatable mission scenarios. For this study, the basic scenario for data collection was developed such that, if there was a cyberattack on the vehicle, the crew could not successfully complete their mission. Two different cyberattacks were simulated in VSCS: a low-sophistication cyberattack (loss of the sensor ball feed) and a high-sophistication cyberattack (gradual drift of the sensor's global positioning system coordinates). To simulate the CSM in VSCS, alerts and checklists were provided to the crews during scenarios in which cyber threats were present and the CSM was active.

Procedure: Five two-person crews (pilot and sensor operator) participated in the study. There were two independent variables: two levels of CSM (active and not active) and two levels of cyberattack (low- and high-sophistication) for a total of


four conditions. The dependent measure was the percentage of mission tasks accurately completed.

Results: When no cyberattack was present, crews, on average, were able to accurately perform 95% of the tasks in the mission; when a cyberattack was present without the CSM, crews completed only 25% of mission tasks. However, adding the CSM brought task completion back up to 83%.

Conclusion: This study provided a significant first step in understanding how UAV operators need to receive information about cyberattacks in order to maintain mission effectiveness. Clearly, the best situation is to have a CSM that can detect the type of threat and provide information on how best to respond. Integrating cyber security alerts and checklists into the standard format for mechanical and electrical alerts and checklists provides a sense of familiarity for the operators when dealing with these new types of threats. However, in the future, new interfaces will need to be designed so operators can cross-check multiple sources of information quickly and efficiently, allowing them to detect and appropriately respond to cyberattacks that have gone undetected by the CSM.

Impact

The reported research illustrates the utility of the VSCS simulation for exploring how best to provide information from decision-aiding technologies. For example, it is effective in simulating cyberattacks on UAVs and showing the advantage of presenting information from a cyber security module to crews to maintain mission effectiveness. Because this government-developed, extensible system utilizes a plug-in architecture without proprietary software, VSCS can be easily modified to simulate different UAV platforms and support a range of mission roles. It can also serve as a useful simulation for evaluating prototype human–machine teaming concepts under development, as exemplified by the incorporation of the commercially developed JOCA SAA algorithm into the reported experiment.

Intelligent Multi-Unmanned Vehicle Planner with Adaptive Collaborative/Control Technologies (IMPACT) Simulator

Besides enabling an operator to manage multiple UAVs, there is also interest in the ability to employ a heterogeneous mix of unmanned vehicles (UVs) to provide synergistic, multi-domain strategic capabilities for complex, unpredictable situations. This will require more advanced intelligent-agent support, typically referred to as "autonomy," which has the capability to achieve goals independently, without intervention. An entirely new controls/displays interface approach is also required to support single-operator management of multiple heterogeneous UVs, one that provides the operator transparency into the supporting autonomy and supports bi-directional collaboration and high-level tasking between the operator and the autonomy. Also, for joint human–autonomy teaming, the operator must maintain overall situation awareness of the status of the autonomy's processing and the rationale for its recommendations, the basis for a shared mental model of who is doing what (as well as when and why). To ensure agility, the interface must support a range of control options whereby the operator


can, depending on mission demands, be "on the loop" (supervising the autonomy), as well as "in the loop" (exercising teleoperation to precisely control a particular vehicle/sensor temporarily). In response to this need, a tri-service team designed and implemented the IMPACT simulator to enable command and control of 12 UVs (4 air, 4 ground, and 4 sea surface vehicles) in a simulated mission to defend a military base perimeter by responding to multiple unexpected events (Draper et al., 2018).

Display Technology

Figure 1.13 illustrates IMPACT's typical configuration featuring four displays. The top TSD provides a map and current information, such as pertinent mission locations and UVs and their associated routes, as well as a vehicle panel showing a summary of each UV's status. The left monitor provides system information, and the right monitor provides a detailed dashboard for multiple UVs that includes status information and each UV's sensor feed. The bottom monitor provides a "sandbox" display that mirrors the TSD but also allows the operator to create "what-if" scenarios by generating and comparing possible UV plans before implementing them. The sandbox is also where most of the interfaces that support operator management of numerous UVs are located. Given the large amount of information to present in the control station, the displays employ video game-inspired pictorial icons that present information in a concise, integrated manner to facilitate retrieval of the states/goals/progress of multiple systems and to support direct perception and manipulation principles.

FIGURE 1.13  Intelligent Multi-Unmanned Vehicle Planner with Adaptive Collaborative/Control Technologies (IMPACT) Simulator. (US Air Force photo.)

Control Technology

IMPACT employs a "playbook" delegation approach that enables seamless transition between control states (from manual to fully autonomous). With this adaptable automation scheme, the operator retains authority and decision-making responsibilities



that help avoid "automation surprises." By supporting flexible operator–autonomy teamwork, agility is enabled to better respond to dynamic mission environments. At one extreme, the operator can manually control UV movement or build plays from the ground up, specifying detailed parameters. At the other extreme, the operator can quickly task one or more UVs by specifying only the play type and location, with an intelligent agent determining and executing all other parameters. For example, when an IMPACT operator calls a play to achieve air surveillance on a building, the intelligent agent recommends a UV to use (based on estimated time en route, fuel use, environmental conditions, etc.), a cooperative control algorithm provides the shortest route to get to the building (taking into account no-fly zones, etc.), and an autonomics framework monitors the play's ongoing status (e.g., alerting if the UVs won't arrive at the building on time).

IMPACT's play-calling interfaces also facilitate operator–autonomy communication on mission details key to optimizing play parameters (e.g., target size and current visibility), as well as supporting operator/autonomy shared awareness (e.g., a display showing the trade-offs of multiple autonomy-generated courses of action [COAs] across mission parameters). Play progress is depicted in a matrix display reflecting autonomics monitoring, as well as in a tabular interface that aids play management (e.g., allocation of assets across plays). Figure 1.14 illustrates the interfaces on the sandbox display to call plays, tweak plays via a workbook, monitor active plays, and manage chat communications. Each UV symbol and its respective route is presented in a unique color. Additional details are available elsewhere (Calhoun et al., 2018; Frost et al., 2019).

Besides determining the degree to which the autonomy assists with UV control, IMPACT's interfaces were initially designed to also provide the operator flexibility in terms of which control modality could be employed to make inputs. Specifically, plays could be called or edited (1) via mouse/click inputs, (2) by using a touchscreen monitor, or (3) via speech commands (Calhoun et al., 2017). Furthermore, the operator could flexibly employ any of the three control modalities; that is, the interfaces were designed to support all three modalities for each step in utilizing the play-based interfaces.

FIGURE 1.14  IMPACT simulator's sandbox map showing six unmanned vehicles on patrol and three ongoing plays: a ground UV inspecting a point with an air UV providing communication relay support, an air UV and two ground UVs escorting a ground entity, and one sea UV inspecting a water threat. Dashed symbology shows a proposed plan for an air sector search at a point. (US Air Force photo.)


the baseline first) and counterbalanced across task complexity. Mission complexity was manipulated by varying the number and timing of tasks. Each of the eight participants, all familiar with base defense and/or UV operations, performed four 60-minute base defense missions. Participants completed a variety of defense mission-related tasks in each mission involving 12 simulated heterogeneous UVs.
Results: Participants' task performance was better on multiple mission performance metrics with the IMPACT system in comparison to the baseline system. Participants were also able to execute plays using significantly fewer mouse clicks with IMPACT as compared to baseline. The overall usability of each system was assessed using the System Usability Scale (SUS; Brooke, 1996). Participants rated IMPACT higher than the baseline on all ten SUS items, and IMPACT's overall SUS score was significantly higher than the baseline's. Participants also subjectively rated IMPACT significantly better than the baseline in terms of its perceived value to future multi-UV operations as well as its ability to manage workload. In fact, every participant gave IMPACT the highest possible score for potential value, and all but one participant gave IMPACT the highest possible score for its ability to aid workload. Subjective data with respect to the speech and touch input modalities, however, were not as positive. Participants favored the mouse input modality, and their inputs were also faster and more accurate with mouse-based input compared to touch or speech input (Calhoun et al., 2017).
Conclusions: Evaluation results indicated that the innovative play-based control and display approach supports operator–autonomy teaming for effective management of a dozen simulated vehicles performing base defense tasks (Behymer et al., 2017).

Impact
The IMPACT simulation has successfully supported a series of laboratory experiments and live tests, including the Autonomy Strategic Challenge Warrior live exercise supported by The Technical Cooperation Program (TTCP), a five-nation collaborative project (Bartik et al., 2020). Results using this novel instantiation of adaptable automation have been very positive. This proven play-based approach, employing concise icons that facilitate direct, intuitive, and efficient task workflow, also informs other Air Force systems/programs. Examples include how it can support operational efforts involving small UAVs, as well as provide human–automation teaming interfaces for the Advanced Battle Management System, currently under development. Research with IMPACT has also identified needed improvements for effective human–autonomy teaming, such as bi-directional human–autonomy communication methods and aids that support collaborative problem-solving (e.g., naturalistic dialogue and sketch interactions). Examinations of the effects of different team structures on overall human–autonomy teaming are needed, as are mechanisms that improve management of temporal constraints. Additionally, interest in shifting to larger numbers of collaborating systems to provide joint all-domain command

Controls and Displays for Aviation Research Simulation

35

and control (JADC2) strategic capabilities (integrated air, land, maritime, cyber, and space capabilities) has prompted ongoing IMPACT research threads. One effort aims to enable distributed collaborative support between an IMPACT operator and ground-dismounted soldiers equipped with an Android-Based Tactical Awareness Kit (ATAK). Another is exploring how cyber-related effects can be initiated with the play-based approach, either separately or in conjunction with plays involving UVs.
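
To make the playbook delegation concept concrete, the following minimal Python sketch shows how a play call specifying only play type and location might be expanded by an agent that ranks candidate UVs by estimated time en route and fuel state. The vehicle names, cost weights, and the recommend helper are invented for illustration; this is not the IMPACT implementation.

```python
from dataclasses import dataclass

@dataclass
class UV:
    name: str
    domain: str          # "air", "ground", or "sea"
    speed_kts: float     # nominal transit speed
    fuel_pct: float      # remaining fuel, 0-100
    position: tuple      # (x_nm, y_nm) in a flat local frame

def eta_hours(uv: UV, target: tuple) -> float:
    """Straight-line time en route, ignoring no-fly zones for brevity."""
    dx, dy = target[0] - uv.position[0], target[1] - uv.position[1]
    return (dx**2 + dy**2) ** 0.5 / uv.speed_kts

def recommend(uvs, play_domain: str, target: tuple) -> UV:
    """Rank candidate UVs for a play by ETA, penalizing low fuel.

    The operator specifies only play type and location; the asset
    choice here is delegated to the agent, subject to operator veto.
    """
    candidates = [uv for uv in uvs if uv.domain == play_domain]
    # Illustrative cost: hours en route plus a fuel-reserve penalty.
    def cost(uv):
        return eta_hours(uv, target) + (0.5 if uv.fuel_pct < 25 else 0.0)
    return min(candidates, key=cost)

fleet = [UV("Air-1", "air", 90, 80, (0, 0)),
         UV("Air-2", "air", 90, 20, (5, 5)),
         UV("Sea-1", "sea", 25, 60, (2, 9))]
print(recommend(fleet, "air", (10, 10)).name)  # -> Air-1 (Air-2 is low on fuel)
```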

MOTION-BASED SIMULATORS
The simulators discussed thus far are all fixed-based simulators; that is, they do not include simulated realistic platform motion. However, as some aircraft become more and more agile, there are many human factors issues that need to be considered for cockpit design. Because of this, motion-based simulators provide a means for exploring human-related issues prior to more costly flight test options. Primarily, motion-based simulators are used for demonstration and training, but some are used for gravity (G)-tolerance testing, such as centrifuges, and others are used to study the incidence and effects of pilot spatial disorientation (a pilot's misperception of the attitude, position, or motion of his/her aircraft). In contrast to fixed-based simulators, there has been little control or display research conducted in motion-based simulators to date. This is unfortunate, as motion environments have numerous physiological and psychological consequences that could impact the utility of controls and displays. For instance, under high acceleration it is difficult to move the arm or hand to select functions on the front panel. Hence, the effects of acceleration on the utility of eye gaze and speech-based control would be of interest to see if these are viable alternative controls. High acceleration is also known to affect color vision. In fact, research was conducted in one motion-based simulator to evaluate this effect. This simulator and research will be described in the next section.

Dynamic Environmental Simulator (DES)
The DES centrifuge provided multi-axis G exposures in a gimbaled cab (Figure 1.15). It exposed participants to a maximum of 9Gs and could combine accelerations (Gx + Gy + Gz). A variety of physiological and experimental measurements could be collected during simulation testing, including heart rate, skin response, eye blink rate, blood flow, head movement, and G-exposure data. Closed-circuit video and audio were also available for data recording purposes. The DES was used to test personal protective equipment, helmet-mounted systems, and cockpit systems. It supported both sustained acceleration and spatial disorientation research. After modification, the DES allowed closed-loop motion-based studies, so, in addition to the previously mentioned data parameters, pilot flight performance metrics could be collected and analyzed.


FIGURE 1.15  Dynamic Environmental Simulator (DES), circa 1969. (US Air Force photo.)

Display Technology
The cab contained a domed visual scene provided by a front projection system. It displayed a 180 degrees horizontal by 160 degrees vertical out-the-window FOV. In addition, HUD symbology could be projected onto the out-the-window scene for HUD symbology evaluations. The head-down display contained one 23 in. diagonal LCD, and the format presented on this display could be changed via software to represent a number of head-down instrument panels (e.g., an F-18 head-down suite).

Control Technology
The cab contained an F-16 control stick and throttle that participants used to fly an F-16 aeromodel. As with many of the simulators described, this system could be switched with other aeromodels, sticks, and throttles to represent a variety of aircraft cockpits.

Representative Research
Introduction: One of the biggest advances in head-down information presentation was the addition of color to displays in the 1970s. Pilots could now rely on known color schemes to determine the meaning of specific objects on a display (e.g., green = good, red = bad) and could learn new color schemes for display use. Color has been shown to improve pilot performance and reduce workload. However, when pilots are pulling high Gs or sustaining acceleration, the blood pressure in the eye is reduced, and this may cause changes in color vision. Four experiments were conducted in the DES to help determine the effects of sustained acceleration on color vision (Chelette et al., 1999). A few are summarized here.
Background: To get an idea of what actually happens to color vision under high Gs, a preliminary study was conducted in which participants with normal color


vision viewed a color map as the G profile ramped up from the baseline (1.4Gs) at a rate of 0.1Gs per second until the participants experienced almost complete blackout. Participants were to report what they saw during this process. A number of participants said that the river (a cyan color) faded away first, then the yellow and green of the terrain faded together, and finally the red and dark blues faded to black. This observation led to numerous studies conducted in the DES to explore visual contrast sensitivity, night vision, and visual acuity under Gs.
Luminance Study: Four colors (red, green, blue, and yellow) at various luminance levels were tested in the following manner. A grid display was developed that contained four rows of digits in various colors, one color per row. The columns had varying luminance contrast ratios; that is, within each column, every digit had the same luminance contrast ratio with its background, regardless of digit color. Participants were subjected to a G profile that progressed from the baseline to near blackout at a slow onset rate. Results showed that digits with greater luminance contrast ratios with their backgrounds were recognized longer than digits with lesser luminance contrast ratios.
Color Identification Study: The objective of this study was to determine if participants could identify colors at high Gs. The study employed five colors (red, green, blue, yellow, and gray) at three contrast ratios to represent dark viewing, daylight viewing, and twilight conditions. Six G levels were tested from 1G to 9Gs (1.0, 7.0, 7.5, 8.0, 8.5, and 9.0). Results showed no significant differences in terms of reaction time and accuracy between the colors, the contrast ratios, or the G levels. However, one participant had a large number of errors with the color yellow. Therefore, even though it does not appear that most participants have a hard time distinguishing colors under G, this study showed how an undetected color perception deficit of a particular participant may become evident at high Gs.
Color Discrimination Study: The objective was to determine if colors could be discriminated under high Gs when the task involved mathematical judgment and choice. This task was more representative of a pilot task in that the display contained seven targets, four of one color and three of another. The participants' task was simply to press a button indicating the color of the greater number of targets. For this study, contrast ratio was held constant, and four G levels were used (1.0, 7.0, 8.0, and 9.0). Although the overall error rate was below 10%, trends showed that there were more errors in this task than in the previous ones, and the majority of errors occurred at 9Gs. The most common errors were failures to discriminate between yellow and green (yellow was commonly mistaken for green) and between gray and blue (gray was commonly mistaken for blue).
Conclusions: On the basis of the results of the reported studies, colors with similar luminance contrast ratios should not be used on the same display because they may fade together during high-G maneuvering. These types of studies may be instrumental in detecting color deficiencies for pilots who intend to fly high-G aircraft. Also, under high-G conditions, pilots may be prone to not being able to discriminate between yellow and green.
Given that green is commonly used to represent friendly objects, this may compromise pilot performance if they cannot discriminate between friendly and unknown entities.
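
These studies turn on the luminance contrast ratio between a symbol and its background. As a rough illustration of the design rule above, the sketch below flags color pairs at risk of fading together under G. The luminance values, the simple symbol-to-background ratio metric, and the 20% similarity threshold are all invented for the example; the chapter does not report the actual values used.

```python
# Hypothetical symbol luminances (cd/m^2) on a common 10 cd/m^2 background.
BACKGROUND = 10.0
symbols = {"red": 35.0, "green": 38.0, "blue": 18.0, "yellow": 80.0}

def contrast_ratio(l_symbol: float, l_background: float = BACKGROUND) -> float:
    """Simple luminance contrast ratio; higher ratios persisted longer under G."""
    return l_symbol / l_background

# Flag pairs whose ratios differ by less than 20% -- candidates to fade
# together during high-G maneuvering (threshold is illustrative only).
ratios = {color: contrast_ratio(lum) for color, lum in symbols.items()}
for a in ratios:
    for b in ratios:
        if a < b and abs(ratios[a] - ratios[b]) / max(ratios[a], ratios[b]) < 0.2:
            print(f"avoid pairing {a} ({ratios[a]:.1f}) with {b} ({ratios[b]:.1f})")
```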


Impact
A motion-based simulator like the DES is the perfect avenue for conducting many types of control and display research in that pilots can fly high-G profiles in a realistic manner with closed-loop control. For example, studies like these can produce display design guidance for color displays in high-G aircraft and can help determine color vision screening recommendations.

Disorientation Research Device
One of the newer motion-based devices built to explore challenges associated with pilot spatial disorientation (SD) is the Disorientation Research Device (DRD), housed at the Naval Medical Research Unit – Dayton (NAMRU-D) at Wright-Patterson Air Force Base (see Figure 1.16). This six-degrees-of-freedom device (roll, pitch, yaw plus planetary motion) can replicate angular and linear flight accelerations with up to 3 Gs of sustained force. It was also designed to mimic some SD vestibular illusions. The system rests on a 35-foot-diameter platform that can turn either clockwise or counterclockwise at up to 150 degrees/sec. Linear motion capabilities include 33 feet of horizontal translation and six feet of vertical motion. One of the objectives for building the DRD was to create a motion-based simulator capable of inducing pilot SD in order to determine the best methods for combating these situations in flight. Therefore, one of the first studies conducted in the DRD compared fixed-based and motion-based simulation capabilities (the DRD with and without motion activated) on their effectiveness in inducing SD (Williams et al., 2021).

Display Technology
The cockpit instrument panel in the DRD is a wide-field high-resolution visual display system (26 in. diagonal monitor; 1,366 × 768 pixel resolution) with touchscreen

FIGURE 1.16  Disorientation Research Device (DRD): The KrakenTM. (US Navy photo.)


capability. The out-the-window (OTW) scene is displayed on a 65 in. diagonal super ultra-high definition flat panel display that provides an 83 degrees horizontal × 53 degrees vertical field-of-view. Both the instrument panel and the OTW graphics are generated with X-Plane flight simulation software (Version 11.41).

Control Technology
As mentioned previously, the instrument panel monitor has touchscreen capability for pilot/study participant interaction with the display. Engine power is controlled with a Thrustmaster Warthog throttle, and pitch and roll are controlled with a Flightlink G-Stick III joystick. Yaw is controlled with adjustable rudder pedals. The cockpit is fully networked to an external control station with full monitoring and recording capabilities. The DRD can be operated in pilot-in-the-loop or preprogrammed modes. Research data recording capabilities include two-way cockpit voice communications, a 3-camera video system for visual recordings, eye and head tracking, physiological monitoring, and full "flight" data recording.

Representative Research
To date, the limited research conducted in the DRD has not been focused on controls and displays. However, a forthcoming study will compare simulation approaches (in-flight simulation and motion-based simulation) on the evaluation of candidate technologies (visual symbology sets presented on a helmet-mounted display and spatialized auditory displays) for SD prevention (Geiselman et al., 2017). More information is presented in the next section on in-flight simulators, as the preliminary study was conducted in the University of Iowa Operator Performance Laboratory Aero L-29 Delfin Jet and will be repeated in the DRD for comparison of results. This study will also serve to verify that the SD illusions created in the motion-based DRD match those of real flight.

Impact
Pilot SD continues to be a problem for both military and civilian pilots and is the leading cause of aviation mishap fatalities. On the military side, it is one of the most costly challenges measured by both life and equipment lost. The ability to accurately replicate SD conditions/illusions is paramount to conducting effective research on ways to combat this threat with effective control and display design. The DRD is a state-of-the-art motion-based simulator specifically designed to study and address pilot SD and will contribute to meaningful research in this area for years to come.
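
As a back-of-the-envelope check on the planetary-motion numbers above (assuming, purely for the arithmetic, that the cab sits at the full 17.5-foot platform radius), the sustained radial load follows from a = ω²r:

```python
import math

def radial_g(radius_m: float, rate_deg_s: float) -> float:
    """Centripetal acceleration a = omega^2 * r, expressed in g."""
    omega = math.radians(rate_deg_s)
    return omega**2 * radius_m / 9.81

r = 17.5 * 0.3048               # 35 ft diameter platform -> 17.5 ft radius
print(f"{radial_g(r, 150):.2f} g at 150 deg/s")  # ~3.73 g kinematic ceiling
# Combined with 1 g of gravity the vector sum is sqrt(3.73^2 + 1) ~ 3.86 g,
# so the device's rated 3 Gs sustained implies operation below this maximum
# (e.g., with the cab inboard of the full radius or at lower spin rates).
```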

IN-FLIGHT SIMULATORS
Although the motion-based simulators described in the previous section allow for testing in some aspects of the flight regime, there are many other aspects that can only be examined in flight. The purpose of in-flight simulators is to get as close as possible to the environment in which the crew station technology will ultimately be employed. A few will be discussed in the following sections.


NASA's OV-10
Although not designed initially to be an in-flight simulator, the OV-10 has been used extensively to test spatial audio and speech recognition technologies. The OV-10A aircraft is a twin-engine, two-crew-member, tandem-seating turboprop aircraft (Figure 1.17). The displays in the rear seat depended on the research taking place at the time of flight. For instance, during a speech recognition study, a monochrome monitor was installed that displayed words the participants were to say into a microphone. For the speech recognition studies, the backseat also contained a keyboard, a push-to-talk switch, an acoustic microphone, and a noise-canceling boom microphone.

Representative Research
Introduction: Speech recognition has long been advocated as a natural and intuitive method by which humans could potentially communicate with complex systems. Research in the area of robust speech recognition, in addition to advances in computational speed and signal processing techniques, has resulted in significant increases in recognition accuracy, key to users accepting the technology. Speech recognition systems have advanced to the point where applications of this technology have significantly increased. The demands on military pilots are extremely high because of the very dynamic environment within which they operate. The pilot has only limited capability to effectively manage available onboard and off-board information sources using just hands and eyes. Because workload is high, and the ability to maintain situational awareness is imperative for mission success, voice control is ideal for military cockpit applications. For these reasons, and because recognition accuracy rates were approaching acceptable levels for cockpit applications, research began to evaluate the potential use of automated speech recognition technology as a natural, alternative method for the management of aircraft subsystems. The key objective was to

FIGURE 1.17  OV-10A Aircraft. (US Air Force photo.)


confirm that performance would not deteriorate in the operational flight environment due to high noise, acceleration, or vibration. Williamson et al. (1996) conducted a study to measure word-recognition accuracy of the ITT VRS-1290 speech recognition system in an OV-10A test aircraft both on the ground and in 1G and 3G flight conditions. A secondary objective of this study was to compile a speech database that could be used to test other speech recognition systems.
Test Procedures: Sixteen participants were involved in this study. All participants were tested in the laboratory, in the hangar (sitting in the aircraft cockpit with no engines running), and in flight. During flight, participants experienced a 1G data collection session (referred to as 1G1), followed by a 3G data collection session, and then another 1G data collection session (referred to as 1G2) to test for possible fatigue effects. The study was divided into two separate sessions. The first session consisted of generating the participants' speech templates (samples of them saying each word in the vocabulary to be tested) in a laboratory setting and collecting some baseline performance data. Participants were briefed on the nature of the experiment, and template enrollment was performed. A system identical to the one in the aircraft was used as the ground support system for template generation. The participants used the same helmet and boom-mounted microphone that were used in the aircraft. Template training involved the participants speaking a number of sample utterances. Once template generation was completed, a recognition test followed that consisted of reciting the utterances to collect baseline recognition data. The first aircraft test session was performed in the hangar to provide a baseline (the aircraft in quiet conditions). This consisted of each participant speaking the 91 test utterances twice, for a total of 182 utterances. During both ground and airborne testing, participants needed little or no assistance from the pilot of the aircraft. The participants sat in the rear seat of the OV-10A and were prompted with a number of phrases to speak. All prompts appeared on a 5 × 7 in. monochromatic LCD in the instrument panel directly in front of the participants. Their only cockpit task was to reply to the prompts. Close coordination was required, however, between the pilot and participants while the 3G maneuvers were being performed, as the pilot had to execute a specific maneuver in order to keep the aircraft in a 3G state.
Results: Three comparisons of word-recognition accuracy were of primary interest:

1. Ground (lab + hangar) versus air (1G1 + 3G + 1G2)
2. 1G (1G1 + 1G2) versus 3G
3. 1G1 versus 1G2

Orthogonal comparisons were done for each of these scenarios. However, no significant differences were found (Figure 1.18).
Conclusions: Results showed that the ITT VRS-1290 Voice Recognizer/Synthesizer system performed very well, achieving over 97% accuracy across all flight


FIGURE 1.18  Mean word accuracy for each test condition. (From Williamson, D. T., Barry, T. P., and Liggett, K. K. 1996. Flight test performance optimization of ITT VRS-1290 speech recognition system. Audio Effectiveness in Aviation: Proceedings of the Aerospace Medical Panel Symposium.)

conditions. The concept of speech recognition in the fighter cockpit is very promising. Any technology that enables a pilot to stay head-up and hands-on will greatly improve flight safety and situational awareness.

Impact
This flight test represented one of the most extensive in-flight evaluations of a speech recognition system ever performed. Over 5,100 utterances comprising more than 25,000 words or phrases were spoken by the 12 participants during flight (4 of the 16 participants' flight test data were not usable). Combined with the two ground conditions, this resulted in a test of over 51,000 words and phrases. The audio database of digital audio tape (DAT) recordings was transferred onto CD-ROM and was used to facilitate laboratory testing of other speech recognition systems. It was also made available for distribution to the speech recognition community. The DAT recordings proved to be extremely valuable because many new voice recognition systems were produced after this study was conducted. With this database, new systems could be tested against speech recorded in an extremely harsh environment (the participants' crew station was directly in line with the noisy engines) without requiring additional flight tests. Finally, the example study illustrates the importance of flight-testing controls and displays in the environment in which they will be used. The OV-10, as well as the other in-flight simulators discussed in this section, are invaluable as risk-reduction vehicles for ensuring that control and display technology can be integrated into the airborne environment.
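
A minimal sketch of the accuracy metric and the three planned contrasts is shown below. The per-condition accuracies are invented placeholders; the measured means appear in Figure 1.18.

```python
# Invented per-condition word-recognition accuracies (%), one value per
# condition mean; the actual study data appear in Figure 1.18.
acc = {"lab": 98.1, "hangar": 97.9, "1G1": 97.6, "3G": 97.2, "1G2": 97.5}

def mean(keys):
    return sum(acc[k] for k in keys) / len(keys)

contrasts = {
    "ground vs air": (mean(["lab", "hangar"]), mean(["1G1", "3G", "1G2"])),
    "1G vs 3G":      (mean(["1G1", "1G2"]), acc["3G"]),
    "1G1 vs 1G2":    (acc["1G1"], acc["1G2"]),   # fatigue check
}
for name, (a, b) in contrasts.items():
    print(f"{name}: {a:.2f}% vs {b:.2f}% (diff {a - b:+.2f})")
```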


Total In-Flight Simulator (TIFS) NC-131H Transport Aircraft
The purpose of TIFS (Figure 1.19) was to perform airborne simulations of existing or new aircraft to evaluate flying qualities, flight control characteristics, human factors concerns, or other issues of interest. This simulator allowed for the variation of numerous parameters in-flight, such as aircraft flight characteristics, controller feel characteristics, and HUD formats. In addition, the cockpit could be reconfigured to represent other aircraft. The front seat served as the evaluation cockpit and the rear seat served as the safety cockpit. The HUD and head-down displays were driven by a programmable display generator that allowed for quick changes to the display formats. The method for aircraft control was also variable; control sticks, wheels, and throttles were easily interchanged in the evaluation cockpit.

Representative Research
The TIFS aircraft has evaluated a number of control and display configurations for different aircraft, for example, the "glass" version of the Air Force's C-141 transport aircraft, called the Control/Display System (CDS). The objective of this study was to test the adequacy of the LCDs that replaced the E-M primary flight displays (ADI and horizontal situation indicator). The pilots performed tasks such as unusual attitude recovery and instrument landing system approaches. The objective and subjective flight test data showed that performance, pilot workload, spatial orientation, and air crew acceptance using the proposed CDS display format were improved, or no worse than commensurate with, that obtained using the current C-141 instrument format, in almost every instance of analysis (Gawron & Bailey, 1995, p. 84).

FIGURE 1.19  Total In-Flight Simulator (TIFS). (US Air Force photo.)


The aircraft was also used by NASA (National Aeronautics and Space Administration) to test the Synthetic Vision System (SVS) that was a key part of the High-Speed Research Program: "The SVS project develops synthetic vision technologies with practical applications to eliminate low visibility conditions as a causal factor in civil aircraft accidents" (Willshire et al., 2000, p. 376). As part of the flight testing of the SVS technologies, the TIFS crew station was configured with displays showing different views of the outside world that would aid the pilot and aircraft landing during poor visibility conditions. The results showed that, whereas some features (such as ground texturing) were beneficial, others (such as minimized display formats) did not aid in the landing task.

Impact
The two studies just discussed serve to illustrate the versatility of the TIFS aircraft in evaluating different control and display configurations. The real impact of TIFS was its ability to be reconfigured to evaluate everything from transport cockpit displays, such as in the C-141, to handling qualities of the space shuttle. After supporting over 2,500 research flights, this aircraft was retired to the USAF Museum.

Variable In-Flight Stability Test Aircraft (VISTA) Lockheed NF-16D Fighter Aircraft
VISTA (Figure 1.20) was an F-16D flight test vehicle in which the front seat served as an evaluation cockpit, and the rear seat served as a safety cockpit. It was used to conduct a variety of airborne simulations to evaluate flying qualities, flight controls, and control and display issues. One of these simulations was a further evaluation of voice recognition, a follow-on to the OV-10 evaluation reported earlier in this chapter

FIGURE 1.20  Variable In-Flight Stability Test Aircraft (VISTA). (US Air Force photo.)


(Williamson et al., 1996). Briefly, this flight test verified recognition engine parameters, identified microphone audio issues, and provided additional samples, including non-native English speakers, to supplement the database available to the speech recognition research community (Barry et al., 2006). To further describe the use of VISTA, a study that evaluated HMD symbology is described next.

Representative Research
Introduction: HMDs can provide an important function in aircraft – off-boresight targeting. This capability highlights a major difference between HMDs and HUDs. HUDs are mounted to the instrument panel of the cockpit and can provide on-boresight information only. On-boresight refers to the visual area the pilot sees when looking down the longitudinal axis of the aircraft (i.e., looking straight ahead). HMDs are mounted to the pilot's head and can provide on-boresight as well as off-boresight information. Off-boresight refers to all other visual areas the pilot views (i.e., not looking straight ahead). Some of the challenges with integrating HMDs in the cockpit are determining how to present information and what information needs to be presented to the pilot, not only at various phases of the mission but also at various head positions. This study (Jenkins et al., 2002) focused on determining the best off-boresight HMD symbology for targeting, as well as attitude maintenance, during realistic air-to-air and air-to-ground target acquisition scenarios. Unusual attitude recovery performance was also examined to determine if the HMD symbology helped or hindered the pilots' ability to recover from challenging, SD-like situations.
Symbology Sets: The off-boresight symbology sets tested included a non-distributed flight reference (NDFR) format and a visually coupled acquisition and targeting system (VCATS) format. The NDFR allowed ownship status information to always be available on the HMD regardless of where the pilot was looking. It included both digital and analog information. The digital information provided airspeed, altitude, and heading. The analog information was portrayed with the arced portion of the display. Attitude was interpreted by comparing the position and length of the arc with the aircraft symbol. Four examples are shown in Figure 1.21: straight and level, 180 degrees roll, 30 degrees right roll, and 135 degrees left roll. VCATS (Figure 1.22) was designed as a high-altitude ownship attitude reference. It included a horizon line split by an aircraft symbol representing climb/dive angle. The horizon line on this symbology set rotated as do the standard ADI and HUD horizon lines. The VCATS horizon line also changed shape as climb/dive angle increased or decreased. For instance, when climb/dive angle was negative, the line became dashed and portrayed a chevron-type symbol. Digital readouts of airspeed, heading, and altitude were presented around the outer boundaries of the HMD FOV. The military standard 1787 HUD (MIL-STD HUD) symbology was present on the HUD for the air-to-air and air-to-ground tasks, and on the HMD in virtual HUD mode during the unusual attitude recovery tasks when pilots looked on-boresight. The reason the on-boresight symbology was presented on the HMD for this task was that the pilot-subjects wore leather visors to prevent them from seeing the outside world as the evaluation pilot flew the aircraft into the unusual attitude. These


FIGURE 1.21  Non-distributed flight reference (NDFR) symbology.

visors also prevented them from seeing the HUD. Thus, the virtual HUD on the HMD was utilized for this task. This symbology included a climb/dive ladder, moving horizon line, fixed aircraft reference, bank scale, clocks to represent airspeed and altitude, and a heading tape (Figure 1.23).
Procedures: Participants evaluated the various HMD formats for a total of 11.7 flight hours. Test points were flown from the front cockpit by the evaluation pilot. The safety pilot in the rear cockpit set up the HMD configurations, performed routine F-16 flight procedures, and monitored the safety of the flights. Pilots became familiar with the symbology while sitting in the cockpit of the VISTA test aircraft, which essentially functioned as a ground simulator. For the air-to-air and air-to-ground tasks, pilots flew with the standard HUD symbology on the HUD when they were looking on-boresight, and either nothing, NDFR, or VCATS symbology on the HMD when they looked off-boresight. For the unusual attitude recovery task, pilots were looking off-boresight when the task began, and they flew with the standard HUD symbology on the HMD when looking on-boresight, and either nothing, the NDFR, or VCATS symbology on the HMD when looking off-boresight.
Results: For the unusual attitude recovery task, pilots performed 37% faster in initiating a correct input with the NDFR format than with the MIL-STD HUD format.


FIGURE 1.22  Visually coupled acquisition and targeting system (VCATS) symbology.

Pilots also performed 18% faster with the NDFR than with VCATS on the same measure. For the air-to-ground task, both the NDFR and the MIL-STD HUD provided an adequate or desired amount of off-boresight search time, with the NDFR allowing the highest percentage (longer search times mean that the off-boresight symbology provided adequate information for the pilots to maintain safe flight without having to return attention to the on-boresight flight display). For the air-to-air task, the NDFR format allowed pilots to achieve the highest percentage of off-boresight search time while still maintaining aircraft parameters. Although the VCATS symbology also provided an adequate amount of off-boresight search time, one of the primary task performance metrics was degraded when pilots used VCATS for off-boresight searching.
Conclusions: This study shows the advantages of using off-boresight attitude symbology not only for air-to-air and air-to-ground target acquisition tasks but also for recovering from unusual attitudes.

Impact
All new technologies that are incorporated into the cockpit need to be thoroughly tested to determine the applicability of the technology, not only for its intended


FIGURE 1.23  Standard head-up display (HUD) symbology.

purpose but for all aspects of flight. Using an in-flight simulator like VISTA provided a realistic environment to conduct such technology testing. This type of simulation can lower the risk of integrating new technologies by supporting proof-of-concept testing and solving integration issues. Research such as that reported here helped accelerate the application of HMDs, both in retrofits of older aircraft and in their incorporation into newer fighter aircraft to enhance mission performance.

University of Iowa Operator Performance Laboratory Aero L-29 Delfin Jet
To support both civilian and military aircraft cockpit research, the University of Iowa Operator Performance Laboratory (OPL) has two L-29 single-engine, tandem-seat jet flight trainers (Figure 1.24), as well as Mi-2 helicopters. The Aero L-29 Delfín is a military jet trainer developed and manufactured by Czechoslovakian aviation manufacturer Aero Vodochody. The jets are equipped with oxygen systems, G-suits, and pressurization, making them capable of performing highly dynamic maneuvers. The front seat is used by the safety pilot, with the evaluation pilot in the rear seat. Typically, the rear seat is covered with an opaque cloth hood, occluding the outside view to create highly realistic nighttime conditions. With this configuration, flight in the L-29's backseat simulates single-seat 5th-generation flight with an HMD. The aircraft can also flexibly serve as a Live, Virtual, Constructive (LVC) simulator in the hangar, which is particularly useful for training in preparation for flight tests. The aircraft


FIGURE 1.24  University of Iowa Operator Performance Laboratory Aero L-29 Delfin Jet. (Photo courtesy of University of Iowa Operator Performance Laboratory.)

and supporting systems support both air-to-air and air-to-ground simulations. The aircraft are also equipped for human performance state assessment, including the collection of physiological data. The rear seat is equipped with a high-brightness, 21 in. touchscreen display, as well as an F-35 cueing helmet that includes a head tracker and binocular eye tracker. The technology is state-of-the-art, including a synthetic vision system. Controls include a stick and throttle with numerous system switches to support HOTAS.

Representative Research
Objective: Although HMD technology has been shown to be advantageous in that it presents critical visual information (such as primary flight reference symbology, targeting information, and imagery) regardless of whether the pilot is looking directly forward (on-axis) or elsewhere (off-axis or off-boresight [OBS]) (see VISTA Representative Research above), it has also been shown to cause pilots to experience spatial disorientation when switching between on- and off-axis views. This research aimed to aid pilots in maintaining attitude awareness, especially when looking away from the HUD to acquire information off-axis, for example, by accessing ownship attitude information within the HMD field-of-view. Three different OBS HMD visual symbology sets were compared using operationally representative scenarios to determine which is best at preventing pilot loss of spatial orientation when off-axis tasks are performed (Schnell et al., 2017).
Symbology Sets: All three symbology sets included a heading tape but differed in how OBS HMD symbology was presented. The HMD's Current Display Format (CDF; Figure 1.25) was one of the three sets. It did not provide attitude (climb/dive/roll) information; only speed, head heading, and altitude were shown. To obtain aircraft attitude information, pilots were required to either interpret the rate of change in speed, heading, and altitude readouts or look in the forward direction at the HMD combiner to employ a virtual HUD, because there was not a separate HUD display


FIGURE 1.25  HMD's Current Display Format (CDF). (US Air Force graphic and photo.)

FIGURE 1.26  HMD Distributed Flight Reference (DFR) Format.

surface. Thus, the "vHUD" symbology was rendered in the HMD whenever the pilot was looking forward. When the head tracker sensed the pilot was looking off-boresight, the vHUD symbology disappeared. The second symbology set, HMD Distributed Flight Reference (DFR; Figure 1.26), added aircraft attitude information in the upper right corner of the HMD FOV. It included a forward-referenced, HMD-stabilized aircraft symbol with a movable earth reference circle that rotated around the symbol center to reflect bank angle. It was either "opened" or "closed" with regard to flightpath angle (climb/dive vs. pitch). The earth reference circle had two end-tick marks that referenced the nearest horizon on each side. Thus, for a flight path that pointed straight up, or nearly so, the


earth circle perimeter was fully "open" to denote the absence of ground and was thus not drawn, but the end-tick marks remained. This indicated that the aircraft was in a climb attitude. For a flight path that pointed straight down, or nearly so, the earth reference was closer to a full circle, indicating that there was no sky left around the forward direction (only ground) and that the aircraft was in a dive attitude. To summarize, the end-tick marks indicated the direction of the nearest horizon. In a level flightpath attitude, the earth reference was a semi-circle (the upper or sky half "open" and the ground half drawn). A full description of the symbology mechanization is available in Geiselman (1999). The third symbology set for this study was the same symbology set that provided pilots with the best performance in the VISTA flight test reported previously – the non-distributed flight reference (NDFR; see Figure 1.27). Information included two-digit aircraft heading in the center circle and airspeed and altitude on the left and right wing, respectively, of the aircraft symbol.
Procedures: Ten evaluation pilots viewed the image on the HMD's combiner as a binocular, fully overlapped, monochrome green picture of 1,280 × 1,024 pixels over a 40 degrees horizontal × 30 degrees vertical FOV. The OBS symbology sets were shown when the HMD rotated more than 15 degrees laterally or tilted more than 25 degrees vertically from the aircraft forward centerline. To determine the benefits of candidate symbology sets in preventing loss of spatial orientation, three sorties were developed that were representative of SD-prone conditions identified in prior 5th-generation fighter aircraft incidents. During the three sorties, each evaluation pilot followed instructions of an experimenter playing the role of the Joint Terminal Attack Controller (JTAC). The HMD presented a world-stabilized diamond symbol as well as an azimuth steering line superimposed over the general target area. By employing a "talk-on" between the JTAC and pilot, target information was provided, followed by a standardized nine-item (nine-line) brief that the pilot noted. During the talk-on, references were made to features available on an

FIGURE 1.27  HMD Non-Distributed Flight Reference (NDFR) Format.


onboard map, starting with more prominent visible features (rivers and highways) and moving to more detailed items. The pilot was also given altitude block assignments that became increasingly restrictive across scenarios. This "talk-on" procedure required the pilot to look OBS for long and frequent periods of time and was designed to require the pilot to divide attention between airmanship and weaponeering; that is, visually identifying target features in the virtual world simulated with a Distributed Aperture System (DAS) using a head-tracked graphics processor. The pilots' tasks were to maneuver to visually acquire each target, identify it with the HMD DAS, and employ an Mk-82 bomb using a Continuous Computed Impact Point (CCIP) delivery method. The pilots also provided immediate time-on-target for either a show-of-force or a bomb-on-target delivery and provided subjective workload and situation awareness ratings.
Results: Detailed results are available in Schnell et al. (2017). One key result pertains to mean talk-on duration (talk-on start to talk-on complete), since the less time spent during the talk-on process, the sooner the weapon can be deployed. Also, recall that the talk-on process requires a significant amount of OBS time. Therefore, this metric tests the efficacy of the OBS symbology for completing the task of delivering a weapon to a visually acquired target while not contributing to pilot SD. Results showed that the DFR and the NDFR symbology enabled study participants to accomplish the talk-on process with fewer long-duration OBS head movements when compared to the CDF; this was also evident in the heat-maps of head tracker data.
Conclusion: The DFR and the NDFR symbology sets both allowed pilots to successfully perform their targeting task and maintain attitude awareness. Based on this study and the previous study describing additional testing of the NDFR, it has been shown to be an effective set of symbology for pilots using HMDs to perform a variety of tasks and should be considered as a candidate for transition to operational use.

Impact
The use of an in-flight simulator was very effective in evaluating the different symbology sets in a more realistic, demanding flight environment, ensuring that they were not also causing new challenges for pilots (e.g., SD). It also provided guidance for subsequent research. For example, follow-on research is planned to examine whether a Spatial Audio Horizon Cueing system would augment or improve upon different candidate HMD symbology sets (Geiselman et al., 2017; Schnell, 2019). In-flight simulators can also inform the design of motion-based simulators, as mentioned in the DRD simulator description in the "Motion-Based Simulators" section. In turn, data collected in the L-29 can be used to validate the DRD's ability to accurately and economically reproduce the forces of flight in a highly controlled laboratory setting.
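
The head-tracked symbology switching used in this study (virtual HUD on-boresight; OBS symbology beyond 15 degrees of azimuth or 25 degrees of elevation) reduces to a simple gate. The following sketch uses the thresholds reported in the study, but all function and variable names are hypothetical:

```python
def active_symbology(head_az_deg: float, head_el_deg: float,
                     az_limit: float = 15.0, el_limit: float = 25.0) -> str:
    """Select HMD symbology from tracked head angles (aircraft axes).

    Inside the gate the pilot sees the virtual HUD; outside it, the
    off-boresight (OBS) set -- CDF, DFR, or NDFR in the study above.
    """
    on_boresight = abs(head_az_deg) <= az_limit and abs(head_el_deg) <= el_limit
    return "vHUD" if on_boresight else "OBS"

assert active_symbology(5, 10) == "vHUD"
assert active_symbology(40, 0) == "OBS"    # looking well off-axis laterally
assert active_symbology(0, -30) == "OBS"   # looking down past the tilt limit
```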

MULTISENSORY DISPLAYS AND CONTROLS
As the preceding sections illustrate, both display and control technologies available for use in simulation research have significantly advanced over the past 50


years. This is illustrated, for example, by the shift from single-function to multifunction displays and controls and the shift from HUDs to HMDs. Research using simulators has also provided valuable data on whether advanced/novel technologies are applicable to aviation. For example, the use of stereoscopic three-dimensional displays did not result in improved flight performance (McIntire et al., 2014). Similarly, simulations have shown that many novel multisensory display and control approaches, popular in the mid-1990s, need further technological maturity before being considered candidates for aviation application. Some are briefly described here.

Multisensory displays can convey important information without redirecting the operators' gaze point and provide relief when operators are visually saturated. One alternative display is the use of localized audio (commonly referred to as 3-D audio), which consists of tones or cues presented via headphones at fixed positions in the external environment, regardless of the listener's head position. The tone placement can vary in azimuth, elevation, and range. Research in the DES (see "Motion-Based Simulators" section) showed that pilots' ability to localize virtual auditory tones is relatively unchanged up to approximately 5.5 +Gz, but begins to deteriorate at 7.0 +Gz (Nelson et al., 1998). Candidate aviation applications include enhancing pilots' spatial orientation (Endsley & Rosiles, 1995; Geiselman et al., 2017), redirecting gaze (Perrott et al., 1996), reducing target search and detection time (Simpson et al., 2002), and flying in degraded visual environments such as fog (Milam et al., 2019).

Tactile displays are another novel display example (van Erp, 2002). Wrist-worn tactors can alert pilots to automation interventions (Sarter, 2000), cue system faults (Calhoun et al., 2002), and signal weather severity (Rodriguez-Paras et al., 2021). A thigh-mounted vibrotactile display can provide critical directional orientation in the vertical plane (Salzer et al., 2011). Torso vests containing arrays of tactors that vibrate to convey various types of information have been tested (e.g., to show attitude information; Rupert et al., 2016). Also, a combination of multisensory approaches can be more effective than singular modalities, such as using both 3-D audio and tactile cues to help operators counter spatial disorientation (Brill et al., 2015).

The interest in multisensory controls for aviation stems from the concern that head-down glances can cause disorientation and vertigo, as well as distract pilots from attending to primary flight tasks. Several hands-free, head-up control technologies have been evaluated, two of which have merited aviation applications. Control inputs based on head-aiming are one example (Highland et al., 2021), primarily used in conjunction with head-mounted displays (see "SIRUS" and "University of Iowa Operator Performance Laboratory" sections). Also, speech-based input (see sections on "MAGIC", NASA OV-10, and "VISTA") was more recently enabled in the F-35, but its use has been problematic due to faulty input recognition in extreme flight conditions (Hush-kit aviation magazine, 2021).
Examples of other multisensory controls include inputs based on (1) eye line of sight (complicated by varying lighting conditions and extreme look angles; Calhoun & Janson, 1991), (2) lip movement gestures measured with ultrasonic sensors (Jennings & Ruck, 1995), (3) eyebrow or clenched-jaw inputs detected with electromyographic signals (Junker et al., 1995), and (4)


brain electrical activity (e.g., electroencephalographic signals detecting luminance modulation of different control station options; Middendorf et al., 1999; Nasman et al., 1997). Provided that the technological limitations of such novel approaches can be overcome, their application is likely to involve multimodal control. For instance, speech recognition may improve if ultrasonic lip motion is also measured (especially in noisy environments; Jennings & Ruck, 1995) or if eye line of sight is tracked to focus the vocabulary search on the commands most likely to be associated with the gaze point.
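
That last idea, using gaze to focus the recognizer's vocabulary search, can be sketched as a simple filter. The cockpit regions and commands below are hypothetical; no fielded system is implied.

```python
# Hypothetical mapping from cockpit regions (hit by the gaze vector) to the
# speech commands most likely to be meant while looking there.
GAZE_VOCAB = {
    "radio_panel": {"tune", "transmit", "squelch"},
    "nav_display": {"zoom", "center", "waypoint"},
    "fuel_panel":  {"crossfeed", "balance"},
}

def constrained_vocab(gaze_region: str, full_vocab: set) -> set:
    """Bias recognition toward gaze-relevant commands, falling back to the
    full vocabulary when the region is unknown or unmapped."""
    return GAZE_VOCAB.get(gaze_region, full_vocab)

full = set().union(*GAZE_VOCAB.values())
print(constrained_vocab("radio_panel", full))  # search 3 commands, not all 8
```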

DISPLAYS AND CONTROLS TO SUPPORT HUMAN–MACHINE TEAMING
As noted earlier, there have been a number of changes in controls and displays over the past 50 years in aviation research simulators, primarily reflecting advancements in specific interface technologies. More recently, interface designs are addressing advances in intelligent-agent support technologies (e.g., enabled by artificial intelligence, machine learning, etc.). This chapter, for example, describes control/display interface designs that team pilots with intelligent agents to aid decision-making and task completion for Sense and Avoid de-conflictions and cyberattacks (see section on "VSCS"), as well as resource allocation, course of action analysis, and operation assessment (see section on "IMPACT"). Human–machine teaming, in fact, is now considered essential for dynamic, complex, and uncertain military operations because the operator's decision-making is limited by available information, cognitive processing power, and available time (Chen et al., 2018; Voshell et al., 2016; USAF, 2019). By augmenting the human's pattern recognition ability and intuition/flexibility with intelligent systems' computation/reasoning, the unique capabilities of both human and machine team members can be harnessed for more informed, robust, and timely operations.

Taylor (1993) referred to this teaming as "cooperative functioning" in that the human and machine members should interact at multiple levels and on all tasks. Calhoun et al. (2016) illustrated this goal in two scenarios that contrast collaborative control and display approaches with supervisory ones (in which humans have authority over the machine). Details are also included on design challenges for supporting collaboration and how to provide the intent/decision support and information fusion required for human–machine interactions. Requirements have also been identified by examining the capabilities and limitations of intelligent agents from the standpoint of human teams (Joe et al., 2014) and by pinpointing challenges to establishing common ground and coordination across team players (Klein et al., 2004). For instance, human–machine teams need bi-directional communication that enables the operator to convey mission objectives and constraints to the machine agents, and the intelligent agent(s) to explain the basis and details of its decision guidance and actions (i.e., to provide transparency on its reasoning at an appropriate level of detail; Chen et al., 2017). Ideally, a give-and-take human–machine dialogue can be mechanized that is characteristic of human–human teams (Bradshaw et al., 2004).
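
One concrete form such transparency might take is a structured explanation attached to each machine recommendation. The following sketch is illustrative only; the field names and example content are invented, not drawn from any cited system.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecommendation:
    """A machine teammate's proposal plus the reasoning behind it, so the
    operator can judge (and override) rather than trust blindly."""
    action: str
    rationale: list = field(default_factory=list)   # ordered decision factors
    confidence: float = 0.0                         # 0-1, agent's own estimate
    alternatives: list = field(default_factory=list)

rec = AgentRecommendation(
    action="Assign Air-2 to sector search",
    rationale=["shortest ETA (4.2 min)", "fuel reserve above threshold",
               "route avoids active no-fly zone"],
    confidence=0.85,
    alternatives=["Air-1 (ETA 6.8 min)", "Air-3 (low fuel)"])
for reason in rec.rationale:
    print("-", reason)
```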

Controls and Displays for Aviation Research Simulation

55

Driven by advancements in artificial intelligence, more and more systems will include some form of intelligent aiding. This will result in changes to the role and tasking of the human operators. Research simulators will be a critical component for evaluating new human–machine teaming solutions with candidate control and display technologies to determine their effectiveness in providing mutual support and assistance. A potential source to inform the design of controls and displays in aviation research simulators is a review of test environments and experimental designs addressing human–machine teaming (O'Neill et al., 2020). Ideally, the simulators should support a multi-task environment in which the level of automation for each task can be dynamic and context dependent. In this manner, adaptable automation (Calhoun, 2021) can be applied. This will enable the evaluation of candidate control and display interfaces on their effectiveness in establishing and updating working agreements that define each human and machine member's responsibilities for completing task-related functions, as well as in coordinating courses of action, communicating pertinent information, tracking task completion/system status, and supporting shared situation awareness. A sketch of such a working agreement follows.
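
This is a minimal sketch of a per-task working agreement the operator edits at runtime. The tasks and the deliberately coarse three-level automation scale are invented for illustration; actual systems use finer-grained schemes.

```python
from enum import Enum

class LOA(Enum):                 # level of automation, coarse-grained
    MANUAL = 1                   # human performs the task
    CONSENT = 2                  # agent proposes, human approves
    AUTO = 3                     # agent acts, human monitors

# Invented working agreement: per-task allocation the operator may revise
# as mission context changes (adaptable, not adaptive, automation).
agreement = {"route planning": LOA.CONSENT,
             "sensor steering": LOA.AUTO,
             "weapons release": LOA.MANUAL}

def renegotiate(task: str, level: LOA):
    """Operator-initiated change keeps authority with the human."""
    agreement[task] = level
    print(f"{task} now {level.name}")

renegotiate("route planning", LOA.AUTO)  # e.g., during a high-workload surge
```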

SUMMARY
Prior to the 1970s, controls and displays in aircraft research simulators tended to replicate the operational approach; each control and display device performed a dedicated function. However, with the advent of E-O devices and computer-based systems, a crew station revolution began, and interfaces were implemented that integrated multiple functions onto fewer control and display devices. Multifunction displays presented information that used to be on one or more analog displays. Adding switches to the joystick and throttle enabled pilots to control multiple functions while keeping their hands on throttle and stick. Moreover, the number of functions that could be controlled by a single multifunction keyboard was limited only by the workload involved in navigating multiple pages to access functions several levels deep in the menu structure. Still, these controls and displays required some "head-down" time in the cockpit.

In the 1980s and 1990s, research simulation shifted to exploiting this digital technology and exploring how it could be used in innovative ways. Rather than merely duplicate conventional formats on multifunction displays, the use of color, pictorial representation, and 3-D displays was explored for facilitating information acquisition. In addition, head-mounted displays driven by head position had the advantage of allowing pilots to control the aircraft while keeping their "eyes out of the cockpit." Simpler, less cumbersome control technologies were also evaluated, including the use of speech commands and head and/or eye gaze.

More recently, research started to evaluate controls and displays that tap into the pilots' biosignals to an even greater extent. Displays considered included tactile displays and spatial audio displays. Novel controls included interfaces based on gesture input and biopotential signals from muscles or the brain. Each of these technologies promoted head-up, hands-free control and display. However, aside from


voice-based systems, these technologies are still immature and have either limited bandwidth or integration hurdles to be solved with further development and simulation evaluation.

The 21st century marked a shift to a human-centered control/display design approach. The pilot's requirements were paramount, and as a result, attention shifted to what control/display options best fulfilled them. Multimodal approaches were also stressed, in order not to overload any single modality of the pilot. Along with this increased focus on the pilot, how artificial intelligence advancements can best assist the pilot and respective operations is now a key focus, examining, for example, dynamic function allocation across human and machine team members and what authorization procedures should be in effect.

Although this chapter's treatment of the evolution of controls and displays focused on research simulators for Air Force cockpits and UAV operator consoles, similar trends are evident in other domains, such as medicine, power plants, and transportation, in terms of improving how information is controlled and displayed in any complex workstation. Additionally, this chapter illustrates the importance of research simulation. Iterative use of a representative task environment should be employed to optimize the implementation parameters of candidate controls and displays. However, it is important to also evaluate candidate interfaces in high-fidelity simulators that more closely resemble operational environments. Ultimately, testing in the actual application environment is ideal because it enables true validation of interface designs. Regardless of the testing environment, simulators will play a key role in the advancement of workstation control and display technologies for efficient operator–autonomy teams.

ACKNOWLEDGMENT
The authors would like to acknowledge Dr. John M. Reising, who spearheaded much of the fixed-based simulator research reported herein. It is due to his expertise in this area that this chapter was initiated, and we are grateful for both his significant contributions to it and his valuable insight.


ACRONYMS/ABBREVIATIONS

2-D: two-dimensional
3-D: three-dimensional
ADI: attitude director indicator
AI: attitude indicator
ATAK: Android-Based Tactical Awareness Kits
ATM: automatic teller machine
AVO: Air Vehicle Operator
BAI: Background Attitude Indicator
CCD: charge-coupled device
CCIP: Continuous Computed Impact Point
CDF: Current Display Format
CDS: Control/Display System
COMM: communication
CRT: Cathode Ray Tube
CSM: Cyber Security Module
DAS: Distributed Aperture System
DAT: digital audio tape
DES: Dynamic Environmental Simulator
DFR: Distributed Flight Reference
DIGISYN: Digital Synthesis Simulator
DoD: Department of Defense
DRD: Disorientation Research Device
E-M: electro-mechanical
E-O: electro-optical
FOV: field-of-view
G: gravitational force
HMD: helmet-mounted display
HMOF: Helmet-Mounted Oculometer Facility Simulator
HMS: helmet-mounted sight
HOTAS: hands on throttle and stick
HUD: head-up display
IFF: identify friend or foe
IIPACSS: Integrated Information Presentation and Control System Study
IMPACT: Intelligent Multi-Unmanned Vehicle Planner with Adaptive Collaborative/Control Technologies Simulator
JADC2: joint all-domain command and control
JOCA: Jointly Optimal Conflict Avoidance
JTAC: Joint Terminal Attack Controller
LCD: liquid crystal display
LOS: line of sight
LVC: Live, Virtual, Constructive
MAGIC: Microprocessor Applications for Graphics and Interactive Communication
MIL-STD: military standard
NAMRU-D: Naval Medical Research Unit – Dayton
NAS: National Airspace System
NASA: National Aeronautics and Space Administration
NDFR: non-distributed flight reference
NMAC: Near Mid-Air Collision
OBS: off-boresight
OPL: Operator Performance Laboratory
OTW: out-the-window
PC: personal computer
PCCADS: Panoramic Cockpit Control and Display System Simulator
SAA: Sense and Avoid
SD: spatial disorientation
SIRUS: Synthetic Interface Research for UAV Systems Simulator
SO: sensor operator
SUS: System Usability Scale
SVS: Synthetic Vision System
TIFS: Total In-flight Simulator
TSD: Tactical Situation Display
TTCP: The Technical Cooperative Program
UAV: unmanned aerial vehicle
UHF: ultra-high frequency
UV: unmanned vehicle
VCATS: Visually Coupled Acquisition and Targeting System
VHF: very high frequency
vHUD: virtual head-up display
VISTA: Variable In-Flight Stability Test Aircraft
VSCS: Vigilant Spirit Control Station

2 Augmented Reality as a Means of Job Task Training in Aviation

Dan Macchiarella, Jiahao Yu, Dahai Liu, and Dennis A. Vincenzi

CONTENTS

Augmented Reality (AR)
Historical Overview: AR and Training
Cognition and AR
    Elaboration and Recall
    Spatial Relations
    Memory Channels and AR
Knowledge Development and Training Transfer
What Is the Future of Job Training – Training on the Job Literally?
Conclusion
References

Historically, the aviation industry has expended a significant amount of time and resources training and retraining its workforce to perform the psychomotor and cognitive maintenance tasks necessary to keep aircraft safely flying (Ott, 1995). The industry continues to dedicate a substantial amount of its effort and capital to ensuring that its workforce is prepared to maintain modern and complex aircraft systems. Despite the rapid advances in computer-based training technologies (e.g., augmented reality [AR]), aviation maintenance workers presently participate in job task training in traditional face-to-face settings that would be familiar to aviation maintenance workers from generations past. Changing the manner in which aviation maintenance workers are trained, with the goal of capturing the positive effects associated with computer-based training technologies, has the potential to optimize training.

Airframe and Powerplant (A&P) certified mechanics serve as the primary workers in the nation’s aviation industry. The United States General Accounting Office (2003) completed a study that highlights the need for curriculum reform by the Federal Aviation Administration (FAA) for the training and certification of A&P mechanics. A relatively large number of workers in the aviation maintenance field possess an A&P license. The number of A&P mechanics in the US labor market was not forecasted to meet the industry’s needs (US General Accounting Office, 2003).


According to the Boeing Pilot and Technician Outlook 2020–2039, new personnel demand was calculated based on a 20-year fleet forecast, aircraft utilization, attrition rates, and regional differences. There will be a gap of 192,000 mechanics in North America, while 739,000 new technicians will be needed to maintain the global fleet over the next 20 years (Boeing, 2020). Even though COVID-19 caused a temporary short-term oversupply in the aviation industry, the long-term demand is still robust, as mechanics are retiring faster than they are being replaced (Aviation Technician Education Council, 2020). The average age of an FAA mechanic is 52, and 33% of mechanics are over 60, while new mechanics make up only 2% of the mechanic population annually (Aviation Technician Education Council, 2020).

A panel convened by the US General Accounting Office (2003) cited the current curriculum as being “obsolete geared to smaller less complex aircraft” (p. 1). Within the next several years, institutions training future aviation maintenance workers will receive a new curriculum for training A&P mechanics. This new curriculum will address the modern complexities of systems and materials being used in aircraft. This change of curriculum, combined with the significant cost in resources and time necessary to train and retrain aviation maintenance workers, creates an opportunity to change the fundamental nature of the instructional delivery systems (IDS) being used in the aviation maintenance training field. AR has the potential to help the aviation industry meet its training needs due to its visual-spatial dynamic, which is analogous to a spatial graphical user interface (GUI) (Kaplan et al., 2021; Majoros & Boyle, 1997; Majoros & Neumann, 2001; Neumann & Majoros, 1998).

Several key factors are associated with training aviation workers: aviation maintenance work tasks require a high level of knowledge in the field, from entry level (i.e., novice) to the highly skilled level (i.e., expert); the FAA rigidly regulates training curriculum and certification of workers; workers perform work tasks at irregular intervals (e.g., replacing an oil pump on a turbine engine may occur only once in 5 years); and when workers fail to perform work tasks properly, the consequences can be dire. Flight-related accidents, and aviation accidents in general, often result in the loss of human life and large-scale destruction of property. Highlighting the consequences of improper maintenance, the National Transportation Safety Board (NTSB) determined that the crash of Alaska Airlines Flight 261 in January 2000 was due to maintenance irregularities (US General Accounting Office, 2003).

The FAA licenses and regulates aviation maintenance workers as part of its effort to ensure safe aviation operations and protect the public in general. The FAA originally developed its core curriculum for repairing and maintaining aircraft over 50 years ago (US General Accounting Office, 2003). Aviation maintenance workers inspect and repair engines, landing gear, instruments, pressurized sections, and other parts of the aircraft. Additionally, they conduct routine maintenance and replacement of parts; repair surfaces of both sheet metal and composite materials; and inspect for corrosion, distortion, and cracks in the fuselage, wings, and tail. While performing maintenance, A&P mechanics test parts and equipment to ensure that they are working properly and then certify that the aircraft is ready to return to service.
Aviation maintenance workers often work under time pressure to maintain flight schedules. The majority of them obtain an A&P license through
certification by the FAA. Those who do not possess an A&P license can perform maintenance tasks only under the direct supervision of an A&P-licensed mechanic. There are 181 active schools that hold FAA certificates issued under Title 14 of the Code of Federal Regulations, Part 147 (Aviation Technician Education Council, 2020). Candidates for the A&P license must successfully complete a minimum of 1,900 hours of classroom instruction at one of these FAA-approved aviation maintenance technician schools, acquire documented evidence that they have at least 30 months of on-the-job training (e.g., service as an aviation mechanic in the military), or show evidence detailing work experience with aircraft engines and bodies. After meeting the requisites for licensing, A&P candidates must pass written and oral tests and demonstrate through a practical test that they can perform maintenance tasks (US General Accounting Office, 2003). Any instructional delivery system or new learning paradigm that has a significant positive effect on aviation maintenance workers during their initial training, or retraining after receiving an A&P license, could reduce training time and costs, helping to meet the industry’s need for trained workers.
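The licensing pathways just described amount to a simple set of alternative eligibility rules followed by a conjunction of tests. The short Python sketch below is a purely illustrative encoding of those rules as stated above; the function names and call structure are hypothetical, and the sketch is for exposition only, not FAA guidance.

import sys

# Illustrative encoding of the A&P eligibility pathways described above
# (hypothetical helper names; thresholds taken from the text).
def eligible_for_ap_tests(classroom_hours=0, ojt_months=0, documented_experience=False):
    # Any one pathway qualifies: 1,900 classroom hours at an FAA-approved
    # school, 30 months of on-the-job training, or documented work
    # experience with aircraft engines and bodies.
    return classroom_hours >= 1900 or ojt_months >= 30 or documented_experience

def licensed(eligible, passed_written, passed_oral, passed_practical):
    # After eligibility, the written, oral, and practical tests must all be passed.
    return eligible and passed_written and passed_oral and passed_practical

if __name__ == "__main__":
    # A military-trained candidate with 36 months of on-the-job experience:
    ok = licensed(eligible_for_ap_tests(ojt_months=36), True, True, True)
    sys.stdout.write(f"licensed: {ok}\n")  # licensed: True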

AUGMENTED REALITY (AR)

AR presents a visual-spatial dynamic that may elicit efficiencies during aviation maintenance training (Macchiarella, 2004; Neumann & Majoros, 1998; Valimont, 2002). AR applications that deliver composite virtual and real scenes during aviation maintenance training are analogous to the spatial GUI that now dominates human–computer interaction and may aid attention, memory, and recall (Neumann & Majoros, 1998). However, AR is an emerging technology, and essentially very little research regarding its effectiveness as a training paradigm has been conducted. At the same time, it should be noted that computer-based systems remain limited in fidelity: they are not realistic enough to completely replace conventional face-to-face training, especially in maintenance, which involves complex hands-on experience (Gonzalez-Franco et al., 2017). However, new research is constantly expanding the body of knowledge of the technologies necessary to bring AR into the real world for application (Azuma, 2004). Owing to the special nature of AR, computer-generated virtual imagery can be overlaid onto a live direct or indirect view of a real-world setting (Lee, 2012), and AR “positions the learner within a real-world physical and social context while guiding, scaffolding and facilitating participatory and metacognitive learning process” (Dunleavy & Dede, 2014).

HISTORICAL OVERVIEW: AR AND TRAINING

Essential to understanding the concept of AR is the need to distinguish between real objects, virtual objects, and objects that display characteristics of both reality and virtuality. Milgram and Kishino (1994) effectively defined AR and placed it into a mixed-reality continuum (see Figure 2.1). Milgram’s virtuality continuum is useful for categorizing surroundings as perceived by the human mind. On one end of the continuum is the real environment, comprised of real objects that have an actual existence. The virtual environment at the other end of the continuum comprises objects that exist in essence or effect but not in a formal or actual state of being. Between these two ends lies the world of mixed reality (Azuma, 1997; Azuma et al., 2001; Billinghurst et al., 2001; Milgram & Kishino, 1994).

FIGURE 2.1 Milgram’s reality–virtuality continuum. (Adapted from Milgram, P., and Kishino, F. 1994. A taxonomy of mixed reality visual displays. IEICE Transactions on Information Systems, E77-D(12), 1321–1329.)

The distinction between varying degrees of reality and virtuality is not significant in terms of human interaction with the mixed-reality world. From a technical perspective, however, those varying degrees matter a great deal when creating a mixed-reality world: it is more difficult to bring virtual elements into real environmental settings (viz., outside a laboratory setting) than it is to bring a real-environment object into a computer-generated virtual environment scene (e.g., using one’s own hand, fitted with a haptic input device, to grasp a virtual object). Effectively, AR is any scene or case in which the real environment is supplemented with computer-generated graphics. From a broader perspective, extended reality (XR) is used as an umbrella term for virtual reality, augmented reality, and mixed reality. Training in XR has not been shown to produce outcomes different from training in a non-simulated control environment, which suggests the effects may be equivalent (Kaplan et al., 2021).

Azuma’s (1997) monograph defines AR as a variation of virtual environments (VE) and provides detailed information on all key aspects of AR-based systems. VEs are more commonly referred to as virtual reality (VR). Users of VE technologies are fully immersed in a synthetic environment. An AR system, by contrast, supplements the real world with virtual (i.e., computer-generated) objects that appear to coexist in the same space as the real world. Azuma et al. (2001) define AR systems as having the following properties: they combine real and virtual objects in a real environment; run interactively and in real time; and register (i.e., align) real and virtual objects with each other. AR is a machine-vision and computer-graphics technology that merges real and virtual objects into unified, spatially integrated scenes. Azuma (Azuma, 1997; Azuma et al., 2001) deconstructs all AR systems into three subsystems: scene generator, display device, and tracking–sensing device. He clearly defines AR as its own field of study due to AR’s unique blending of computer-generated worlds and the real world to form a new world for humans to function within.
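The division of labor among Azuma’s three subsystems can be sketched in a few lines of Python. The snippet below is purely illustrative; the function names and data are hypothetical stand-ins rather than any actual AR library, but it shows how tracking, scene generation, and display cooperate each frame to keep virtual objects registered to the real world.

# Illustrative sketch of Azuma's three AR subsystems; all names are hypothetical.
def track_viewer_pose():
    # Tracking-sensing device: report the viewer's current position/orientation.
    # A real system would read head-tracker or camera-based pose estimates here.
    return {"position": (0.0, 1.6, 0.0), "orientation": (0.0, 0.0, 0.0, 1.0)}

def generate_scene(pose, virtual_objects):
    # Scene generator: render each virtual object from the tracked viewpoint so
    # it stays registered (aligned) with its real-world anchor point.
    return [(obj["label"], obj["anchor"], pose["position"]) for obj in virtual_objects]

def present(rendered_scene):
    # Display device: an optical see-through HMD would project this onto combiner
    # lenses; a video see-through system would composite it into camera video.
    for label, anchor, viewpoint in rendered_scene:
        print(f"draw '{label}' at {anchor} for viewer at {viewpoint}")

virtual_objects = [{"label": "torque annotation", "anchor": (1.2, 0.4, 0.9)}]
for frame in range(3):  # real systems run this loop continuously, in real time
    present(generate_scene(track_viewer_pose(), virtual_objects))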

AR systems fall into one of two categories (Feiner et al., 1997; Kalawsky et al., 2000): optical-based technologies (see Figure 2.2) and video-based technologies (see Figure 2.3).

FIGURE 2.2 Simple schematic of an optical see-through HMD AR system. (Adapted from Azuma, 1997.)

FIGURE 2.3 Simple schematic of a monitor-based AR system. (Adapted from Azuma, 1997.)

Optical-based systems typically employ a head-mounted display (HMD) comprised of see-through lenses that enable the user to see the real world, with the virtual world projected on combiner lenses positioned in front of the eye. The combiner lenses are partially transmissive, enabling the user to look directly through them to see the real world. The user sees the virtual world superimposed over the physical view of the real world. Video-based systems use video cameras that provide the user a view of the real world. Video from these cameras is combined with the graphic images created by a scene-generating computer to blend the real and virtual worlds. The result is sent to the monitors in front of the user’s eyes in a closed-view HMD or to a traditional computer monitor.
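As a concrete illustration of the video-based approach, the short Python/OpenCV sketch below composites a computer-generated annotation into live camera video. It is a minimal sketch under stated assumptions: the annotation position is a fixed stub standing in for a tracking–sensing subsystem, and the camera index and label text are arbitrary.

import cv2

cap = cv2.VideoCapture(0)  # assumed default camera; its feed is the "real world"
while True:
    ok, frame = cap.read()
    if not ok:
        break
    overlay = frame.copy()
    # Stub registration: a real tracker would supply this location every frame.
    x, y, w, h = 200, 150, 180, 120
    cv2.rectangle(overlay, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(overlay, "oil pump: inspect", (x + 5, y - 8),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    # Blend the virtual graphics with the camera view into one composite scene.
    frame = cv2.addWeighted(overlay, 0.6, frame, 0.4, 0)
    cv2.imshow("video-based AR sketch", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()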
Fishkin et al. (2000) propose that AR-like systems have the potential to transform human–computer interaction as drastically as the GUI transformed computing. They state that “the physical configuration of computational devices is a major determinant of their usability” (p. 75). The authors highlight that traditional physical interaction with computers is limited: humans primarily interact with computers through a pointing device, display, buttons, or keys. This means that human–computer interaction is identified with the windows, icons, menus, and pointing devices (WIMP) approach (Shneiderman, 1998). Applying human uses of a piece of paper as a metaphor for human–computer interaction, humans use paper in numerous and varying ways while recording data, including writing, flipping, thumbing, bending, and creasing. Humans have developed dexterity, skills, and practices that are not brought fully to bear on computational device interfaces; human interaction with paper is more varied than typical human–computer interaction.

Billinghurst and Kato (2002) provide an overview of the technologies associated with creating AR and some of the possible applications for enhancing collaborative work in educational settings. The authors use scenes from the movie Star Wars as a metaphor. In Star Wars, characters communicate with each other across great distances while observing computer-generated and projected three-dimensional (3-D) life-size virtual images. These images are superimposed on the real world. The authors cite these scenes as foreshadowing collaborative AR. They state that the long-term goal of AR research is to make it possible for the real world and virtual world to blend seamlessly together; real and virtual worlds would become indistinguishable from one another.

Billinghurst et al. (2000) discuss a technology and its implications for collaborative learning through the use of AR. The authors developed “The MagicBook” to explore the use of AR to bring text-based books to life with virtual animations. The reader, or readers when used in a collaborative learning environment, read the book while looking through a handheld see-through display, similar to a head-up display in a fighter aircraft. As the reader observes pages, virtual 3-D avatars and images appear on the book page and act out scenes that are described in the text. This work illustrates the stunning technology available to transform two-dimensional (2-D) books into the “third dimension.”

Neumann and Majoros (1998) provide a review of cognitive studies and analyses relating to how AR interacts with human abilities. They describe how these AR-enhanced human abilities may benefit manufacturing and maintenance tasks. The authors describe possible applications for AR and a prototype system designed to facilitate aviation worker training and performance of aviation maintenance tasks. They state that AR has a considerable effect on recall by establishing to-be-recalled items in a highly memorable framework; by using AR to develop scenes in an easy-to-remember framework, AR can complement human information processing. This complement can reveal itself in training efficiency applicable to a wide variety of maintenance tasks. The authors provide a list of potential AR uses and state that the possible applications of AR are nearly limitless.

Majoros and Neumann (2001) propose that AR can complement human information processing during the performance of aviation maintenance tasks (e.g., on-orbit maintenance procedures). They provide analysis of cognitive models suggesting that scenes merging real and synthetic features (i.e., AR) will complement human information processing by controlling attention, supporting short- and long-term memory, and aiding information integration. They state that applications of AR enable immediate access to information; immediate access to information is akin to an expert’s retrieval from short-term memory or well-encoded long-term memory.
Easy interaction with the design interface should allow rehearsals and stable links between graphics and the real world.

Yeh and Wickens (2001) report their findings regarding an application of AR as a means of providing “intelligent cueing.” Intelligent cueing is the application of AR to a scene assumed to be important by a computer-based optical searching assistant. In their experiment, the authors used 16 participants actively serving in the US Army or US Marine Corps. Participants were presented with a high-definition virtual-reality scene of a desert environment. Virtual targets were placed into the scene and were observable by the participant and the computer-based optical searching assistant. Reliability of cueing was manipulated between 100% and 75% to help the authors develop inferences regarding cue reliability and detection behavior (i.e., detection distance and accuracy); a minimal simulation of such a reliability manipulation is sketched at the end of this passage. The researchers defined reliability as the degree of accuracy the cue provided to the participant as it pointed to the virtual object in the desert scene (e.g., cueing that is 75% reliable accurately points to the virtual target three out of four times). Unreliable cueing was found to induce the cognitive response of disuse of the cue. Reliable cueing was found to induce user reliance on the cue or, in some cases, overuse.

Kalawsky et al. (1999, 2000) provide a brief background of AR to define terms and provide information on psychological and physiological aspects associated with AR. They highlight that AR does not have to be a purely visual augmentation; it may also encompass the use of other sensory modalities. One of the sensory modalities highlighted is the use of 3-D sound to provide enhanced spatial awareness. The authors make a key point that AR is not widely used due to technical problems associated with registering the virtual world to the real world. Registration is the process of creating one coherent scene; it is a difficult process outside static settings such as those found in laboratories.

Poupyrev et al. (2002) report on their development of a “tile” system approach to implementing an AR environment. Each tile has a unique marker that a computer-based AR system can recognize and then use to render a virtual image as an overlay on a real-world scene. The authors positioned the tiles on a magnetic whiteboard to demonstrate an application for the rapid prototyping of an aircraft instrument panel. In addition to tiles replicating aircraft instruments, the authors included tiles with the functionalities of delete, copy, and help. These “functionality” tiles enabled the user to manipulate the AR environment in a manner similar to the way icons interact with the common GUI found on personal computers.

Cooper et al. (2016) compared the performance of three groups: regular training, virtual training, and virtual training with augmented cues. Participants reacted faster after virtual training, but augmented cues did not make a significant difference. In the scenario with augmented cues, however, participants performed the task with fewer errors than participants in the minimal-cues training group. Some systematic differences in subjective ratings that reflected objective performance were also observed (Cooper et al., 2016).
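To make the Yeh and Wickens reliability manipulation concrete, the following Python sketch (a hypothetical simulation, not the authors’ experimental software) generates cues that point at the true target location with a chosen probability, mirroring the 100% versus 75% validity conditions.

import random

def run_trials(reliability, n_trials=1000, n_locations=4, seed=1):
    # Simulate a cueing aid that points at the true target location with
    # probability `reliability` and at a wrong location otherwise.
    rng = random.Random(seed)
    valid_cues = 0
    for _ in range(n_trials):
        target = rng.randrange(n_locations)
        if rng.random() < reliability:
            cue = target
        else:
            cue = rng.choice([loc for loc in range(n_locations) if loc != target])
        valid_cues += (cue == target)
    return valid_cues / n_trials

for validity in (1.00, 0.75):
    print(f"nominal cue validity {validity:.0%}: observed {run_trials(validity):.1%}")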
Several studies identify that the development of AR environments for training purposes is an inherently interdisciplinary pursuit (Macchiarella, 2004, 2005; Macchiarella et al., 2005a, b; Vincenzi et al., 2003). The design of an effective AR environment entails incorporating theories of
computer design, empirical research in several fields, the nature of human perceptual and cognitive systems, reasoning with diverse forms of information, human learning under varying situations, technology for presenting information to the human user, and getting information to and from the user and the computer in an effective manner. Developing an understanding of human abilities and complementary applications of AR to create mixed-reality worlds is an essential element in the design of any AR learning environment that complements human cognitive activity. Vincenzi and Gangadharan (2001) identify distinctive human abilities as being able to:

• Detect meaningful stimuli and patterns;
• Integrate information within and between sensory modalities (e.g., sight, sound, and smell as indicators of condition);
• Compare information/events to standards; and
• Perform qualitative judgments.

They identify complementary applications of AR annotations as follows:

• Tethering virtual annotations to real-world workpieces minimizes the need to search for information.
• Virtual images can provide examples of correct conditions.
• Markers or flags can direct attention to specific real-world workpiece features.
• Virtual annotations can influence the users’ anticipation (e.g., knowledge of possible defects with the real-world workpiece).
• With input options, users can obtain the desired level of information detail for the work task.
• Virtual objects offer an easy-to-use interface for recording work task steps.

In a meta-analysis conducted by Santos et al. (2013), AR learning experiences had a widely variable effect on student performance, from small negative to large, depending on the device, method, and scenario. Overall, AR confers three fundamental advantages: real-world annotation, contextual visualization, and vision-haptic visualization (Santos et al., 2013). AR is a relatively new field of study within computer science, and its nature is inherently interdisciplinary. The concept of augmenting an observer’s perception of reality has age-old roots (Stapleton et al., 2002). Reality alteration or augmentation was, and still is, used by magicians and entertainers in the form of illusions and other gimmicks to bewilder, amaze, and entertain. The desired goal is to make people perceive ordinary objects in extraordinary ways. The modern development of computer-based AR has the same ability to bewilder, amaze, and entertain. However, commercial applications of AR designed with the goal of improving education, training, and work task performance can create a new mixed-reality world inconceivable just a few decades ago. AR requires connecting reality with imagination to make people perceive ordinary objects in extraordinary ways. Integrating AR technologies into a specific training environment
will require a significant investment in the development of new training content, improved processes, and procedures using these new digital capabilities (Osborne & Mavers, 2019).

COGNITION AND AR

Elaboration and Recall

Ormrod (1999) and Haberlandt (1997) identify key aspects of elaboration and recall. The manner in which information is encoded and retained determines how easy it will be to retrieve for future use. Cues can be used to aid retrieval from memory, both immediately and over the long term. Although not yet thoroughly tested, researchers have theorized that AR-based learning may inherently possess great potential for facilitating retention of learned material to be retrieved later for real-world application during work tasks (Macchiarella, 2004; Majoros & Neumann, 2001; Valimont, 2002). AR-based learning can affect many more modalities of the human senses than present learning paradigms. By complementing human associative information processing and aiding information integration through multimodal sensory elaboration (i.e., the use of visual-spatial, verbal, proprioceptive, and tactile memory while the learner is performing a knowledge acquisition task), AR can enable increased elaboration during the time the learner participates in an AR-based learning environment (Bjork & Bjork, 1996; Majoros & Neumann, 2001; Vincenzi et al., 2003). Hypothesizing that the use of text labels in AR scenes serves as cues for retrieval is consistent with the Tulving and Osler (1968) study. The study found that, when subjects studied a list of words with an associated mnemonic aid (i.e., cue word), they had a significantly higher level of recall compared to a group that did not use a cue word. Applying the same principle to the AR environment, virtual text labels appearing on real-world objects serve as a word cue, or mnemonic, for the object. However, even though augmented cues have been shown to enhance performance and satisfaction in the transfer of virtual training, the effects may not be significant (Cooper et al., 2016).

Elaboration is the process by which one expands upon new information by creating multiple associations among the incoming information. Stein et al. (1984) conducted research to determine the effectiveness of elaboration on immediate recall. They found that, when the elaborative cue was closely related to the to-be-recalled material (e.g., information to be recalled: the strong man read a book; the cue: about weight lifting), learners displayed a significantly higher level of recall. With regard to educational practice and cues, Reigeluth (1999) defines four key elements of elaboration: selection, sequencing, synthesizing, and summarizing of the subject-matter content. Elaboration draws from different sensory inputs and past information already held in long-term memory. In terms of learning procedural tasks, the learner focuses on sequential steps to help select and sequence the acquisition of knowledge in a manner that will optimize attainment. Elaboration has been shown to greatly improve the encoding and retention of newly acquired information. When precise cues are
applied in AR scenes (i.e., virtual text annotations naming functions and components of a real-world object), higher levels of recall can be anticipated.

Mayer (1992) provides a brief monograph of psychology theory and research. He begins with E. L. Thorndike’s work (1905) and concludes by citing contemporary authors who address the application of cognitive psychology in educational practice. He ties together developments in the fields of psychology and learning theory to show the origins of recent educational practice. The author concludes that behaviorist influences in educational practice are waning and that educational practice based on cognitive psychology is prevailing. Research has shown that retrieval and recall of learned information are most effective when the similarities between the learning environment and the task environment are maximized. As participants acquire knowledge from AR training, actions and responses need to align with expectations, and information should be given in an appropriate manner that is specific and realistic to the working scenario (Petersen & Stricker, 2015). Meaningful learning occurs when the learner has relevant prior knowledge to form a frame of reference from which to draw (Bjork & Bjork, 1996; Knowles, 1984; Stein et al., 1984). Elaboration within domains of knowledge the users are familiar with may be one of the key strengths associated with using AR in learning settings. In terms of elaboration and recall, AR may have the ability to facilitate the sequencing of ideas that will assist in learning cognitive and complex psychomotor tasks (e.g., isolating a fault in an aircraft electrical system).

Spatial Relations

Spatial cognition (i.e., the cognitive functioning that enables a person to process and interpret visual information regarding the location of objects in an environment, often referred to as visual-spatial ability) relates to the representation of spatial information (e.g., location) in memory. Spatial information has been found to be an extremely powerful form of elaboration for establishing associations in memory that facilitate recall. Researchers have found that spatial information is automatically processed when visual scenes are encoded into long-term memory (Lovelace & Southall, 1983; Neumann & Majoros, 1998; Pezdek & Evans, 1979).

Pezdek and Evans (1979) conducted four experiments to assess the role of verbal and visual processing in memory for aspects of a simulated, real-world spatial display. Participants viewed a 3-D model of a city with 16 buildings placed on the display. The buildings were represented on the model with, or without, an accompanying name label on each building. The participants studied the display and subsequently were tested on recall and recognition of the building names, picture recognition of the buildings, and spatial memory for where the buildings were located within the model. Overall, picture recognition accuracy was low, and the presence of a name label on each building significantly reduced picture recognition accuracy but improved location recognition accuracy. The authors concluded that spatial location information was not encoded independently of verbal and visual identity information. In this study, labeling facilitated location identification accuracy. It did not significantly
affect visual recognition. The authors’ real-world spatial display (i.e., the 3-D model) is in several ways comparable to an AR environment. AR environments are inherently 3-D in nature: the real-world objects occupy three dimensions of space, and the virtual component of the scene can be rendered to present a 3-D appearance.

Saariluoma and Kalakoski (1997) conducted four experiments to test the effects of imagery on the long-term working memory of skilled chess players. The purpose of their experimentation was to gain insights into the effects of visual and auditory inputs on game play. The authors hypothesized that the visual modality would have the most significant effect on how chess players form mental images of game play. The authors concluded that skilled imagery is built on long-term working-memory retrieval structures and that effective transformation of information between these retrieval structures and visual working memory is required to construct complex mental images. Expert chess players are better able to construct complex mental images of task-specific materials than less skilled chess players. Regardless of the modality of information transmission, the chess move is transformed into visuospatial code and stored as such by the chess player. Participants in AR learning environments view scenes that contain both the real-world object being studied and the corresponding virtual overlay. It is reasonable to believe that these mixed-reality scenes are encoded into long-term memory as one integrated scene and that, when the scene is recalled, a transformation to long-term working memory is required to construct mental images in visual working memory. Should this effect prove true, participants in AR-based learning would demonstrate significantly higher levels of recall when compared to participants using traditional forms of learning.

Nakamura (1994) describes research conducted to measure the effect on recall of different types of spatial relations. The spatial relations are grouped into three categories: scene-expected, scene-unexpected, and scene-irrelevant. The author’s findings contribute to the body of knowledge dealing with spatial relations with regard to attention and recall. When spatial scenes incorporate elements that are not naturally associated with the scene, viewer attention is drawn to the scene. When the scene contains multiple surprising but naturally occurring elements, viewers demonstrate higher levels of recall. Application of these findings can facilitate learner recall in various training and educational settings.

Phaf and Wolters (1993) report on four experiments they conducted to examine the processes that determine the effectiveness of rehearsal on long-term memory. They cite previous research that divided rehearsals into one of two categories: maintenance rehearsal and elaborative rehearsal. Maintenance rehearsal involves rote repetition of an item’s auditory representation. Elaborative rehearsal involves deep semantic processing of to-be-remembered items, resulting in the production of durable memories. The authors’ experiments led them to several conclusions. First, the effectiveness of a rehearsal depends on the degree of attentional processing applied to the material being rehearsed. Second, an important criterion for attentional processing seems to be the “novelty” of the stimuli being rehearsed. Third, attention may result in faster learning because novel patterns may enable the development of new associations.
These findings may affect instructional design; increasing attention during rehearsals could lead to higher levels of recall.

Pham and Venkataramani (1997) report on their investigation of the processes of source identification and their effect on effective communication. The authors propose a framework that identifies four types of source identification processes: semantic cued retrieval, memory-trace refreshment, schematic inferencing, and pure guessing. They hypothesize that these processes are sequential in nature. The authors report on two experiments that support their position that these processes occur in a contingent manner; all of their experimental cases supported this position and were statistically significant. Additionally, the authors hypothesized that semantic cued retrieval was the dominant process.

Moreno and Mayer (1999) review previous research and report on their own research regarding the learning effects associated with multimedia instructional designs that employ varying combinations of text, narration, images, and animation. They elaborate upon the contiguity principle, which states: "The effectiveness of multimedia instruction increases when words and pictures are presented contiguously in time and space" (Moreno and Mayer, 1999, p. 385). The authors refine the contiguity principle into the temporal-contiguity effect and the spatial-contiguity effect. The spatial-contiguity effect occurs when text and images are integrated into one visual scene. The temporal-contiguity effect occurs when visual and spoken materials are temporally synchronized. They conclude that, when learners are presented with a visual presentation that incorporates text or narration, narration has the more significant effect on the learner. The authors qualified their findings by calling for more research; their experiment did not factor in individual differences in spatial ability, coordination ability, and experience.

Waller (2000) conducted a multivariate study of the relationships between several factors and the ability to acquire and transfer spatial knowledge from a VE. The author divides spatial ability into two related dimensions: spatial visualization, the ability to manipulate figures mentally, and spatial orientation, the ability to account for changes in viewpoint. Individuals who were psychometrically assessed as higher on both factors demonstrated an increased ability to acquire spatial information from a VE. Additionally, proficiency with the VE's interface was found to significantly affect performance measures of spatial knowledge. The author postulates that a likely explanation for this finding centers on user attention while engaged in spatial learning (i.e., effortful processing of the interface interferes with the user's ability to learn in the environment). Waller's research empirically demonstrates that measured spatial abilities correlate with the ability to learn from a VE and, additionally, that the degree of attention demanded by, or level of difficulty associated with, the user interface detracts from one's ability to learn spatially. Replication of this study with an AR environment would quantitatively substantiate the position that AR's low-effort interface, and the attentional nature of the spatial scenes it creates for learning, inherently lead to efficiencies while learning.

Several studies have found that gender affects spatial ability, with males tending to have higher levels of spatial ability (Cutmore et al., 2000; Czerwinski, Tan, & Robertson, 2002; Hamilton, 1995; Waller, 2000). Research by Cutmore et al. (2000) into cognitive factors affecting navigation performance within a desktop computer-generated VE clearly describes the differences in spatial ability between males and females. Various cues were used as treatments for experimental groups (e.g., compass pointers, icons for association with locations, and icons for association with landmarks). Males acquired route knowledge from landmarks more quickly than females. The specific cause of this difference is speculative; however, multiple studies substantiate its existence. Cutmore et al. (2000) make an important point that gender should be a factor when designing VE training environments. Further research into gender differences in spatial ability is necessary for mixed-reality worlds; however, it is prudent to postulate that such differences exist there as well. By its inherent nature, AR presents a visual-spatial dynamic that can be expected to confer the learning advantages associated with spatial cognition, aiding effective encoding of information into memory and facilitating recall; this is extremely important for the aviation industry, where spatial information is vital (Kaufmann, 2003; Kaplan et al., 2021). Virtual text labels, or virtual overlays in general, become associated with the real-world object and encoded into memory as one visual image. Spatial cognition is an integral element of AR and human learning (Majoros & Neumann, 2001).

Memory Channels and AR

AR interfaces engage more modalities of the human senses than current learning paradigms do (Bjork & Bjork, 1996; Macchiarella, 2004; Neumann & Majoros, 1998). AR is believed to complement human associative information processing by aiding information integration through multimodal sensory elaboration. Multimodal sensory elaboration occurs by utilizing visual-spatial, verbal, proprioceptive, audio, and tactile memory while the learner is encoding the information into long-term memory. This elaboration on the subject material may occur due to an increase in memory channels, enabling a greater chance for information to be encoded properly and retained in long-term memory. Effective encoding is key to the learner's ability to recall information for application in a real-world environment.

Mania and Chalmers (2001) studied the effects of immersion in a VE on recall and spatial perception. Several of their findings were inconclusive, but they did find a significant correlation between recall and the multimodal sensory elaboration presented by three different environments with correspondingly different inherent levels of immersion. The environments for the research comprised the real world; a virtual world in which the subjects were fully immersed; and a virtual world created with a desktop computer, in which subjects were partially immersed. Overall, their research found that relevant multimodal stimuli enhanced recall.

Gamberini (2000) studied groups of subjects who were exposed to either a fully immersive VE or a nonimmersive VE (i.e., a virtual world depicted within a real-world setting on a desktop computer). The researcher found that subjects in the nonimmersive group scored higher on recall of spatial and visual memories. He postulated that several factors affected this outcome, the key one being that the nonimmersive environment is more familiar to subjects because they see both real-world and virtual-world objects. In an AR learning environment, real-world objects (e.g., a turbine engine oil pump) are presented to learners, and the learners can engage in learning in a multimodal sensory fashion.

Multimodal sensory elaboration can create a framework of associations that aid recall and learning (Majoros & Neumann, 2001; Neumann & Majoros, 1998). Each association of a virtual object (e.g., a virtual text label) with a real-world object serving as a workpiece is the basis for a link in memory that might not otherwise exist. Together these links (e.g., a visual arrangement of text callouts in an AR workpiece scene) may form a framework like that created when subjects use a classic mnemonic technique to remember a list of items. With this method, a subject associates items to be remembered with invented places or landmarks on an imaginary path (Neumann & Majoros, 1998; Yates, 1984). During recall, the subject "mentally walks" on the path, encounters a mental landmark, visualizes the item associated with the landmark (e.g., a to-be-recalled item on a real-world workpiece), and then processes the to-be-recalled item into working memory. AR has the potential to expand these mental landmarks to include multimodal sensory input that establishes multiple channels to memory. Users of AR are provided a framework (i.e., the real world) that holds the items that will be recalled. This association and multimodal elaboration do not necessarily happen intentionally; they can occur as a by-product of the use of enhanced workpiece scenes (Neumann & Majoros, 1998).

KNOWLEDGE DEVELOPMENT AND TRAINING TRANSFER

Reduced costs and increasing capabilities of computer-based technologies have driven dramatic increases in the application of computer-delivered instruction such as computer-based training, web-based training, multimedia learning environments (Brown, 2001), virtual reality (Stone, 2001), and augmented reality (Majoros & Neumann, 2001). Computer-based training has become ubiquitous throughout government, military, and commercial training associated with the aviation field. It typically gives the learner the locus of control over instruction. Learner-controlled environments offer learners choices regarding practice level, time on task, and attention. However, computer-based training systems usually incur fidelity problems arising from the difference between reality and the simulated environment (Gonzalez-Franco et al., 2017), which makes transfer of training an important concern.

Transfer of training refers to how well learned skills and information can be applied in a different work setting. In the case of AR-based training, skills first acquired in a mixed-reality work setting would serve as training for subsequent skill application in the real world. Application of these skills could involve cognitive or psychomotor work tasks. In the future, the new mixed-reality world may redefine how workers are trained (Kalawsky, Stedmon, Hill, & Cook, 2000; Majoros & Neumann, 2001). The traditional training paradigm employs some form of training (e.g., computer-based tutorials, face-to-face instruction, self-study with printed manuals) prior to licensing or assignment to a work task. In this future mixed-reality world, AR may make some forms of training unnecessary, or at least reduce their time and scope (Macchiarella, 2005; Macchiarella et al., 2005a, b; Majoros & Neumann, 2001; Vincenzi et al., 2003). Cognitive tasks normally associated with training could be performed for the human by the AR system. This characteristic of AR may enable just-in-time training functions that occur simultaneously with work task performance. As an example, AR could provide scenes that are annotated with the types of information that are customarily learned through training. This presentation of information could support humans in inspection tasks or enable them to perform work tasks that are rarely encountered, with little prior training.

AR scenes, in the same manner as VR scenes, have the ability to direct learner attention and facilitate the acquisition of spatial knowledge regarding a real or virtual world (Witmer et al., 1996). Virtual environments provide experiences that symbolic media (e.g., a map or photograph) cannot provide. Witmer et al. (1996) conducted a study using undergraduate students at the University of Central Florida in conjunction with the US Army. Selected test participants rehearsed navigation through a building using either a VE or photographs and maps. The participants using the VE rehearsal were significantly more accurate in their navigation of the real building. Additionally, the authors postulated that additional VR cues, tactile or aural, would enhance the participants' acquired knowledge and improve navigation through the building. The creation of an AR-based mixed-reality world, in which positive transfer of information to users occurs, could enhance training environments.

Waller et al. (2001) conducted research on the effects of visual fidelity and individual differences on the ability to learn in a virtual environment and subsequently transfer the learned knowledge to real-world use. They found that the fidelity of the VE is less important when it is used to train tasks that do not require higher-level cognitive processes. Additionally, the authors found that individual differences, such as cognitive abilities and level of computer-use experience, affected the transfer of training for virtual-to-real and real-to-virtual environments more than the fidelity of the simulation did. Two possible positive effects can be inferred regarding AR from this research. First, AR can be designed to deliver information that normally is obtained through training, in effect reducing cognitive load and helping to mitigate differences in cognitive abilities while training. Second, the AR interface is intuitive and typically does not require an interface device (e.g., trackball or joystick). The intuitive interface of AR may help mitigate differences in levels of computer-use skill: users of AR look at a real-world object, and virtual scenes of information are automatically presented for use.

Self-efficacy, that is, people's judgments of their capabilities to organize and execute the courses of action necessary to attain designated types or levels of performance (Bandura, 1986, 1997), is central to the success or failure that learners experience as they engage in the tasks necessary to attain knowledge in a given field. High self-efficacy helps create feelings of serenity or "peace of mind" as learners approach the difficult tasks and activities that comprise decision-making and complex work tasks. Majoros and Neumann (2001) postulate that AR scenes may support self-efficacy by creating an environment where the learner, or user of AR, has the locus of control over the learning environment. A high level of individualized control over the learning situation has a positive effect on learning (Ormrod, 1999); for example, users could invoke an AR scene with virtual "copy and paste" to keep information accessible while conducting a real-world work task.

With regard to concurrent training and performance, AR enables learning experiences in which users train for tasks in a manner that identically replicates performance of the task in the real-world environment; this type of "real-world" training environment has been shown to provide advantages regarding transfer of knowledge and training (Majoros & Neumann, 2001). Rose et al. (2000) empirically ascertained that VEs transfer training as effectively as real-world training. They also highlight three main factors that influence interference in concurrent task learning: task similarity, practice, and task difficulty. Regarding task similarity, the authors concluded that the extent of interference between two separate tasks depends on the degree to which they share a stimulus modality (e.g., visual, auditory, and tactile) and whether they rely on the same memory coding processes (e.g., verbal and visual). Rose et al. (2000) cite research by Sullivan (1976) as corroborating their position that concurrent tasks are impaired when the difficulty of the tasks is increased. They differentiate between performance that is resource limited (i.e., dependent on the mental processing resources available to devote to the task) and data limited (i.e., dependent on external stimulus quality, such as instructions, notes, and cues). In both cases, resource-limited and data-limited performance, AR has the potential to enhance concurrent training by delivering annotated work scenes that reduce mental workload through virtual text callouts, equipment diagrams, and instructions with step-by-step sequencing.

As AR training environments mature, creation of just-in-time or concurrent training may be feasible (Majoros & Neumann, 2001). One objective of future applications of AR may be to provide annotated visual scenes that supplant the need for certain aspects of training. This substitution for training would occur by providing AR-delivered information to the user, during work task performance, in lieu of the user recalling work task steps from long-term or working memory.
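A brief quantitative aside: the savings that such simulator- or AR-based practice produces are often expressed, in the wider simulation-training literature, with Roscoe's transfer effectiveness ratio (TER). The formulation below is that standard metric, offered only as an illustration; it is not a measure used by the studies cited in this chapter.

\[ \mathrm{TER} = \frac{Y_{c} - Y_{x}}{X} \]

Here Yc is the time (or number of trials) a control group needs to reach a performance criterion on the real task with no prior simulation, Yx is the time the trained group needs after receiving X units of simulator- or AR-based practice. A TER of 0.5, for example, would mean that each hour of AR-based practice saved half an hour of training on the real equipment; a TER near zero would indicate little substitution value.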

WHAT IS THE FUTURE OF JOB TRAINING – TRAINING ON THE JOB, LITERALLY?

Applications of AR can enable learning environments embedded in the real world and make the real world part of the computer interface (see Figure 2.4). Future applications of virtual environments can take the form and function of a mixed-reality world with hypertext linking to vast resources of information and instructional content. The visual nature of the AR scenes is, in many ways, analogous to a GUI in the mind's eye. AR may have a positive effect on recall by enticing elaboration through the creation of multiple associations between the real-world object being studied and the to-be-learned virtual information (Macchiarella, 2004; Valimont, 2002).

FIGURE 2.4  AR-aided inspection of an aircraft elevator.

In this new mixed-reality world, multimodal sensory elaboration can create a framework of associations that aid recall and learning. Each association of a virtual object (e.g., a virtual text label) with a real-world object could serve as the basis for a link in memory that might not otherwise exist. Together these links (e.g., a visual arrangement of text callouts in an AR workpiece scene) may form a framework like that created when students use a mnemonic technique to remember a list of items (see Figure 2.5). With this method, in the mind's eye, a student would associate items to be remembered with places or landmarks after viewing mixed-reality images of the studied item. During recall, the student "mentally walks" on the path, encounters a mental landmark, visualizes the item associated with the landmark (e.g., a to-be-recalled aspect of a real-world workpiece), and then processes the to-be-recalled item into working memory. AR has the potential to expand these mental landmarks to include multimodal sensory input that establishes multiple channels to memory. Users of AR are provided a framework (i.e., the real world) that holds the items that will be recalled. This association and multimodal elaboration do not necessarily happen intentionally; they can occur as a by-product of the use of enhanced workpiece scenes.

FIGURE 2.5  AR scene with instructions for servicing a turbine engine oil pump.

Another facet is the application of mobile augmented reality learning environments, as people spend more and more time on their mobile devices. The technological, theoretical, and assessment challenges of mobile-based AR need to be addressed for mobile augmented reality learning environments to fulfill their potential (Ifenthaler & Eseryel, 2013).

AR has the potential to transform computing as drastically as the GUI transformed computing (Fishkin et al., 2000; Vincenzi et al., 2003). Physical configuration of computational devices is a major determinant of their usability. Despite the rapid advances in computing technology afforded by exponential increases in computational power, humans still interact with computers in a very limited manner. The mode of interaction available to humans primarily consists of a keyboard and a pointing device. In most cases, the pointing device is a mouse, and humans are limited to pointing, dragging, and drawing. When contrasted with the various ways humans interact with each other (e.g., the actual meaning of spoken words, tone of voice, listening, touching, and gesturing), human interaction with computers is relatively simple (Alessi & Trollip, 2001). As researchers, computer scientists, and practitioners of AR solve the technological issues associated with using AR in real time and in the real world, AR-based human-computer interaction can become more like human-to-human interaction and engage more human modalities.

The movie Minority Report (Frank & Cohen, 2001) anticipates human interaction with computers in an insightful and powerful way. The film depicts numerous applications of AR. Police officers interact with computer-generated images from human minds through a wall-sized interface device they manipulate with speech and touch. The officers can tear virtual media from the display, move media around, change view aspects, and generally use the virtual media in the real world as if it were a real-world object. Another interesting application of AR in the movie is for marketing and sales purposes: pedestrians walk past scanners and receive an iris scan that positively identifies them. This application of biometric identification enables a computer to generate a holographic 3-D salesperson that is implanted into the real world as an AR feature. The 3-D salesperson makes a personalized sales presentation to the pedestrian. The movie presents many other innovative examples of applications of AR.

CONCLUSION

As the computational power of computers continues its rapid advance, as prophesied by Moore (1965), developers of AR-based training will, during the coming decades, have the opportunity to create AR workstations that are portable and powerful. These portable and powerful workstations can enable AR in the real-world work settings of the aerospace industry. AR has the potential to positively affect training by enabling higher levels of recall and just-in-time training functions. Training could occur in the actual work setting, at times simultaneously with job task performance. The net positive effect resulting from the use of AR as a learning medium may derive from the learners' ability to mentally match augmented information directly with the workpiece in front of them; future research is required to fully ascertain these effects on the cognitive activities associated with job tasks.

REFERENCES

Alessi, S. M., & Trollip, S. R. 2001. Multimedia for Learning: Methods and Development (3rd ed.). Boston: Allyn and Bacon.
Aviation Technician Education Council. 2020. Pipeline Report & Aviation Maintenance School Directory. https://www.atec-amt.org/uploads/1/0/7/5/10756256/atec-pipelinereport-truncated-20200416.pdf
Azuma, R. T. 1997. A Survey of Augmented Reality. Presence, 6(4), 355–385.
Azuma, R. T. 2004. Overview of Augmented Reality. Proceedings of the Conference on SIGGRAPH 2004. Los Angeles, CA.
Azuma, R. T., Baillot, Y., Behringer, R., Feiner, S., Julier, S., & MacIntyre, B. 2001. Recent Advances in Augmented Reality. IEEE Computer Graphics and Applications, 21(6), 34–47.
Bandura, A. 1986. Social Foundations of Thought and Action: A Social Cognitive Theory. Englewood Cliffs, NJ: Prentice Hall.
Bandura, A. 1997. Self-Efficacy: The Exercise of Control. New York: Freeman.
Billinghurst, M., & Kato, H. 2002. Collaborative Augmented Reality. Communications of the ACM, 45, 64–70.
Billinghurst, M., Kato, H., & Poupyrev, I. 2000. ARToolKit User's Manual. Seattle, WA: University of Washington.
Billinghurst, M., Kato, H., & Poupyrev, I. 2001. The MagicBook—Moving Seamlessly Between Reality and Virtuality. Computer Graphics and Applications, 21(3), 2–4.
Bjork, R. A., & Bjork, E. L. (Eds.). 1996. Memory. San Diego, CA: Academic Press.
Boeing. 2020. Pilot and Technician Outlook 2020–2039. https://www.boeing.com/resources/boeingdotcom/market/assets/downloads/2020_PTO_PDF_Download.pdf
Brown, K. G. 2001. Using Computers to Deliver Training: Which Employees Learn and Why. Personnel Psychology, 54(2), 271–296.

Cooper, N., Milella, F., Cant, I., Pinto, C., White, M., & Meyer, G. 2016, September. Augmented Cues Facilitate Learning Transfer from Virtual to Real Environments. In 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct) (pp. 194–198). IEEE.
Cutmore, T. R. H., Hine, T. J., Maberly, K. J., Langford, N. M., & Hawgood, G. 2000. Cognitive and Gender Factors Influencing Navigation in a Virtual Environment. International Journal of Human-Computer Studies, 53, 223–249.
Czerwinski, M., Tan, D., & Robertson, G. 2002. Women Take a Wider View. Paper presented at the ACM SIGCHI Conference on Human Factors in Computing Systems [Spatial Cognition], Minneapolis, MN.
Dunleavy, M., & Dede, C. 2014. Augmented Reality Teaching and Learning. In J. Spector, M. Merrill, J. Elen, & M. Bishop (Eds.), Handbook of Research on Educational Communications and Technology (pp. 735–745). New York: Springer.
Feiner, S., MacIntyre, B., & Hollerer, T. 1997. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment. Proceedings of the International Symposium on Wearable Computing, 74–81.
Fishkin, P., Gujar, A., Harrison, B., Moran, T., & Want, R. 2000. Embodied User Interfaces for Really Direct Manipulation. Communications of the ACM, 43(9), 75–80.
Frank, S., & Cohen, J. (Writers). 2001. Minority Report. Hollywood, CA: Twentieth Century Fox and DreamWorks LLC.
Gamberini, L. 2000. Virtual Reality as a New Research Tool for the Study of Human Memory. CyberPsychology and Behavior, 3(3), 337–342.
Gonzalez-Franco, M., Pizarro, R., Cermeron, J., Li, K., Thorn, J., Hutabarat, W., Tiwari, A., & Bermell-Garcia, P. 2017. Immersive Mixed Reality for Manufacturing Training. Frontiers in Robotics and AI, 4, 3.
Haberlandt, K. 1997. Cognitive Psychology (2nd ed.). Needham Heights, MA: Allyn and Bacon.
Hamilton, C. 1995. Beyond Sex Differences in Visuo-Spatial Processing: The Impact of Gender Trait Possession. British Journal of Psychology, 86(1), 1–20.
Ifenthaler, D., & Eseryel, D. 2013. Facilitating Complex Learning by Mobile Augmented Reality Learning Environments. In R. Huang, J. M. Spector, & Kinshuk (Eds.), Reshaping Learning (pp. 415–438). Berlin, Heidelberg: Springer.
Kalawsky, R., Stedmon, A. W., Hill, K., & Cook, C. 2000. A Taxonomy of Technology: Defining Augmented Reality. Paper presented at the Human Factors and Ergonomics Society Annual Meeting, Santa Monica, CA.
Kaplan, A. D., Cruit, J., Endsley, M., Beers, S. M., Sawyer, B. D., & Hancock, P. A. 2021. The Effects of Virtual Reality, Augmented Reality, and Mixed Reality as Training Enhancement Methods: A Meta-Analysis. Human Factors, 63(4), 706–726.
Kaufmann, H. 2003. Collaborative Augmented Reality in Education. Vienna: Institute of Software Technology and Interactive Systems, Vienna University of Technology.
Knowles, M. 1984. The Adult Learner: A Neglected Species (3rd ed.). Houston: Gulf Port Publishing.
Lee, K. 2012. Augmented Reality in Education and Training. TechTrends, 56(2), 13–21.
Lovelace, E. A., & Southall, S. D. 1983. Memory for Words in Prose and Their Locations on the Page. Memory and Cognition, 11(5), 429–434.
Macchiarella, N. D. 2004. Effectiveness of Video-Based Augmented Reality as a Learning Paradigm for Aerospace Maintenance Training. Dissertation Abstracts International, 65(09), 3347A. (UMI No. 3148420)
Macchiarella, N. D. 2005. Augmenting Reality as a Medium for Job Task Training. Journal of Instruction Delivery Systems, 19(1), 21–24.

Macchiarella, N. D., Gangadharan, S. N., Liu, D., Majoros, A. E., & Vincenzi, D. A. 2005a. Application of Augmented Reality for Aerospace Maintenance Training. Proceedings of the 11th International Conference on Human-Computer Interaction. Las Vegas, NV, CD-ROM, 1–5.
Macchiarella, N. D., Gangadharan, S. N., Liu, D., Majoros, A. E., & Vincenzi, D. A. 2005b. Augmenting Reality as a Training Medium for Aviation/Aerospace Applications. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting. Orlando, FL, 2174–2178.
Macchiarella, N. D., & Haritos, T. 2005. A Mobile Application of Augmented Reality for Aerospace Maintenance Training. Proceedings of the 24th Digital Avionics Systems Conference, Avionics in a Changing Market Place: Safe and Secure. Washington, DC, 5.B.3-1–5.B.3-9.
Majoros, A., & Boyle, E. 1997. Maintainability. In G. Salvendy (Ed.), Handbook of Human Factors and Ergonomics (2nd ed., pp. 1569–1592). New York: John Wiley.
Majoros, A., & Neumann, U. 2001. Support of Crew Problem-Solving and Performance with Augmented Reality. Galveston, TX: Bioastronautics Investigators' Workshop.
Mania, K., & Chalmers, A. 2001. The Effects of Levels of Immersion on Memory and Presence in Virtual Environments: A Reality Centered Approach. CyberPsychology and Behavior, 4(2), 247–264.
Mayer, R. 1992. Cognition and Instruction: Their Historic Meeting Within Educational Psychology. Journal of Educational Psychology, 84(4), 405–412.
Milgram, P., & Kishino, F. 1994. A Taxonomy of Mixed Reality Visual Displays. IEICE Transactions on Information Systems, E77-D(12), 1321–1329.
Moore, G. E. 1965. Cramming More Components Onto Integrated Circuits. Electronics, 38(8), 1–4.
Moreno, R., & Mayer, R. 1999. Cognitive Principles of Multimedia Learning: The Role of Modality and Contiguity. Journal of Educational Psychology, 91(2), 358–368.
Nakamura, G. 1994. Scene Schemata in Memory for Spatial Relations. American Journal of Psychology, 107(4), 481–497.
Neumann, U., & Majoros, A. 1998. Cognitive, Performance, and System Issues for Augmented Reality Applications in Manufacturing and Maintenance. Proceedings of the IEEE Virtual Reality Annual International Symposium (VRAIS), 4–11.
Ormrod, J. 1999. Human Learning (3rd ed.). Upper Saddle River, NJ: Prentice-Hall.
Osborne, M., & Mavers, S. 2019, October. Integrating Augmented Reality in Training and Industrial Applications. In 2019 Eighth International Conference on Educational Innovation through Technology (EITT) (pp. 142–146). IEEE.
Ott, J. 1995. Maintenance Executives Seek Greater Efficiency. Aviation Week and Space Technology, 142, 2.
Petersen, N., & Stricker, D. 2015. Cognitive Augmented Reality. Computers & Graphics, 53, 82–91.
Pezdek, K., & Evans, G. W. 1979. Visual and Verbal Memory for Objects and Their Spatial Locations. Journal of Experimental Psychology: Human Learning and Memory, 5(4), 360–373.
Phaf, R., & Wolters, G. 1993. Attentional Shifts in Maintenance Rehearsal. American Journal of Psychology, 106(3), 353–382.
Pham, M., & Venkataramani, J. 1997. Contingent Processes of Source Identification. Journal of Consumer Research, 24(3), 249–266.
Poupyrev, I., Tan, D., Billinghurst, M., Kato, H., Regenbrecht, H., & Tetsutani, N. 2002. Developing a Generic Augmented-Reality Interface. Computer, 35(3), 44–50.

Reigeluth, C. M. 1999. The Elaboration Theory: Guidance for Scope and Sequence Decisions. In C. M. Reigeluth (Ed.), Instructional-Design Theories and Models: A New Paradigm of Instructional Theory (Vol. II). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rose, F. D., Attree, B. M., Brooks, D. M., Parslow, D. M., Penn, P. R., & Ambihaipahan, N. 2000. Training in Virtual Environments: Transfer to Real World Tasks and Equivalence to Real Task Training. Ergonomics, 43(4), 494–511.
Saariluoma, P., & Kalakoski, V. 1997. Skilled Imagery and Long-Term Working Memory. American Journal of Psychology, 110(2), 177–202.
Santos, M. E. C., Chen, A., Taketomi, T., Yamamoto, G., Miyazaki, J., & Kato, H. 2013. Augmented Reality Learning Experiences: Survey of Prototype Design and Evaluation. IEEE Transactions on Learning Technologies, 7(1), 38–56.
Shneiderman, B. 1998. Designing the User Interface: Strategies for Effective Human-Computer Design (3rd ed.). Reading, MA: Addison-Wesley.
Stapleton, C., Hughes, C., Moshell, M., Micikevicius, P., & Altman, M. 2002. Applying Mixed Reality to Entertainment. Computer, 35(12), 122–124.
Stedmon, A. W., Hill, K., Kalawsky, R. S., & Cook, C. A. 1999. Old Theories, New Technologies: Comprehension and Retention Issues in Augmented Reality Systems. Proceedings of the 43rd Annual Meeting of the Human Factors and Ergonomics Society. Santa Monica, CA, 1050–1054.
Stein, B., Littlefield, J., Bransford, J., & Persampieri, M. 1984. Elaboration and Knowledge Acquisition. Memory and Cognition, 12(5), 522–529.
Stone, R. 2001. Virtual Reality for Interactive Training: An Industrial Practitioner's Viewpoint. International Journal of Human-Computer Studies, 55(4), 699–711.
Sullivan, L. 1976. Selective Attention and Secondary Message Analysis: A Reconsideration of Broadbent's Filter Model of Selective Attention. Quarterly Journal of Experimental Psychology, 28, 167–178.
Thorndike, E. L. 1905. The Elements of Psychology. London: Routledge and Kegan Paul.
Tulving, E., & Osler, S. 1968. Effectiveness of Retrieval Cues in Memory for Words. Journal of Experimental Psychology, 77(4), 593–601.
US General Accounting Office. 2003. FAA Needs to Update the Curriculum and Certification Requirements for Aviation Mechanics. Washington, DC: United States General Accounting Office.
Valimont, B. 2002. The Effectiveness of an Augmented Reality Learning Paradigm. Daytona Beach, FL: Embry-Riddle Aeronautical University.
Vincenzi, D., & Gangadharan, S. 2001. Project Proposal: Collaborative Research on Augmented Reality. Daytona Beach, FL: Embry-Riddle Aeronautical University.
Vincenzi, D. A., Valimont, B., Macchiarella, N. D., Opalenik, C., Gangadharan, S., & Majoros, A. 2003. The Effectiveness of Cognitive Elaboration Using Augmented Reality as a Training and Learning Paradigm. Proceedings of the Human Factors and Ergonomics Society 47th Annual Meeting. Denver, CO, 2054–2058.
Waller, D. 2000. Individual Differences in Spatial Learning from Computer-Simulated Environments. Journal of Experimental Psychology, 6(4), 307–321.
Waller, D., Knapp, D., & Hunt, E. 2001. Spatial Representations of Virtual Mazes: The Role of Visual Fidelity and Individual Differences. Human Factors, 43(1), 147–158.
Witmer, B., Baily, J., & Knerr, B. 1996. Virtual Spaces and Real World Places: Transfer of Route Knowledge. International Journal of Human-Computer Studies, 45(4), 413–428.
Yates, F. A. 1984. The Art of Memory. London: Routledge and Kegan Paul.
Yeh, M., & Wickens, C. 2001. Display Signaling in Augmented Reality: Effects of Cue Reliability and Image Realism on Attention Allocation and Trust Calibration. Human Factors, 43(3), 355–365.

3 Flight Simulators: Civil Aviation and Training

Ronald J. Lofaro and Kevin M. Smith

CONTENTS
Introduction
An Overview of Civil Aviation Training, Flight Simulators, and the Human Factors Therein
Introduction
Flight Simulator Basics
Human Factors and Flight Simulators
Major Drivers in Civil Aviation Pilot/Crew Training
Flight Simulators and Flight Training Devices
Overview
Flight Simulators and Training
Flight Training Devices and Training
Flight Simulator and FTD Assessment
SFAR 58 and AQP
History
SFAR/AQP: Overview and Synopsis
SFAR 58
AQP, LOS/LOFT, and Simulators
Line-Oriented Flight Training (LOFT)
Background
Maximizing LOFT: The Mission Performance Model and the Operational Decision-Making Paradigm
LOFT: Current and Future
Risk Identification and Management: Training and Evaluation with MPM and ODM Paradigm
Mission Performance Model
From CRM/MPM to ODM
Operational Risk Management and Decision-Making
Optimizing Performance during Complex Operations
Introduction
The Overall Mission Continuation Decision
Determining Risk
Operational Envelope
The Unstable, Missed Approach Decision
Bayesian Probability
Problem-Solving under Conditions of Uncertainty
The Takeoff "Go/No-Go" Decision
Unexpected Operational Difficulties
Optimizing the Decision Function
Operational Analysis of the Takeoff Decision
Conclusion
ODM and MPM in LOFT Design, Development, and Evaluation
Introduction
LOFT: ODM and MPM
LOFT Design: Another Approach
Training
References
Federal Aviation Administration Advisory Circulars (ACs) and Regulations

DOI: 10.1201/9781003401353-3

INTRODUCTION

The use of flight simulators for civil pilot/crew training and performance evaluation has evolved to the point where they have become key and indispensable tools for air carrier education, and they will continue to be so. This is due to a conjunction of factors: safety, cost, simulator fidelity, and fairly recent changes and additions to the federal aviation regulations (FARs). A short explanation of the Federal Aviation Administration (FAA) and the FARs follows.

The FAA regulates aviation in the US, from air traffic to civil aviation security, the operation of air carriers, pilot training, and more. The federal regulations covering aviation are all found in the Code of Federal Regulations (CFR), Title 14, Aeronautics and Space; these are commonly referred to as the FARs. The FARs are divided into parts (1 through 199); each part has a descriptive title and is a specific and detailed regulation. For example, 14 CFR Part 121 is the FAR titled Operating Requirements: Domestic, Flag, and Supplemental Operations. The FARs are usually referred to by part number; this one would just be called Part 121.

As an aid to aviation, the FAA normally publishes advisory circulars (ACs) that relate to specific FARs. The ACs are designed to provide assistance and guidelines for complying with the FARs. In fact, these ACs, which are often many times longer than the FAR itself, are written in meticulous detail and serve as "how to" templates. The ACs are titled, numbered, and grouped by such areas as air traffic control, civil and general aviation security, pilot training, and so on. For example, AC 120-45A is the AC titled Airplane Flight Training Device Qualification. The 120 specifies the general area of air carrier and commercial operations and helicopters, whereas the 45 indicates that this is the 45th AC under the 120 area. The "A" means it is the first revision.

For the remainder of the chapter, we will refer to the FARs by part number only, and likewise for the ACs. A caveat: we will cite somewhat extensively from the FARs and ACs. As these are large documents and we have space limitations, we will sometimes omit sentences within a relevant portion of a citation. Occasionally, rather than use a long citation, we may synopsize what we are referring to. (For the interested reader, the complete FARs and ACs can be found on the FAA website: http://www.faa.gov)

We will begin this chapter with an overview of flight simulators (FS) and their use in the air carrier arena. From there, the chapter will proceed to a brief look at the human factors issues and problems in civil aviation's use of flight simulators. These are, in the main, the same issues and problems of flight simulator use in any environment: fidelity, part-task trainers, transfer of training, motion axes, transition training, and the like. In addressing the human factors of civil aviation, we recognize that the hexapod axial motion bases and the veridicality of the FS to the actual flying environment are what have led to the use of the FS as THE major pilot/crew training, evaluation, and certification tool. As one example, flight simulators today have assumed such an important place in air carrier training that an aircraft-type rating can be obtained (almost) entirely in a flight simulator. Advances in the capabilities of this generation of flight simulators have led to modifications of the FAA's classification schemas (levels) for airplane flight simulators and flight training devices (FTDs) and to the first new pilot/crew training effort in over 30 years, the Advanced Qualification Program (AQP).

Our HF focus will be on training, and on the models for training and evaluating the skills we see as paramount for the pilot/crew in accomplishing risk identification and risk management. Thus, the chapter will deal extensively with the use and maximization of line-oriented flight training (LOFT). The Air Transport Association of America (ATA) and the FAA have worked on the best ways to design, develop, and implement LOFT and line-oriented evaluation (LOE) in flight simulators, as shown by the relevant FARs and ACs. The result is that LOFT, which is done entirely in a flight simulator, has become, over the past 15–17 years, what could be called the "crown jewel" of air carrier pilot/crew training. LOFT and LOE received initial impetus from their use in crew resource management (CRM) training. However, CRM encountered problems in the mid-1990s, one result being that the "Big Three" US flag air carriers (American, Delta, and United) almost completely revamped and renamed their CRM programs (Aviation Week and Space Technology, 1996). LOFT, always recognized as the key element in CRM, has thus become more and more recognized as an independent and indispensable training and evaluation tool for pilot/crew performance. The FAA's emphasis on air carriers moving to a new training and certification paradigm, the above-mentioned Advanced Qualification Program as spelled out in Special FAR (SFAR) 58, has further enhanced the role of LOFT and LOE.

Lastly, this chapter will deal with LOFT design, development, and use in crew performance and evaluation, using the mission performance model and the operational decision-making (ODM) paradigm. We will lay out the use of LOFT for learning and practice in operational decision-making, which results in risk assessment and reduction. These, the authors have long considered, are the key functions for any airline captain and crew.

AN OVERVIEW OF CIVIL AVIATION TRAINING, FLIGHT SIMULATORS, AND THE HUMAN FACTORS THEREIN

Introduction

There are three somewhat obvious benefits from the use of a flight simulator in training. These are the underpinning of the now-extensive use of flight simulators in general aviation, civil aviation, and military aviation. Briefly put, they are:

1. Cost reduction and increased efficiency, by replacing the real system, the plane, with the flight simulator.
2. Reduction in the hazards of training in the plane. The loss of life and injuries that result from training accidents and incidents are well documented.
3. The ability to train skills and performances that cannot be trained in the aircraft, such as malfunctions and adverse conditions and, more important, missions/tasks that may never be performed in real-world operations but are essential components of the operational mission profile of the aircraft (cf. Flexman & Stark, 1987).

The extensive use of flight simulators in the civil aviation world has resulted both from recognition of these benefits and from a confluence of other causes. The first, as said, is that the flight simulator is a safe environment, putting neither crew nor planes at risk. However, although crew and aircraft safety is foremost, looking further we see the following set of converging vectors. In the formative years, and continuing well into the late 1980s and early 1990s, both civil aviation and the FAA relied on the ex-military (mainly active-duty personnel who left after fulfilling their commitment, but also some retirees in their 40s) as a source of experienced manpower. In civil aviation, these former military included pilots and aviation maintenance technicians (AMTs). The same was true in the FAA, where there was, perhaps, a higher emphasis placed on the ex-military pilot, who could be placed in the flight standards, aircraft, and pilot certification areas without missing a beat. The rationale for the aviation industry seeking (and welcoming) military personnel is both obvious and subtle. It is obvious that the military was a source of highly trained and qualified personnel, whose training was both extensive and standardized. It is also true that there is a "brotherhood of airmen," where inclusion is highly dependent on an airman's background, experiences, training, and even common acquaintances. Add to these factors a common "language," one that is technical, acronym-laden, and replete with idiomatic expressions. All of the above are still active (albeit to a lesser degree) today. One result of the influx of military personnel was the use of the flight simulator as the major training tool for aircrew.

In passing, it must be remembered that the earliest flight simulator with replicated cockpit instrumentation, controls, and most aspects of flight, built by Edward A. Link in 1929, was a generic flight simulator for general aviation (GA) use. This Link trainer, whose picture we have all seen (it looks like a child's drawing of a plane mounted on a base), soon evolved into an instrument flight trainer as a result of World War II. From there, the great advances in flight simulators had to do with their use in military training. The early history of flight simulator development, in both capability and capacity, centered on training military pilots and aircrews. It was natural, as many of the early air carrier and FAA personnel had military backgrounds, that the use of the flight simulator in training and certification became paramount.

Add to this a piece of reality that is often overlooked: the air carriers simply do not have enough planes to take significant, or even small, numbers out of service for pilot training. As one example, the largest US flag carriers own or lease more than 300 planes each. Unlike the military, which has large numbers of aircraft dedicated only to training pilots, the air carriers must use their planes, in the main, to generate income. Considering that the air carriers run 24/7 schedules, it is apparent that the flight simulator is, must be, and will continue to be the training and certification tool in civil aviation.

To resume: this military use of the flight simulator drove much of early flight simulator technology as aircraft became more complex, more automated, and achieved higher performance. On the civil aviation front, the air carriers began to insist that the delivery of a flight simulator for a new aircraft type be simultaneous with, or even precede, the new type's entry into everyday service. Similarly, the civil side of aviation demanded more and more simulator capability (fidelity and veridicality). The major air carriers used their pilot and crew training facilities to house a growing number of flight simulators. Companies that made flight simulators worked closely with both the airframe manufacturers and the air carriers to design and deliver flight simulators that met the changing needs of pilot training and certification. Today, we have civil aviation flight simulators for aircraft such as the B-757/767 and 777 that cost upwards of $35 million and cost thousands of dollars per hour to operate. These operational costs include maintenance, the simulator operator, and the air conditioning needed to maintain the temperature in the flight simulator facility at a level that does not impact the highly sophisticated computers that drive the flight simulator.

The final vector, or piece, of the equation has been the technological advances in flight simulator capability during the 1980s and 1990s. The advances in the fidelity of the visual scene presented to the crew, as well as in the fidelity of the response of flight simulator instruments (be they "glass" or "steam") to inputs by the crew, are outstanding examples. In summation, through a convergence of causal vectors, the period from approximately 1970 to today has seen the emergence of the flight simulator as the essential tool for pilot training and certification in the civil arena.

Flight Simulator Basics

The primary functions of any flight simulator (i.e., its functional definitions, as it were) are:

1. To present information that the real system would present, for the purpose of training
2. To provide a practice environment that facilitates and enhances the skills and knowledge of the pilot and thus provides learning that enhances performance in the real system, the airplane

Put into other terms, a flight simulator is a system designed to "imitate" the functions of another system (a plane) in a real operational environment and to be a realistic substitute that responds realistically to flight crew inputs. The key here is that a flight simulator can be programmed to offer varied experiences to a flight crew, but experiences that are safe, in that if you "crash" in a simulator, there is no injury, save to your pride. (We will return to the role of experience in flight training and flying later in this chapter.) Basically, a flight simulator is a training device that is safer, less expensive, capable of quick modification, and able to operate in all weather and for all or any part of a 24-hour day. The flight simulator presents accurate cockpit displays to the flight crew and accurately (except for complete motion capability) responds to control and avionics inputs, all the while processing and storing data on the crew's inputs. The characteristics of a flight simulator are that it:

1. Stores data on crew input and crew response that can be replayed and analyzed
2. Stores data that can be used to generate a realistic "environment," mission, or portion thereof, including control and other responses to crew inputs
3. Displays such data/information both to the flight crew and to the flight simulator operator
4. Responds to crew inputs accurately as to their effects on both system and environment
5. As would the actual plane, has accurate/valid displays of the status of on-board systems and components (e.g., EPRs) that are so vital for the crew to see and monitor
6. Provides two-way training interfaces between the flight simulator instructor and the flight crew being trained

There are research applications for different types of flight simulators. The engineering development flight simulator is used in the cockpit/flight control systems design phase. This flight simulator makes system design easier because it is quickly reconfigurable, so that one can conduct experiments on system changes and reconfigurations without having to build or tear down a real system. The research/engineering simulator is used to support basic R&D and applied research on system operations (e.g., to look at differences among various flight-control systems: fly-by-wire vs. manual vs. fly-by-light). This flight simulator has many of the capabilities mentioned for the engineering development flight simulator. It can also evaluate human performance limits in the system and evaluate system interaction with other systems (data-link, dispatch, ATC). Lastly, the research/engineering flight simulator can be used to train personnel in the operation of the system.

Although, as mentioned, there are several types of flight simulators, the focus in this chapter will be on what is often termed the "full-up" flight simulator. Simply put, a "full-up" flight simulator has:

1. A motion base with six degrees of freedom (hexapod motion): three rotational axes (pitch, roll, and yaw) and three translational axes (see the notation sketched after this list)
2. Computer-generated graphic displays for the "out the windows" visual scene, including most types of weather and environments, as well as night scenarios
3. A complete cockpit that is the same as in the aircraft type that the flight simulator models
4. Tremendous computer memory, which allows for superb capability in realistic flight simulation

In short, the flight simulator offers ultra-realism, except for the damped (limited-excursion) motion of its motion base. These flight simulator capabilities are usually ranked in terms of fidelity, the closeness with which the flight simulator mimics actual flight. What may be of more import is the flight simulator's veridicality (see below).
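For reference, the six degrees of freedom of a hexapod (Stewart-platform) motion base are conventionally written as a pose vector. The notation below follows that standard convention from the motion-platform literature; it is not the chapter's own formalism.

\[ \mathbf{q} = (x,\, y,\, z,\, \phi,\, \theta,\, \psi) \]

Here x, y, and z are the translational degrees of freedom (surge, sway, and heave), and φ, θ, and ψ are the rotational degrees of freedom (roll, pitch, and yaw). Because actuator travel is limited to a few feet, sustained accelerations can only be suggested, typically by tilting the platform and then "washing out" the motion below the crew's perception threshold; this is the principal respect in which motion fidelity falls short of actual flight.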

Human Factors and Flight Simulators

The human factors training functions that a flight simulator can address are many and varied. We will simply list many of them. Such a list must include briefings and demonstrations, practice, performance analysis, learning enhancement, providing knowledge of results of actions, providing supplementary cues to the flight crew, building cognitive structures, performance assessment, and, finally, providing a safe environment for introducing adverse conditions, malfunctions, and outright mechanical and other failures (cf. Flexman & Stark, 1987).

Human factors training as a major component of pilot training became possible with the advent of AQP. This was the first time that a complete review of traditional training became possible. No longer were air carriers tied to the existing FARs, but were allowed to build programs to meet more specifically targeted needs. This training usually included CRM but, more accurately, allowed curriculum development that addressed complex problem-solving events that a flight crew would encounter operating under adverse conditions. Thus, one saw an explosion of "line-oriented" training and certification events as part of this new emphasis on human factors.

There are many issues, human factors being a major one, in the building and use of flight simulators for the training and evaluation of pilot/crew performance. We have briefly mentioned some, although we did not identify them as issues, in the preceding subsection. These issues cluster around the level of flight simulator fidelity necessary for training and/or evaluating a pilot/crew task. This is not to say that other concerns do not exist. But, in the main, the use of a flight simulator in training and evaluation deals with what is called fidelity (of the flight simulator to the aircraft) in terms of (a) the motion bases; (b) the out-the-windscreen/window visuals; and (c) the simulator response to control inputs. All these issues can be subsumed under the more general question: "How veridical to actual flight does a flight simulator have to be in order to ensure training that fully prepares a pilot/crew for actually flying the plane?" Veridicality is the closeness of the correspondence of the knowledge structures formed by using the flight simulator (learning and using controls/input responses/instrument responses/visual scene/motion, etc.) to the information environment it represents, that is, the actual aircraft type. Because the flight simulator is used in training to build knowledge structures in the crew that will be used in actual flight, it is obvious that veridicality is the primary factor in flight simulator design and use. The higher the fidelity of a flight simulator, the more veridical the knowledge structures built in the flight simulator are—making the flight crew optimally prepared for actual flight. To be sure, we do not want to give the impression that other issues do not exist; they could include incorrect control inputs, incorrect sequencing, poor or incorrect decision-making, and more. However, these are beyond the scope of this chapter.

We will, however, look at one basic human factors/training assumption: The use of any flight simulator for training tasks and skills results in the skills gained transferring to the actual cockpit, called "transfer of training." The assumption is that the knowledge structures and information acquired previously on one task affect (positively, negatively, or neutrally) the ability to be trained on another/other tasks. Although we are talking of tasks here, the intent is that the learning of tasks trained in the flight simulator will aid the learning (performance) of a new task in the plane. Confusing as this may sound, the task learned in the flight simulator, when performed in the plane, is referred to as a new task. Why? Because the task in the plane, even though it is the same as the task in the flight simulator, is being done now in the plane; therefore, it is called a new task. Note: Transfer of training also occurs from plane to flight simulator. There are two types of training transfer. The desired one is called positive transfer, where previous training/experience aids "learning" of new tasks. However, there also exists negative transfer, where previous experience interferes with learning new tasks or with performing the trained, "old" task in a new environment. It is important again to note that positive and negative transfer can occur in either direction: flight simulator to plane or plane to flight simulator. There are some examples of skills gained in the flight simulator that are not transferable to the cockpit. This would seem to have to do with the veridicality of the flight simulator to the aircraft and can present some difficult problems. It would also seem to have to do with how the flight simulator was certified for use.

As one component of transfer of training, there has been great emphasis on the need for and value of the flight simulator having the capability for hexapod axial motion. At this writing (2006), the FAA, NASA, and the DOT/Volpe Transportation Center were conducting sophisticated experiments on the benefits of and need for hexapod motion for the platform on which the airline flight simulator is mounted. The ultimate goals of these experiments (Burki-Cohen et al., 2003; Go et al., 2003) are to provide information for a possible FAA AC and to develop information for a possible FAA policy on flight simulator motion requirements in airline pilot training and evaluation. A brief overview of the research findings to date is that hexapod platform motion has a significant positive effect for flight crew evaluation, but no significant benefit for training. Further, certain enhancements to the motion washout filters (lateral side force and heave motion) seem to be beneficial in all cases. However, for more complete information, the reader is referred to the works cited above, as it is not our purpose here to engage in a lengthy discussion of flight simulator motion issues and research.

Another problem is that of "simulator sickness": a phenomenon in which a pilot becomes sick (vertigo, nausea) in the flight simulator. This has been handled in several ways. It now seems clear that it is caused by a complex interaction that involves conflicting visual and kinesthetic cues. Another part of the interaction seems to be the time duration between flying and using the flight simulator. It would appear that, for some individuals, use of the flight simulator in close proximity to having actually flown can cause simulator sickness. This is (usually) easily remedied by specifying a minimum time duration that must pass between flight and flight simulator. In some cases, there has been a reverse "simulator sickness" reaction whereby the pilot becomes sick in the aircraft. Obviously, this is dangerous, as well as possibly career ending. To the authors' knowledge, this has been handled by specifying (as before) a minimum time duration between flight simulator use and flight. Although medication, ranging from over-the-counter motion sickness pills to prescription antivertigo drugs, can be used, it would seem to be only a one-time or short-term remedy. The rationale here seems obvious. We will now leave the above issues, as we are quite sure that they are well covered in other chapters of this text, and the issues/problems that we mention in passing are not the focus of this chapter.

Major Drivers in Civil Aviation Pilot/Crew Training

Air crew training for air carrier pilots (Part 121) has evolved over the years due to five main drivers. They are:

1. Technical advancements in aircraft systems and simulation realism
2. Engine out operations
3. Mission-critical alerts and warnings
4. Adverse condition operations
5. Human factors

Technical advancements in aircraft systems have, for obvious reasons, driven pilot training programs. The features of a new system and how it should be used operationally have always been built into the curriculum. A good example of this is the traffic alert and collision avoidance system, version II (TCAS II), a hardware/software system that provides the aircrew with alerts for airborne collision avoidance. Although TCAS I was initially introduced with a part-task trainer, it is now an important feature of full-mission simulators, and collision avoidance training is now possible.

Engine out operations as a major driver of pilot training are not quite so obvious. With the advent of multiengine aircraft, engine-out training had always been important to pilot certification. However, with the advent of swept-wing turbojet aircraft, this training took center stage. This was due to the unique aerodynamic properties of swept-wing aircraft, more specifically, asymmetrical thrust and axis coupling. During asymmetrical thrust operations, the swept-wing turbojet aircraft experiences pronounced axis coupling, manifested in a rapid roll-off or wing drop along with an equally pronounced yaw. Increased pilot skill was and is the only counter-tactic to this potentially fatal condition. In 1968, a DC-8 training accident involving asymmetrical thrust prompted the effort to conduct all such training in the simulator. That effort was successful, prompting, among other things, advancements in simulator realism.

Although often overlooked, the third major pilot training and certification driver has been, and continues to be, mission-critical alerts and warnings. This area includes such maneuvers as stalls and steep turns, wind shear recovery, and ground proximity warning recovery. Recent additions include CAT III autoland system failure recovery maneuvers. It is important to note that all of these recovery maneuvers require the aircraft to be "hand-flown" by the pilot. This last statement brings to the surface what we think is the major challenge facing training managers today: the tension line between increasingly sophisticated autopilot systems and the continuing and pressing need for a high degree of basic "stick and rudder" pilot skills. Pilots therefore need to demonstrate proficiency in (1) adverse conditions (which include engine out operations), (2) low visibility operations, (3) mission-critical alerts and warnings, and (4) system and human limitations. We shall later show how these four conditions form the boundaries of the pilot's worldview and how they can be incorporated into a training/operation model to manage/reduce risk.

FLIGHT SIMULATORS AND FLIGHT TRAINING DEVICES

Before we begin this and the following sections, we again feel constrained to make the following caveat. Much of the following is from FAA documents: SFAR 58 and its accompanying AC (AC 120-AQP); AC 120-40B; AC 120-45A; AC 120-35B; and AC 120-45B. For brevity, "ease of flow," and resultant clarity, we have condensed some of the material in these; we omitted portions not pertinent to this chapter and often deleted references to other FARs and ACs; on some occasions, there is paraphrasing. As has been said, the complete documents are online if the reader wants to see the entirety of any of the FARs and ACs cited.

Overview

The availability of advanced technology has permitted greater use of flight simulators for training and checking of flight crewmembers. The complexity, costs, and operating environment of modern aircraft also have encouraged broader use of advanced simulation. Simulators can provide more in-depth training than can be accomplished in airplanes and provide a very high transfer of learning and behavior from the simulator to the airplane. The use of simulators in lieu of airplanes results in safer flight training and cost reductions for the operators. It also achieves fuel conservation and reduction in adverse environmental effects.

As technology progressed and the capabilities of flight simulation were recognized, FAR revisions were made to permit the increased use of simulators in approved training programs. Simulators have been used in training and some checking programs since the middle 1950s. Various FAR amendments gradually permitted additional simulator use in training and checking aircrews. A significant recognition of simulator capability has occurred since the early 1970s. In December 1973, FAR Amendments 61-62 and 121-108 permitted additional use of visual simulators. In the early 1990s, various ACs and SFAR 58 further recognized simulator capability and use in training and evaluating flight crews.

Of importance is the fact that the FAA makes a distinction between an airplane (flight) simulator and an airplane flight training device. The FAA AC that deals with airplane simulators is AC 120-40B, and AC 120-45B deals with FTDs. The term FTD covers everything from a PC with training-specific software, to a mock-up of an instrument panel, to a complete cockpit. However, what air carrier pilots/crews use for our focal point, LOFT, is a full-up, hexapod axial-motion-based, high-quality visual scene flight simulator, often called the "box" or the "sim." Although FTDs are used as part-task trainers, it is the sim alone that is used for LOFT. The box has full mission capabilities, to include ATC chatter/instructions as well as day/night and various weather and wind conditions. The simulator operator can program a mission (a flight from point A to point B) that introduces the full spectrum of conditions and problems gleaned from the experiences and reports of other pilots who have flown that particular route. The mission simulation also introduces conditions and problems that have been encountered or reported on other flights/routes.

Flight Simulators and Training

An airplane simulator (commonly called a flight simulator) is a full-size replica of an airplane's instruments, equipment, panels, and controls in an open flight deck area or an enclosed airplane cockpit. It includes the assemblage of equipment and computer software programs necessary to represent the airplane in ground and flight operations; a visual system providing an out-of-the-cockpit view; and a force (motion) cueing system which provides cues at least equivalent to those of a three-degree-of-freedom motion system; and it is in compliance with the minimum standards for a Level A simulator specified in AC 120-40, as amended.

Airplane simulators are placed, graded as it were, into four levels, A through D; FTDs are similarly ordered, except that a classification scheme of levels 1 through 7 is used. In both cases, the levels refer to the capabilities and complexities (hardware and software) of the training equipment, and all equipment is placed in a matrix, by level, that indicates what flight tasks can be trained at each level. The new designations and their relationships with the simulator definitions used in FAR Part 121, Appendix H, are:

Level A—Visual
Level B—Phase I
Level C—Phase II
Level D—Phase III

While trying not to oversimplify this distinction, the main difference is that a "full-up" airplane simulator has axial motion capability whereas an FTD does not. This will become clearer below as we give the FAA definitions of both types of flight training equipment.

Flight Training Devices and Training

An airplane flight training device is a full-scale replica of an airplane's instruments, equipment, panels, and controls in an open flight deck area or an enclosed airplane cockpit, including the assemblage of equipment and computer software programs necessary to represent the airplane in ground and flight conditions to the extent of the systems installed in the device. An FTD does not require a force (motion) cueing or visual system and meets the criteria outlined in the AC for a specific flight training device level. In an FTD, any flight training event or flight checking event can be accomplished. Nonvisual simulators are now grouped with Level 6 training devices, but must meet the requirements, except for visual, of a Level A simulator. There is no other change in their characteristics or description; just their name. Alphabetic designations were chosen for simulators to maintain a distinction from the numerically designated training devices.

In coordination with a broad cross section of the aviation industry, the FAA has defined seven levels of flight training devices, Level 1 through Level 7. Level 1 is currently reserved. Levels 2 and 3 are generic in that they are representative of no specific airplane cockpit and do not require reference to a specific airplane. Levels 4 through 7 represent a specific cockpit for the airplane represented. Within the generic or specific category, every higher level of flight training device is progressively more complex. Because of the increase in complexity and more demanding standards when progressing from Level 2 to Level 7, there is a continuum of technical definition across those levels. (Note: For complete matrices of flight simulator and FTD levels and the tasks that can be trained and/or checked in each device, see AC 120-40B and AC 120-AQP.)
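Purely as an illustration, the two level structures just described can be thought of as lookup tables from device level to a shorthand description. The entries below are our paraphrase of this section, not the authoritative FAA qualification matrices (for those, see AC 120-40B and AC 120-45B).

```python
# Illustrative sketch only; the authoritative task/level matrices are in
# AC 120-40B (simulators) and AC 120-45B (FTDs).
SIMULATOR_LEVELS = {
    "A": "formerly 'Visual'",
    "B": "formerly 'Phase I'",
    "C": "formerly 'Phase II'",
    "D": "formerly 'Phase III' (full-up: hexapod motion, high-quality visuals)",
}

FTD_LEVELS = {
    1: "reserved",
    2: "generic cockpit (no specific airplane)",
    3: "generic cockpit, higher complexity",
    4: "specific airplane cockpit",
    5: "specific cockpit, higher complexity",
    6: "specific cockpit (includes former nonvisual simulators)",
    7: "specific cockpit, highest FTD complexity",
}

def motion_required(level: str | int) -> bool:
    """Full-up simulators (Levels A-D) have axial motion; FTDs do not."""
    return isinstance(level, str) and level in SIMULATOR_LEVELS
```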

Flight Simulator and FTD Assessment

The need for standard flight simulator and FTD assessment and qualification criteria was created by the use of simulators for training and checking. The evolution of simulator technology and the concomitant increase in permitted use have required a similar evolution of the criteria for simulator qualification. A listing of known simulator criteria should be, therefore, informative. The qualification basis for a given simulator may be any of the past criteria, depending on when the simulator was first approved or last upgraded. The training and checking credits for nonvisual and visual simulators were delineated in FAR Part 61, Appendix A, and FAR Part 121, Appendices E and F. Four levels of simulators were addressed: Basic (nonvisual and visual simulators), Phase I, Phase II, and Phase III. (These designations have since been replaced by Levels A through D, as seen above.) Each of the four levels is progressively more complex than the preceding level, and each contains all the features of the preceding levels plus the requirements for the designated level. As the technology has advanced, so has the qualification guidance. Efforts to keep the criteria updated are, therefore, ongoing, with active participation from both industry and government.

Any FTD or airplane flight simulator must be assessed in those areas that are essential to accomplishing airman training and checking events. The assessment requirements and guidelines are essentially the same for both FTDs and flight simulators. This includes the climb, cruise, descent, approach, and landing phases of flight. Crewmember station checks, instructor station functions and checks, and certain additional requirements depending on the complexity of the device (i.e., touch-activated cathode ray tube instructor controls; automatic lesson plan operation; selected modes of operation for "fly-by-wire" airplanes, etc.) must be thoroughly assessed. Should a motion system or visual system be contemplated for installation on any level of flight training device, the operator or the manufacturer should contact the NSPM for information regarding an acceptable method for measuring motion and/or visual system operation and the applicable tolerances. The motion and visual systems, if installed, will be evaluated to ensure their proper operation.

The FAA's intent is to evaluate flight simulators and FTDs as objectively as possible. Pilot acceptance, however, is also an important consideration. Therefore, the flight simulator or FTD will be subjected to the validation tests listed in the relevant ACs. These include a qualitative assessment by an FAA pilot who is qualified in the respective airplane or set of airplanes. Validation tests are used to objectively compare flight simulator or FTD data and airplane data (or other approved reference data) to assure that they agree within a specified tolerance. Function tests provide a basis for evaluating the flight simulator or FTD capability to perform over a typical training period and for verifying correct operation of the controls, instruments, and systems.

The above subsections should suffice as an introduction to the FARs and ACs as they apply to defining flight simulators and FTDs, as well as to the concept of "levels" of flight simulators and FTDs. When we deal with LOFT later in this chapter, it will be LOFT as done in a Level D flight simulator.
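The validation-test logic described above (objective comparison of simulator output against airplane or other approved reference data, within a specified tolerance) can be sketched as follows. The parameter, sample values, and tolerance are invented for illustration; the actual tolerances are listed per parameter in the relevant ACs.

```python
def within_tolerance(sim_data, reference_data, tolerance):
    """Compare paired simulator and flight-test samples of one parameter
    (e.g., pitch attitude during a landing flare) point by point.
    Returns True only if every sample pair agrees within the tolerance."""
    return all(abs(s - r) <= tolerance for s, r in zip(sim_data, reference_data))

# Hypothetical check: pitch attitude (degrees) against flight-test data,
# with an assumed +/-1.5 degree tolerance (illustrative, not an AC value).
sim = [2.0, 3.1, 4.4, 5.2]
ref = [2.2, 3.0, 4.0, 5.5]
print(within_tolerance(sim, ref, 1.5))  # True
```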

SFAR 58 AND AQP

History

In 1975, the FAA began to deal with two issues: the hardware requirements needed for total flight simulation, and the redesign of training programs to deal with increasingly complex human factors problems. At the urging of the air transportation industry, the FAA addressed the hardware issue first. This effort culminated in 1980 in the development of the Advanced Simulation Program. Since then, the FAA has continued to pursue approaches for the redesign of training programs to increase the benefits of advanced simulation and to deal with the increasing complexity of cockpit human factors. A joint government–industry task force was formed on flight crew performance issues. On September 10, 1987, the task force met at the Air Transport Association's headquarters to identify and discuss flight crewmember performance issues. Working groups in three major areas were formed, and the recommendations to the joint task force were presented to the FAA administrator. Some of the substantive recommendations from the flight crewmember training group were the following:

a. Provide for a Special Federal Aviation Regulation (SFAR) and Advisory Circular to permit development of innovative training (SFAR 58)
b. Require all training to be accomplished through a certificate holder's training program
c. Provide for approval of training programs based on course content and training aids rather than specified programmed hours (SFAR 58)
d. Require Cockpit Resource Management training (121.404, SFAR 58) and encourage greater use of line-oriented flight training

SFAR/AQP: Overview and Synopsis

In this subsection, we will show how the relevant FAR (SFAR 58) on AQP came into being, the portions of it that directly impact the use of flight simulators, and, finally, why and how SFAR 58 and the accompanying extensive AC have changed civil pilot training (the most significant change being the enhanced role of the flight simulator and LOFT). We will now present a brief look at the aspects of SFAR 58 that pertain to training and to the use of flight simulators and other training devices. Note: Any Special FAR (SFAR) expires within 5 years unless extended or made into an FAR. In the case of SFAR 58, it would have expired in late 1995, but it has been extended until 2 October 2005. It is interesting to note that the original AQP AC accompanying SFAR 58 was published in 1990. It has been updated and is in the process of finalization to be reissued in its newest version. It was expected that this would occur in early 2004, some 14 years after the original AC was published.

In response to the recommendations from the joint government–industry task force and from the National Transportation Safety Board (NTSB), the FAA put forward SFAR 58, Advanced Qualification Program, in October 1990. AQP was also established to permit a greater degree of regulatory flexibility in the approval of innovative pilot training programs. Based on a documented analysis of operational requirements, an airline (FAA certificate-holder) under AQP may propose to depart from traditional training practices and requirements for pilot/crew with respect to what, how, when, and where training and testing are conducted. This is subject to FAA approval of the specific content of each proposed program. SFAR 58 requires that all departures from traditional regulatory requirements be documented and based upon an approved continuing data collection process sufficient to establish at least an equivalent level of safety. AQP provides a systematic basis for matching technology to training requirements and for approving a training program with content based on relevance to operational performance.

SFAR 58

SFAR 58 provides for approval of an alternate method, the Advanced Qualification Program, for qualifying, training, certifying, and otherwise ensuring the competency of crewmembers, aircraft dispatchers, other operations personnel, instructors, and evaluators who are required to be trained or qualified under Parts 121 and 135 of the FARs or under this SFAR. For pilots in command, seconds in command, and flight engineers, a proficiency evaluation—a portion of which may be conducted in an aircraft, flight simulator, or flight training device as approved in the certificate holder's curriculum—must be completed during each evaluation period. Each AQP qualification and continuing qualification curriculum must include approved training on, and evaluation of, the skills and proficiency of each person being trained under an AQP to use their cockpit resource management skills and their technical (piloting or other) skills in an actual or simulated operational scenario. (The integrated assessment of CRM and technical flight skills will be discussed later.) For flight crewmembers, this training and evaluation must be conducted in an approved flight training device or flight simulator.

A person enrolled in an AQP is eligible to receive a commercial or airline transport pilot, flight engineer, or aircraft dispatcher certificate or appropriate rating based on the successful completion of training and evaluation events accomplished under that program, if the applicant shows competence in the required technical knowledge and skills (e.g., piloting) and cockpit resource management knowledge and skills in scenarios that test both types of knowledge and skills together. (Note: There are other requirements but, as said, we are focusing on the flight simulator in AQP.)

As has been said, any flight simulator or FTD that will be used in an AQP for one of the following purposes must be evaluated by the FAA administrator for assignment of a flight training device or flight simulator qualification level:

(i) Required evaluation of individual or crew proficiency
(ii) Training activities that determine whether an individual or crew is ready for a proficiency evaluation
(iii) Activities used to meet requirements for recent experience
(iv) Line operational simulations (to include LOFT)

AQP, LOS/LOFT, and Simulators

The capabilities and use of simulators and other computer-based training devices in training and qualification activities have changed dramatically. SFAR 58 and AC 120-AQP allow certificate holders that are subject to the training and evaluation requirements of Part 121 and Part 135 to develop innovative training and qualification programs that incorporate the most recent advances in training methods and techniques. SFAR 58 and the AC also apply to training centers under Part 142 which intend to provide training for eligible certificate holders. AQP emphasizes crew-oriented training and evaluation. These training and evaluation applications are now grouped under the general term of line operational simulations (LOS), including LOFT, special purpose operational training, and line operational evaluation. Given the role of crew resource management issues in fatal accidents, it has become evident that LOS is the most appropriate environment in which to train and evaluate both technical and CRM skills. Consequently, a structured LOS design process is necessary to specify and integrate the required CRM and technical skills into line-oriented LOS scenarios; one possible representation of that linkage is sketched below. These scenarios should provide the opportunity for training or evaluation, as appropriate, in accordance with approved AQP qualification standards. All of the above can be done in an FAA-approved flight simulator.
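A structured LOS design process implies that each scenario event carries both the technical and the CRM skills it is meant to exercise. One possible way to record that linkage, purely as our illustration, is shown below; the event names and skill tags are hypothetical, not drawn from any approved AQP curriculum.

```python
# Hypothetical LOFT event set for one leg; the CRM/technical pairing is the point.
loft_scenario = [
    {
        "event": "engine failure at V1, departure leg",
        "technical_skills": ["asymmetric-thrust control", "flight path management"],
        "crm_skills": ["workload distribution", "communication", "decision-making"],
    },
    {
        "event": "deteriorating weather at destination",
        "technical_skills": ["fuel management", "approach planning"],
        "crm_skills": ["situation assessment", "crew coordination"],
    },
]

def skills_exercised(scenario, kind):
    """Collect every skill of one kind ('technical_skills' or 'crm_skills')."""
    return sorted({s for event in scenario for s in event[kind]})

print(skills_exercised(loft_scenario, "crm_skills"))
```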

LINE-ORIENTED FLIGHT TRAINING (LOFT)

Background

LOFT emphasizes an orientation on events that could be encountered in line operations ("flying the line"). Thus, mission realism—making the LOFT session correspond as closely as possible to event sets that could or would be encountered in flying one or more point A to point B legs—becomes the major driver in LOFT design. In other words, the events that make up a LOFT scenario should pass the test of mission realism, where it is reasonable to assume that this "could" happen in the real world.

The use of flight training devices and flight simulators has become increasingly important in training flight crewmembers. As the level of sophistication in simulators increased, air carriers came to rely on simulators for part or all of their flight training programs. Since the mid-1970s, some FAR Part 121 and Part 135 operators have implemented alternative simulator training (now LOFT) to train crewmembers. LOFT is training in a simulator with a complete crew using representative flight segments that contain normal, abnormal, and emergency procedures that may be expected in line operations. The relevant FAA AC specifies the multiple types of line operational simulations, of which LOFT is one, as well as the types of LOFT and LOE. In this AC, the FAA provides guidelines for LOFT content, LOFT use, LOE use, and LOFT/LOS instructor qualifications. We will briefly show some relevant portions of this AC because LOFT and LOS are done in a flight simulator and because LOFT is the vital venue for pilot training and evaluation. (Excerpt from FAA AC 120-35B, Line Operational Simulations, with our usual caveat.)

LOFT is a useful training method because it gives crewmembers the opportunity to practice line operations (e.g., maneuvers, operating skills, systems operations, and the operator's procedures) with a full crew in a realistic environment. Crewmembers learn to handle a variety of scripted real-time scenarios, which include routine, abnormal, and emergency situations. They also learn and practice cockpit resource management skills, including crew coordination, judgment, decision-making, and communication skills. The overall objective of LOFT is to improve total flight crew performance, thereby preventing incidents and accidents during operational flying. The types of LOFT are:

1. Qualification LOFT—An approved flight simulator course of LOFT to facilitate the transition from training using flight simulation to operational flying. Qualification LOFT meets other requirements of FAR Part 121, Appendix H.
2. Recurrent LOFT—An approved flight simulator course of LOFT which may be used to meet (yearly) recurrent flight training requirements and to substitute for alternate proficiency checks.
3. Line Operational Evaluation (LOE)—An evaluation of crewmembers and crews in a flight training device or flight simulator during real-time line operational simulations. LOE is primarily designed for crewmember evaluation under an AQP. LOE is conducted in a flight simulator or flight training device and is designed to check for both individual and crew competence. [Authors: Such competencies should be demonstrated in a mission-realistic environment.] LOE may also be used to evaluate a specific training objective. Operators conducting LOE may be approved to use any level of flight simulator or flight training device; the level required will depend upon the evaluation objectives and the device's capability to support those objectives.

Special purpose operational training (SPOT) is an approved course of operationally oriented flight training, conducted in a flight simulator or flight training device, which may be used to learn, practice, and accomplish specific training objectives, for example, training in variant aircraft or special aircraft equipment.

LOFT is "no-jeopardy" training; that is, the instructor does not issue a passing or failing grade to a participating crewmember. As a LOFT scenario progresses, it is allowed to continue without interruption so crewmembers may learn by experiencing the results of their decisions. Decisions that produce unwanted results do not indicate a training failure but serve as a learning experience. If the LOFT instructor identifies crewmember performance deficiencies, additional training or instruction will be provided. This training or instruction may be in any form, including additional LOFT. Before the crewmember may return to line operations, the performance deficiencies will be corrected and the instructor will document the training as satisfactorily completed. The "no-jeopardy" concept allows crewmembers to use their full resources and creativity without instructor interference. At the end of a LOFT session and after debriefing, the instructor certifies that the training has been completed. (We will return to jeopardy versus no-jeopardy in LOFT later; it has both a history and problematic aspects.)

To reiterate: Each AQP qualification and continuing qualification curriculum must include approved training on, and evaluation of, the skills and proficiency of each person being trained under an AQP to use their cockpit resource management skills and their technical (piloting or other) skills in an actual or simulated operational scenario. For flight crewmembers, this training and evaluation must be conducted in an approved flight training device or flight simulator.

The reader may feel, at this point, that what has been presented is an overabundance of FAA definitions, regulations, policies, and guidance. This is only somewhat true; and if the reading of what has come before has been somewhat dry and/or tedious, a point must be made again: all of civil aviation's activities come under the purview of the FAA. It is not possible to completely or clearly understand the role and functions of flight simulation (whether in a flight simulator or an FTD) in civil aviation without the information so far presented.

MAXIMIZING LOFT: THE MISSION PERFORMANCE MODEL AND THE OPERATIONAL DECISION-MAKING PARADIGM

LOFT: Current and Future

We have described the initial development of LOFT, its current form, and its content. We have stated that LOFT is the major training and check tool in an AQP program. LOFT and LOE, as performed in the flight simulator, are, simply put, both the optimal training/testing environment and the "court of last resort," as it were. Upon successful completion of LOFT/LOE, the pilot/crew have earned new ratings or certifications or are "good to go" for another year. However, the current LOFTs and LOEs need to be strengthened for exactly the reasons cited above: they are the best and safest methods for cutting-edge, realistic training and evaluation, and they provide the final stamp of approval in an AQP—as well as in a more traditional Part 121-based training program. We have set the stage to present how our earlier statements about the tremendous potential and existing use-value of LOFT can be merged and realized via the MPM and the ODM models.

Risk Identification and Management: Training and Evaluation with the MPM and ODM Paradigm

The end result of all civil pilot training should be to prepare a pilot to identify, assess, and manage risk. The primary role of the pilot as a risk manager has been emphasized multiple times over the past 10 years by the authors (Lofaro & Smith, 2003, 2001, 2000, 1999, 1998, 1993). LOFT is simply the preeminent tool, as well as the test situation, for training and evaluating civil captains/crews. Over the years, two major models have been developed by which LOFTs can be designed and crew performance enhanced as well as evaluated. The first is the mission performance model (MPM), developed by Captains Kevin Smith and William Hamman of UAL, with some input from Jan DeMuth and Ron Lofaro of the FAA. The MPM came from the recognition that the CRM skills must be integrated with a corresponding set of technical skills (flight control skills) in an interactive matrix in order to fully evaluate overall crew proficiency. Further, such an integrated CRM approach would serve as a training tool—in LOFT design and in specifying where the CRM/flight control skill linkages existed. An approach to integrated CRM, along with both human factors and flight control/technical skill evaluation scales, was partially developed during an FAA-hosted workshop in 1992. Dr. Lofaro was the designer and facilitator of this workshop, and Captain Smith, along with several training captains from NW, DL, and United Airlines, the chief pilot for Boeing, and others, were the participants. The results of that workshop are in Report DOT/FAA/RD-92/5: Workshop on Integrated Crew Resource Management (Lofaro, 1992).

The integration, and assessment, of CRM and flight control skills received considerable attention—and a fair share of concern and skepticism—in the 1980s and early 1990s. As one response, in 1990 the ATA formed a joint air carrier/FAA/academic working group to deal with this and other CRM issues; both Kevin Smith and Ron Lofaro were in that group. Dr. Robert Helmreich, in conjunction with several major air carriers, developed a complete set of flight crew CRM performance markers (he termed them "CRM behavioral markers") with behaviorally anchored rating scales. In a NASA/FAA/University of Texas project, Helmreich worked with several air carriers on research that involved the use of these markers in LOFT. In 1991, Captain Kevin Smith (United Air Lines) and Jan DeMuth (FAA Flight Standards) developed an initial set of performance markers for the technical/flight control skills. Both the CRM and the technical sets of markers were used in the next step of CRM integration: the attempt at developing an analytic paradigm. Kevin Smith created the framework for a model demonstrating that the CRM human factors skills and the technical/flight control skills are interrelated, interdependent, and often simultaneous in execution—that, for safe and efficient flight, CRM can sometimes be integral to flight control, and vice versa. This model is called the mission performance model. Captain Smith worked with Captain Hamman and others to develop exemplars of the application of the MPM to actual flight maneuvers, such as an engine-out at V1 with a turn procedure required by the terrain.

Mission Performance Model

The model is based on these concepts:

1. Flying is an integrated, mission-oriented activity and must be evaluated as such.
2. The crew's performance is not adequately captured by totaling the sum of the component tasks/subtasks/elements. The focus must be on crew function—usually at the task and critical subtask levels.
3. Flight proficiency skills/knowledge are interwoven, interdependent, and necessarily interact with the CRM skills/knowledge differentially across tasks and conditions. These interactions can be identified/specified by a matrix-type crew mission performance model using the tasks that comprise a mission/flight leg. (This is what we term integrated CRM.)
4. The model can capture these interactions and can be sensitive to changes in both task and mission—for example, it can show that, for different tasks and conditions, the technical/flight proficiency skills, the CRM skills, and their interactions will vary. This is an indication that the model has a measure of discriminatory power, or "sensitivity," to changes in task and conditions.
5. Helmreich's behavioral markers can adequately delineate CRM skills and provide one basis for the (flight crew) mission performance model, just as the technical markers can capture the flight control skills and form the other MPM basis (see below).

Finally, the bases for the technical proficiency evaluation currently exist in a behavioral marker-type format with scales. Both the markers and their scales can be validated/modified for evaluation of all these proficiencies, which will be called "crew performance markers—technical factors." This arena focuses on the crew as a unit and how well they discharge the technical aspects of the mission. It specifically addresses precision maneuvers across these areas:

1. Flight maneuvers and attitude control
2. Propulsion/lift/drag control
3. System operations
4. Malfunction warning and reconfiguration
5. Energy management

Another rationale for the MPM, and later the ODM, is that pilot/crew performance has often been seen as a series of discrete tasks, where each task was further decomposed to reveal a set of subtasks combined with the requisite knowledge and skills necessary for subtask completion. For many applications, such as aircrew training, this produces a large collection of task, knowledge, and skill data. In most traditional pilot or crew training programs, these are taught individually as isolated knowledge components. Consequently, the trainee is left with the responsibility of combining these isolated knowledge components into integrated wholes (Merrill & Li, 1989). However, the linear decomposition of individual tasks does not address integrated functioning, nor does it reveal how tightly coupled teams (flight crews) perform; thus, an analytical process other than the traditional task analysis approach is considered necessary. Therefore, the MPM uses a functional modeling approach.

The mission performance model has embedded within it the concept of functions. It is proposed that the model, as constructed, represents all significant functions necessary for the successful completion of an air transport mission. This model views crew performance as consisting of system-level functions that represent the mechanisms used to perform a mission activity. The importance of a model that is founded on a set of systems-level functions cannot be overstated. Moreover, the model delineates crew performance at a level of abstraction that is significantly different from the current descriptions of individual performance. The MPM consists of a set of functions that can be activated by inserting an instance/example—in other words, asking the function to specify/describe a particular activity or situation in the mission. If a particular function such as workload management were asked to "spin out" the components of a particular mission activity, such as takeoff with an engine failure at V1, then the function should be able to organize, sequence, distribute, and coordinate key crew actions so that a successful outcome could be assured. This workload management function, then, can be viewed as a generic performance statement that:

a. Can be applied to many mission activities/situations, and
b. Can be activated for the application to, and specification of, any one of these activities/situations.

The mission performance model specifies the components of flight crew "effectiveness" (effective performance). That the model represents effectiveness is important to understand since, if the crew is really engaging in the set of functions that are both germane and linked to the problem at hand, and if these functions are the prerequisites for a successful outcome, then effectiveness has been demonstrated. Similarly, the model is prescriptive; it prescribes what needs to be accomplished for the crew to perform effectively. For example, we can specify, during the LOFT design process, what are very likely to be the necessary crew behaviors. In summary, in the MPM, human factors as well as technical performance clusters are specified, along with the applicable markers under each cluster. For example, under workload management and situational awareness, key markers include preparation, planning, vigilance, workload distribution, and distraction avoidance. Similarly, under the cluster entitled "propulsion/lift/drag control," the key markers include instrument interpretation, energy management, power control, lift control, and drag control. When all these markers are combined into a matrix array with their various categories, the MPM emerges.
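Using only the clusters and markers named in this section, the matrix character of the MPM and the "activation" of a function for a mission activity can be suggested by a simple mapping. This is our sketch of the idea, not the published model.

```python
# Sketch of the MPM as a cluster -> markers matrix, using markers named above.
MPM = {
    # Human factors (CRM) cluster
    "workload management and situational awareness": [
        "preparation", "planning", "vigilance",
        "workload distribution", "distraction avoidance",
    ],
    # Technical cluster
    "propulsion/lift/drag control": [
        "instrument interpretation", "energy management",
        "power control", "lift control", "drag control",
    ],
}

def activate(cluster: str, mission_activity: str) -> list[str]:
    """'Activate' a function for a mission activity (e.g., takeoff with an
    engine failure at V1): return the markers an evaluator would rate."""
    return [f"{marker} during {mission_activity}" for marker in MPM[cluster]]

print(activate("workload management and situational awareness",
               "takeoff with an engine failure at V1"))
```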

From CRM/MPM to ODM

Upon completion of the 1992 Integrated CRM Workshop, a new set of issues and concerns became apparent to Smith and Lofaro. The integrated CRM concept and the MPM were well received by the workshop participants. However, due to many factors—such as a lack of FAA interest in follow-on efforts and a CRM "establishment" that was not open to taking CRM either to another level or in new directions, along with the jeopardy issue—it was clear that integrated CRM and the MPM had become dead issues. Of much more import was the realization that CRM was not the human factors silver bullet. Captain Hal Sprogis asked, "Is the Aviation Industry Experiencing CRM Failure?" (Sprogis, 1997). Captain Daniel Maurino had written Crew Resource Management: A Time for Reflection (Maurino, 1999). Both indicate that we may have expected too much from CRM; that the relationship between CRM and safety, which was and is the prime rationale offered for teaching CRM, has not been proven; and that CRM is a process, not an outcome, so certain efforts to assess outcomes (i.e., individual performance) may be misguided. American Airlines, in July of 1996, set aside much of CRM as they were doing it. Their reason was that their flight crews had valid objections to, and concerns about, CRM:

CRM is too often viewed as a number of interpersonal issues that simply do not define the problems that we face in aviation … CRM training will most likely always be defined and suffer in terms of the first generation of courses … which were seen as "touchy-feely," "getting along," and "managing human relations or resolving personality conflicts" rather than dealing with truly important concerns. (Ewell & Chidester, 1996)

American's new focus is on preparing flight crews for the daily challenges of normal and non-normal operations encountered flying the line. Delta Airlines, in the same timeframe, revamped their "CRM for New Captains" course and now calls it "In Command." As with American, Delta emphasized leadership, responsibility, and performance. So, in 1996–1997, we see these two major carriers eschewing overemphasis on communication and interpersonal relations in their CRM training. Lastly, United Airlines' version of CRM was and is called C-L-R, where the C is for "command" and the L is for "leadership," indicating that United wanted to bypass the interpersonal with C-L-R and move on to the performance issues. Yet even United changed aspects of their CRM in 1997.

Further, "common wisdom" was that pilots made good decisions easily and almost naturally, aided by (some) increase in experience. The facile assumption that additional experience will teach pilots to make better decisions has proven to be a dangerous fallacy. Experience can be a nasty teacher, often giving the test before, or without, giving the lessons and materials needed for the test. Experience can also reinforce poor decisions and behaviors that seemingly "worked" in the past (the "not your day to die" phenomenon). There was also the commonly accepted view that decision-making is but one of the components of CRM. This was, and is, a gross error. CRM, with its emphases on communications and team function, is but one enabler of good decisions. As such, it is a part of decision-making, not vice versa. CRM is, simply put, an enabler of decision-making. Decision-making is the primary tool to be used by the pilot and crew in their primary functions: risk identification and risk reduction. In short, risk management. It also became apparent that aeronautical decision-making was greatly different from decision-making on the ground, and that a new paradigm was necessary, one that both articulated the differences and offered a new set of decision-making (DM) techniques specific to what pilots and crews encounter.

Another realization was of the primacy of LOFT in pilot training. As one result of this, Smith et al. wrote an interrelated set of papers on LOFT design and delivery that later formed a session at the 1993 biennial meeting of the International Symposium on Aviation Psychology in Columbus, Ohio. As another result, Lofaro designed and held an FAA/Industry/DoD/Academe workshop in Denver (1992), which included some of the CRM workshop participants and added others from the decision-making world. The two-volume FAA report on this workshop (DOT/FAA/RD-92, Vols. I, II; Lofaro and Adams) initiated the efforts for what has become the operational decision-making model of Kevin Smith and Ron Lofaro (Lofaro & Smith, 2001, 2003).

OPERATIONAL RISK MANAGEMENT AND DECISION-MAKING

Optimizing Performance during Complex Operations

Managing risk, thinking critically, and making sound decisions when performing complex operations are our greatest challenges. When a problem arises during a complex business or military operation, non-linearity becomes a reality—the operation continues while the problem is being addressed. And, importantly, the problem-solving team is almost always the same as those conducting the operation. This dual track immediately puts stress on the human–machine system, resulting in a mission-critical situation where high levels of complexity and uncertainty prevail. This section addresses this reality by characterizing the critical operational concerns in risk management and decision-making during the course of performing an intense, complex operation. We present the unique characteristics of an operational decision; the decision analytic structure that contains a robust risk management and decision processing algorithm; and, importantly, a rigorous analytic process that can be employed when operating under conditions of complexity and uncertainty.

Introduction

As a general class of phenomena, complex environments contain complex situations and complex systems. Complex environments are among the most challenging to consider, in large measure because of our inability to understand and predict them; they can be fraught with uncertainty. If one is planning to operate in a complex environment by employing large-scale dynamic systems, conventional reasoning—especially determinism—cannot be used. Complex entities are non-deterministic by nature: complexity theory informs us that complex systems exhibit novel behavior and emergent properties, rendering these entities and phenomena a class by themselves, residing outside of conventional wisdom. Tackling the decision problem for the large-scale dynamic systems utilized in the field of aviation is of immediate importance, yet it is arguably the most difficult. This is because very little is understood with respect to optimizing the performance of such systems, and previous attempts have not considered the levels of uncertainty associated with such systems. This chapter adds some measure of analytic rigor to the discussion.

The Overall Mission Continuation Decision

Operational decision theory was created to support operational decision-making. Specifically, this body of knowledge helps identify and optimize operational decisions. Operational decisions are singular among all other classes of decisions and represent the most important command activity. Importantly, ODM provides for the broad situation awareness needed to identify risk and the structural mechanisms necessary to manage a rising risk profile.

In an effort to redesign pilot training from the ground up, the Advanced Qualification Program set out to understand specific pilot activities, with the objective of directly attacking the causes of controlled flight into terrain (CFIT). They defined observable mission-related activities and attempted to integrate these with team-related or interpersonal activities (crew resource management). Through this and other studies, they realized that various mission tasks were not performed in a linear sequence, but were done selectively and differentially. Furthermore, simulator studies by Smith revealed a more astounding result: high-performing crews did something unexpected—they prioritized their tasks. So while CRM (the management of human resources) deconstructed task activities to understand the pilot's and crew's tasks, it revealed something else entirely. But what was it?

The answer came from Dr. Robert J. Sternberg (1985) and his triarchic theory of intelligence. He proposed a cognitive superstructure that informed and triggered selective activities according to some rule as yet unidentified. If we understood the characteristics of this superstructure, then we had a chance to understand how and why such crews did so well. In the pilot's task universe, "mission activities" existed side by side with other "task organizing" activities. High-performing pilots could differentiate and prioritize, thereby optimizing the mission outcome. But what exactly is going on? Smith and Larrieu proposed a radical idea: how humans perform in groups (as a crew) is beside the point. What is critical for mission success is how well flight crews, and ultimately the captain, solve problems in a complex environment. Thus, in subsequent work by Smith and Larrieu, non-linear problem-solving took center stage.

This breakthrough came with the following insights:

1. All air carrier mission activities are highly planned, often using sophisticated planning tools.
2. While all activities are planned, excellent pilots do not execute all planned activities in real time; some are discarded altogether.
3. These pilots prioritize and select tasks using some kind of decision-making process to optimize the mission outcome.

This decision-making process gained definition after Keeney and Raiffa (1976) invented a branch of mathematics that dealt with the numerical weighting of multiple attributes: multi-attribute utility theory (MAUT; a toy illustration follows the list below). This work defined operational decisions, identified key decisions, and specified triggers that activate certain decision pathways. High-performing pilots were selecting optimum pathways, but this had yet to be understood. An operational decision for pilots is now defined by Smith and Hastie (1992) as containing three unique components:

1. It must often be performed using incomplete information.
2. Once airborne, it is always performed under increased time compression.
3. The consequences of poor decisions are often catastrophic, placing the aircraft, crew, passengers, and the corporation in jeopardy.
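In MAUT, each candidate course of action is scored as a weighted sum of utilities across attributes, u(a) = Σᵢ wᵢ·uᵢ(a), and the highest-scoring option is chosen. The toy illustration below is ours; the attributes, weights, and options are invented solely to show the arithmetic, not taken from Keeney and Raiffa or from the ODM model.

```python
# Toy MAUT scoring: weights sum to 1; utilities are on a 0-1 scale.
weights = {"safety": 0.6, "schedule": 0.2, "fuel": 0.2}

options = {
    "continue to destination": {"safety": 0.4, "schedule": 0.9, "fuel": 0.7},
    "divert to alternate":     {"safety": 0.9, "schedule": 0.3, "fuel": 0.6},
}

def utility(attrs: dict) -> float:
    """Multi-attribute utility: weighted sum over the attribute utilities."""
    return sum(weights[k] * attrs[k] for k in weights)

best = max(options, key=lambda name: utility(options[name]))
for name, attrs in options.items():
    print(f"{name}: u = {utility(attrs):.2f}")
print("chosen:", best)  # 'divert to alternate' (0.72 vs 0.56)
```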

Determining Risk

The operational decision for the air transport mission is a four-branched network, which captures the planning nature of the activity and the need to prioritize to optimize the outcome, and conforms to the following rules:

1. If the risk to the completion of the mission is low, then continue with the original mission plan. 2. If the risk to the mission is moderate, then modify the mission to either reduce or stop the risk from rising. 3. If the risk is high, abandon the mission plan and activate available alternatives. This we refer to this as “divert—reject—abandon.” This breakthrough concept was presented at numerous symposia and dramatically changed the dialogue so that more and more aviation professionals were willing to discuss decision-making and prioritizing. See Figure 3.1. In order for Figure 3.1 to be effective we ask what is the nature of risk? How can we identify and quantify it? Moreover, how can we trigger a particular decision path of the four-branched structure to ensure an optimum outcome? The nature of risk means we must deal with it or it can get worse—it will be a rising risk. In aviation systems, unless decisive action is taken during critical events, risk will continue to rise to a point beyond which one experiences a catastrophic mission failure. Such a point is called the critical event horizon. Rising risk can be explained by using the risk continuum, as is shown in Figure 3.2a. The risk continuum is organized into three zones. When risk rises it passes through zone one, where the risk is judged to be low, to zone two, where the risk is moderate. If the encountered event is critical enough, or if risk has not been mitigated, it will likely become high risk, zone 3. Each zone determines certain action. For low risk, continue with the mission plan. For moderate risk, modify the mission plan to arrest the rise or lower the risk. For high risk, where catastrophic failure is probable, abandon the mission plan and immediately implement survival measures. The course the risk takes is determined by critical events, almost like the critical events enter the operational environment acting like a hostile agent. The result is

FIGURE 3.1  (a) Four-branch decision analytic structure depicting the basic concept of risk management. (b) Depiction of the risk management algorithm.

The result is an attack on the mission system, with degraded functions possible, up to and including total system failure. This hostile agent invades the operational system or mission space. Arguably, the operator's most critical decision is to determine whether catastrophe is imminent and, if so, to take decisive action. The catastrophe avoidance algorithm is depicted in Figure 3.2b.

FIGURE 3.2  (a) The risk continuum. (b) Catastrophe avoidance algorithm.
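The three zone rules translate directly into a simple dispatch on the assessed risk zone. The Python sketch below is illustrative only; the zone names and action strings paraphrase the rules stated above.

    def mission_decision(risk_zone: str) -> str:
        """Map the assessed risk zone to the prescribed course of action."""
        if risk_zone == "low":
            return "continue with the original mission plan"
        if risk_zone == "moderate":
            return "modify the mission plan to arrest or lower the risk"
        if risk_zone == "high":
            return "abandon the mission plan and implement survival measures"
        raise ValueError(f"unknown risk zone: {risk_zone}")

    for zone in ("low", "moderate", "high"):
        print(f"{zone}: {mission_decision(zone)}")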

Operational Envelope

The mission space is a reality with explicit boundaries. These boundaries define when it is acceptable to operate and when it is not. For example, it is not acceptable to operate when a hostile agent such as a "microburst alert" enters the mission space for an airplane. Nor is it acceptable to continue to fly to the intended destination when a power plant is degraded. Hostile agents come from four general directions. These are:

1. Any adverse condition, such as adverse wind, freezing precipitation, and so forth.

2. Restricted visibility. This can often limit the ability to land at a particular airport, causing great concern if insufficient fuel remains to proceed to an alternate airport.
3. Mission-critical alerts and warnings. These could be such things as a terrain alert, traffic alert, or thunderstorm detection.
4. Human and system limitations. System limitations could be speed or altitude, where human limitations could be fatigue, task overload, or inexperience.

These four hostile agents create boundary conditions. The boundary conditions are the edges of an operational envelope. See Figure 3.3. Low risk resides within the envelope. Risk factors that impact the mission but do not place the aircraft outside the boundaries should be considered moderate. The operational strategy would be to modify but not abandon the mission.

Some agents are more dangerous than others. For example, a microburst alert is more dangerous than a wind shear alert. Thus we can say that some agents are highly energetic (like the microburst alert) and others are moderately energetic (like the wind shear alert). Some agents, regardless of their energy type, have one other rather important characteristic: they can bind with another agent. This phenomenon is called the "cumulative effect."

FIGURE 3.3  Operational envelope.

FIGURE 3.4  The cumulative effect of two risk factors.

The cumulative effect can be dangerous because it can go undetected and produce a situation so dire that catastrophic mission failure is imminent. For example, a combined agent could be low visibility at the destination airport combined with a strong crosswind, which could produce an untenable situation. These are two hostile agents coming from two different sides of the operational envelope: adverse conditions and restricted visibility, as shown in Figure 3.4. The figure shows the combined vector as the resolved hypotenuse of the triangle formed. The vector travels to the corner of the mission space; this represents a rising risk situation that must be addressed immediately.

Let us look at an example of risk, the operational envelope, and the cumulative effect. On December 8, 2005, Flight 1248, attempting to land at the Chicago Midway airport, crashed (National Transportation Safety Board 2007). At the time, a significant Midwest snowstorm made the weather exceptionally poor. The flight crew had to deal with four hostile agents that had entered the mission space:

1. Braking action advisories in effect, with fair-to-poor braking action reported.
2. A short runway with no overrun.
3. Adverse wind, with an 8-knot tailwind reported.
4. Low visibility, approaching landing minimums.

Under the adverse conditions category, there are three mission-critical impact areas: braking action advisories, a short runway, and adverse wind. Under the restricted visibility category, the mission-critical impact was that the airplane was approaching CAT I minimums. While it can be argued that no single situation represents a "show stopper," in the words of senior captain and safety manager Bill Yantiss, "Being legal according to the book does not always make it safe." By referring to the operational envelope, we can see that the mission should have been immediately abandoned, and the approach and landing should not have been attempted. Attempting the approach and landing leaves the rising risk unchecked; it will pass beyond the critical event horizon, resulting in catastrophic mission failure. In this case, this is precisely what happened, because a meaningful risk assessment was not performed by the flight crew.

Mission performance is optimized by first understanding the prevailing risk and then knowing what to do about it. When risk begins to rise, the flight crew must prioritize or discard activities to manage a rising risk profile. If risk mitigation measures are not effective, and the risk is high or projected to go high, then the mission plan must be abandoned and survival measures must be taken.
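One way to read the combined-vector idea in Figure 3.4 is as a vector sum of two normalized risk components. The sketch below is an illustrative interpretation only; the 0-1 scales and the boundary value of 1.0 are assumptions, not a method given in the chapter.

    from math import hypot

    # Two hostile agents from different sides of the envelope, each scored
    # on an assumed 0-1 scale (values are illustrative, not from the text).
    adverse_conditions = 0.8     # e.g., braking advisories plus tailwind
    restricted_visibility = 0.7  # e.g., approaching CAT I minimums

    # The resolved hypotenuse of the triangle formed by the two components.
    combined = hypot(adverse_conditions, restricted_visibility)
    print(f"combined risk vector magnitude: {combined:.2f}")  # 1.06

    if combined >= 1.0:  # the vector has reached the envelope corner/boundary
        print("rising risk unchecked: abandon the mission plan")

The point of the illustration is that two individually tolerable components can combine into a vector that reaches the envelope boundary, which is exactly the undetected danger of the cumulative effect.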

THE UNSTABLE, MISSED APPROACH DECISION

Bayesian Probability

Most probability theories deal with the here and now; Bayesian probability theory does not. Bayesian probability is based on the concept that the estimated likelihood of an event can be updated as additional relevant information is acquired over time. In an operational environment where motion and stability are of major concern, such as with an aircraft, Bayesian probability helps flight crew members determine the level of risk and uncertainty caused by certain events.

A change in the operational environment causes the emergence of an event. At this point it is not known whether such an event is critical to the mission or not. (Most operations that are non-trivial contain a mission statement, and thus they can be referred to as a mission.) The event's impact on the mission is considered. This impact is stated in terms of the likelihood that the mission may or may not continue unimpeded. These impediments can be referred to as risk, which is the level of uncertainty that the mission will succeed. Low risk means that mission success is essentially assured, while high risk may signify that mission failure is most likely. This level of uncertainty is the problem space, where the initial projection is expressed as a hypothesis P(A) that the condition will deteriorate.

Meanwhile, the mission continues and follows a trajectory through time and space. In a dynamic environment, things change rapidly. After the initial assessment is made, additional evidence is most likely encountered or added. This additional evidence may well emanate from a completely different source than the initial, triggering event. It resides along the planned trajectory, sometime after P(A) was encountered, and is expressed as P(B).

In the next step, since both P(A) and P(B) may be stand-alone events, we attempt to correlate both into a single metric. This will also serve to update P(B) with respect to P(A). Thus, we will update the figure of merit for B given that A is true. This is expressed as P(B|A). So far, we have:

1. P(A) is the probability of encountering a mission-critical event, stated in terms of its impact on the mission.
2. P(B) is additional evidence that has been encountered, expressed in probabilistic terms.
3. P(B|A) is an updated figure of merit for B given that A is true.
4. P(A|B) is the unknown that we wish to determine. It is the updated level of uncertainty.
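These four quantities are related by Bayes's theorem, the equation shown in Figure 3.5. In standard notation:

    \[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]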

Problem-Solving under Conditions of Uncertainty

Our case study involves an operation where it is critically important that the onset of instability be determined, notwithstanding the uncertainty associated with other events.

• At point A on the trajectory, an operational parameter has been exceeded. The likelihood that this out-of-tolerance condition will result in the onset of instability at the approaching point D is key operational knowledge for maintaining the integrity of the operation. This initial hypothesis is represented by P(A).
• Additional evidence is obtained at point B and is represented by P(B). This evidence may be germane to the operation, and it could influence the determination of the onset of instability.
• At the conditional point along the trajectory, identified as point C, the evidence is assessed. Given that A is true, it is assigned a value commensurate with this updated evidence. This conditional probability is represented by P(B|A).
• It is necessary to determine at point D on the trajectory whether the onset of instability will occur. If it is highly probable that instability will occur, then best practices dictate that the mission should be abandoned before reaching this limit. This probability is the value of P(A|B). It is the unknown that we wish to discover, or, put another way, the unknown in the mathematical equation that relates the three previous quantities.

The mathematical equation is represented in Figure 3.5. The case study is shown in Figure 3.6.
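As a minimal numeric sketch of this update, assume illustrative probabilities (none of these values come from the chapter) and expand P(B) by the law of total probability:

    # Minimal Bayesian update sketch for the onset-of-instability decision.
    # All numbers are illustrative assumptions, not values from the chapter.

    def posterior(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
        """Return P(A|B) via Bayes's theorem, with P(B) from total probability."""
        p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
        return p_b_given_a * p_a / p_b

    p_a = 0.30              # initial hypothesis: exceedance leads to instability
    p_b_given_a = 0.80      # likelihood of the point-B evidence if A is true
    p_b_given_not_a = 0.20  # likelihood of the same evidence if A is false

    print(f"P(A|B) = {posterior(p_a, p_b_given_a, p_b_given_not_a):.2f}")  # 0.63

Under these assumed numbers, the evidence at point B raises the projected probability of instability from 0.30 to about 0.63, which would argue for abandoning the approach before reaching point D.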

FIGURE 3.5  Bayes’s theorem.

FIGURE 3.6  Bayesian algorithmic reasoning.

The Takeoff "Go/No-Go" Decision

At the center of our image of an operational decision-maker is an actor (in the form of a crew member) who receives information about the current state and "encodes" this information as a value along a unitary risk dimension. Consider the example of takeoff operations in large transport aircraft, where the crew constantly monitors sources of information about the environment and the condition of the aircraft to assess the emergence of any operational risk that could impact the operation. High risk implies a significant danger to the operation. Low risk, on the other hand, implies that continuing the takeoff is appropriate. However, this message is often not perfect, and ambiguity may very well occur. The critical question is whether the information received came from the danger (high risk) distribution of all possible events or from the no-danger (low risk) distribution of events. When the value is either high or low, the decision is relatively straightforward.

FIGURE 3.7  Probabilistic distribution of the go and no-go takeoff decision.

However, when the value is in the midrange, the decision becomes difficult because the discrimination is not clear. This region is called the zone of ambiguity. Figure 3.7 shows the probabilistic distribution of the go or no-go situation for the takeoff operation. The distribution on the left indicates the probabilities of receiving event messages at various locations on the risk continuum when the correct decision is to continue ("go"). The distribution on the right is the analogous probability density function when, in fact, the decision should be "no go" (abort the takeoff). In most cases in commercial flight operations, the information received by the crew indicates that the takeoff operation is advised.
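The sketch below illustrates this signal-detection framing with two assumed normal distributions along the risk dimension; the means, spread, and criterion value are invented for illustration and are not taken from the chapter.

    from math import exp, pi, sqrt

    def normal_pdf(x: float, mean: float, sd: float) -> float:
        return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

    # Assumed "go" (low-risk) and "no-go" (danger) distributions.
    GO_MEAN, NOGO_MEAN, SD = 0.3, 0.7, 0.12

    def classify(observed_risk: float, criterion: float = 0.5) -> str:
        """Abort when the observed value exceeds the decision criterion Xc."""
        return "no-go (abort)" if observed_risk > criterion else "go (continue)"

    for x in (0.25, 0.50, 0.75):
        ratio = normal_pdf(x, NOGO_MEAN, SD) / normal_pdf(x, GO_MEAN, SD)
        print(f"risk={x:.2f}  likelihood ratio={ratio:10.4f}  -> {classify(x)}")
    # Mid-range values (likelihood ratio near 1) fall in the zone of ambiguity.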

Unexpected Operational Difficulties

In rare cases, the information received indicates high risk, and thus a no-go signal is received. When a criterion is located on the risk continuum and the information available identifies a value above that threshold, the decision maker decides to abort the takeoff. Following the conventions of signal detection theory, Xc labels the decision criterion that is used to make the optimum decision (Figure 3.8). This situation is more complex than first realized when the area surrounding Xc is examined more closely, as shown in Figure 3.9. In the first case, the crew should have continued but rejected (Case A). In the second case, the crew should have rejected but continued (Case B). Numerous studies have shown that, in the actual operational environment, ambiguity prevails. In this ambiguous area, messages from both the continue and reject cases overlap, and the implications of the actual perception of risk are ambiguous and confusing.

FIGURE 3.8  Eliminate ambiguity and confusion.

FIGURE 3.9  The takeoff decision.

Optimizing the Decision Function

Ideally, we want to minimize both Case A and Case B errors as much as possible. Here are two approaches. The first approach to improving takeoff decision performance is the plausible approach. The second is a more rigorous analytical approach.

In the plausible approach, crews can be instructed to set the Xc decision criterion in such a way that the opportunity for error is minimized. Any procedural setting of Xc inevitably involves trade-offs, where fewer rejections when the pilot should have continued mean more takeoffs when the pilot should have rejected, and vice versa. The plausible location for the criterion is between the two distributions, at the point where their density function curves cross. Such a location would appear exactly balanced and would minimize the two types of errors. However plausible, this is completely inappropriate, because the loss incurred by continuing a takeoff under reject conditions is significantly greater than the loss incurred by rejecting when the pilot should have continued. However, if policymakers and operational managers could set values on the two relevant errors, it would be possible to set criteria for crews that optimize the decision outcome.
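For two equal-variance normal distributions, both the balanced crossing point and a cost-weighted criterion can be written in closed form. The sketch below is illustrative; the distribution parameters and the cost ratio are assumptions, not values from the chapter.

    from math import log

    GO_MEAN, NOGO_MEAN, SD = 0.3, 0.7, 0.12  # assumed equal-variance normals

    def criterion(cost_ratio: float) -> float:
        """Place Xc where the likelihood ratio equals the error-cost ratio.

        cost_ratio > 1 means continuing under reject conditions is costlier
        than rejecting a good takeoff, so Xc shifts toward the "go" mean
        (the crew aborts more readily).
        """
        midpoint = (GO_MEAN + NOGO_MEAN) / 2.0
        shift = SD ** 2 * log(cost_ratio) / (NOGO_MEAN - GO_MEAN)
        return midpoint - shift

    print(f"equal costs (crossing point): Xc = {criterion(1.0):.3f}")   # 0.500
    print(f"continue-error 10x costlier: Xc = {criterion(10.0):.3f}")   # 0.417

This is one way of making the trade-off explicit: with equal costs the criterion sits at the crossing point, and weighting the costlier error moves it, which is the policy question raised in the text.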

The analytical approach focuses on improving the discrimination between the two states (danger: should reject; low risk: can continue). This should be the primary goal of improvements in technology. But improvements in "discriminatory training" are also necessary. This corresponds to changing the relationship between the distributions by moving them farther apart; Figure 3.9 shows how a more accurate decision can be realized.

Operational Analysis of the Takeoff Decision

In this section, we discuss the decision analytic structure and use the takeoff operation to further explain its properties. The decision structure is represented in Figure 3.9, where the key choice points are depicted. They involve the choice of continuing the takeoff as planned, continuing the takeoff with modifications to the operational plan, or aborting the takeoff due to significant danger. In this example, the primary choice is to continue the takeoff as planned or to reevaluate the takeoff plan. The secondary choice is contingent on the first: either modify the operation to accommodate a rising risk or abort the takeoff due to excessive risk. The key activities associated with each choice point are the mechanism by which the decision is executed. It is important to realize that optimum decision selection means the ability to select the most appropriate decision path, called an alternative, with respect to the prevailing risk at the time.

Choice Point A represents the primary binary decision. Notice that this entails the evaluation of risk. This is important for many reasons: risk analysis is critical in selecting the correct path. While execution of the prescribed maneuver is important, it occurs at Choice Point B, after the risk has been examined. Many studies, as well as documented operational experience, have suggested that the takeoff accident rate is excessive. But while this insight is important, these studies focus mostly on the maneuver execution phase rather than first examining the higher-order skill requirements involved in optimizing the operational decision.

Conclusion

A serious challenge facing aircrews is maintaining an acceptable level of risk while performing a mission. Key to their success is determining with accuracy and clarity whether a low-risk situation prevails or is anticipated. If so, then crews can continue with the mission as planned. If a moderate risk posture is evident, then crews must modify the mission plan accordingly. If the risk posture is judged to be high, then crews must discontinue the current plan. The effective management of risk involves the optimum placement of the decision criterion, which we have labeled Xc, along the risk dimension. It also involves the reduction of the ambiguity zone through discrimination methods. Current data show that the probability that crews will not make the correct abort decision, with respect to accurate assessment, is 54 percent. Flight crews are incorrectly assessing risk and aborting takeoffs at an alarming rate.

FIGURE 3.10  The takeoff decision.

Figure 3.10 summarizes this situation in decision analytic terms. The abort decision column shows that 54 percent of abort decisions were incorrect (cell B), while 46 percent were correct (cell D). Among the many solutions that have been proposed to reduce takeoff accidents, several involve moving the decision criterion, Xc. However, such an administrative adjustment of the decision criterion should not be undertaken without a careful analytical study of how to improve discrimination of the prevailing risk.

ODM AND MPM IN LOFT DESIGN, DEVELOPMENT, AND EVALUATION

Introduction

We must deal with major issues before we go into LOFT design using ODM and MPM. The first is that CRM training was developed in the late 1970s and early 1980s after a series of disastrous and fatal air carrier accidents, accidents in which perfectly functioning planes crashed. Human-factor errors by pilots and crews were seen as the cause of these accidents. As a result, the FAA wanted the air carriers to implement new, human-factors-oriented training, that is, CRM. Rather than amend existing FARs to make CRM mandatory, the FAA chose another path. (Of interest here is that in SFAR 58, the FAA decided, after many years, to make CRM mandatory in AQP training.) To return to the situation at hand: in order to make CRM training costs palatable to air carriers, the FAA offered to waive some hours of pilot recurrency training in lieu of CRM training. As training is a "big buck" item for air carriers, this allowed the carriers to give CRM without spending additional funds. It also allowed the FAA to ensure that CRM training would, to a great extent, be given, thus silencing some critics who, understandably, wanted new HF training to counter the rash of accidents.

However, there was a sticking point: jeopardy. Simply put: as a further inducement to carriers to give CRM, the FAA and ALPA agreed that CRM would be "no-jeopardy" training. When CRM skills were evaluated in a LOFT, neither pilot nor crew could fail, that is, be given a "down," which requires additional training and checking.

Therefore, the evaluation of a CRM LOFT consisted of videotaping the LOFT and a critique/debrief given to the crew upon completion of the LOFT. The videotape is then erased.

Our view is that LOFTs built around the MPM and ODM must be evaluated as jeopardy LOFT sessions. The MPM has technical markers that encompass actions and skills normally evaluated in a check ride (flight simulator or actual flight) and which can be failed. These include attitude management, course deviation, power management, etc. In short, the full panoply of flight skills that are usually evaluated in such sessions/flights can be included, so it would seem that failure should be an option. Added to this, we would have LOFT scenarios that enable ODM, and both pilot and crew must be allowed to succeed or fail.

We have postulated ODM as the primary tool for training pilots and crews in risk identification and management, and that risk identification and management are the primary functions of a flight crew. The common fallacies about decision-making were discussed earlier in the chapter. We add this: author Captain Smith functioned as a "line check airman" during the late 1990s. He was checking fairly senior pilots transitioning to a large (250-plus passenger capacity), highly automated aircraft. He was saddened (and both angry and frightened) to see the number of times pilots either did not recognize a decision point in time to stay ahead of the power curve, did not recognize a decision and action point at all, or made poor decisions, decisions which raised the risk to a safe and successful flight. In conversations, the authors became even more convinced of the absolute need for ODM.

What kind of LOFT not only lends itself to evaluation but is actually designed for evaluation? A LOFT that uses event sets with embedded decision points carefully designed to force decisions on a risk continuum, and a LOFT that is designed with event sets partitioned into the MPM matrix, thus showing both CRM and flight skills. The evaluations of these carry with them the possibility of failure, without which they are meaningless.

LOFT: ODM and MPM

In designing and developing LOFT scenarios, the basic unit, as proposed in 1992 (viz., Hamman, Smith, Lofaro, and Seamster, 1992), is the event set. The LOFT scenario is a set of event sets selected from real-world ops reports, made from amalgams of events and incidents as reported, or put together from the experiences of the LOFT design team. (An aside: NTSB accident reports can also be used.) It should be clear that superior LOFT design is a team effort, and the team should be carefully selected. The LOFT design team must consist of senior pilots with extensive flight time in the aircraft type for which the LOFT scenario is being created. These pilots should also have experience in the air carrier's training complex. It is also desirable to have a person from the training department with ISD credentials, as will be shown later. One LOFT design team member should be a flight simulator operator, to ensure that the event sets selected can be replicated in the flight simulator. Finally, it is truly preferable if one or more LOFT design team members have been line check airmen.

Having selected the team, the next step is to lay out an overview of the LOFT mission/flight leg(s). This overview includes the basics, such as Wx, time-of-year ops (e.g., winter), and departure and destination airports, as well as alternates. Into this skeletal framework, the team will select the event sets for each phase of flight (takeoff, cruise, descent, landing), as well as any pre-takeoff event sets that may impact the flight leg. The next steps are the crucial ones: carefully select the problems that you want the flight crew to encounter (mechanical, system malfunctions, etc.), then plan the sequence into which you want to embed the problems. Remember that the overall goal is not to create the fabled "LOFT from hell," one which cannot be successfully flown but must result in a loss of flight control. In both the sequencing of the event sets and the selection of the problems to be embedded in the LOFT, the MPM and the ODM are used as the structural underpinnings. It is done in this manner:

1. Upon selection of the problems and the phases of flight in which these problems are to occur, the ODM is used to build a sequence that results in a rising risk. The decision points are identified. (A "decision point" is a point in the flight where, if no decision and resultant action is taken, or if a wrong decision is made, the risk rises from low to moderate or from moderate to high.) Upon identifying the decision points, the basic sequence is modified to add the various outcomes from no decision/wrong decision; that is to say, the sequence now contains branches that are dependent upon the decisions made or unmade. Each branch or node will also need to have any changes in conditions and systems (again: Wx en route or at destination, systems malfunctions, etc.) built in.
2. Next, the MPM is integrated with the ODM. The concept here is to make the consequences of following a no-decision/wrong-decision path such that the risk continues to rise until it reaches the high level and crew action must be taken in order to regain any possibility of successful flight completion. The "successful completion" may involve an ATB (air turn back) or diversion to an alternate airfield, where "success" simply means landing the plane. This integration is a two-step process. The first involves taking the selected event sets and identifying the critical tasks that are to be performed during those times. These high-level critical tasks (e.g., a V1 "cut" on takeoff) are then decomposed, using the ISD process, into the complete list of subtasks involved. The MPM is then used to further identify which of the critical tasks track to which of the relevant CRM and flight control functions necessary for successful task performance. The set of MPM functions will organize, sequence, distribute, and coordinate the actions key to successful performance. Looked at another way: on the V1 "cut" at takeoff, as an exemplar, we find that the needed CRM function is workload management. The MPM, with ISD decomposition, will spin out the specifics of the critical actions and flight control skills embedded in the workload function.

The flight control tasks for this example include propulsion/lift/drag, operational integrity, and altitude control, with such subtasks as disconnecting the autothrottle at 400' AGL, setting airspeed to xyz knots, checking the flap setting, and so on. The MPM will also spool out the crew performance markers for each subtask, both the CRM and flight technical markers. Not only that, but the functions and actions of both the pilot flying (PF) and the pilot not flying (PNF) will be clearly spelled out. These, as said, will be spelled out at the subtask level. In fact, this is true CRM integration: the place where both CRM and flight control actions are presented as a unified whole. However, space and scope preclude further explication, and there is a CRM integration document (Lofaro, 1992). It is also clear, because the necessary performances are specified, that the performance markers can be used not only to track the crew's actions but, if desired, to evaluate them. This evaluation can be done simultaneously, using a flight simulator operator and a check airman, or post hoc, using the videotapes that are normally part of LOFT sessions. Again, scope precludes going further into the evaluation area.

In summary, we see that the LOFT design has been driven by using the ODM to do initial event set selection and sequence design. Then, the MPM was used to generate the task and subtask breakouts for selected events within the event sets. The MPM further sequenced the events selected (as an example, CRM and flight control integration was performed at the subtask level, with optional evaluation procedures). The LOFT design can be seen as embedding event sets into the LOFT scenario that can take the crew and plane into the moderate and even the high-risk areas of the rising risk continuum, thusly:

1. If the conditions causing the risks are either not identified or their interactions are not recognized.
2. If the decision points are either missed or result in an incorrect decision(s).
3. As a result of 1 and/or 2, no actions are taken or incorrect actions are taken.

So far, we have shown the LOFT design process as one where the event sets, as well as initial and changing conditions, are used to generate decision points. The decision points, if missed or responded to incorrectly, cause a rise in the mission risk. The MPM is overlaid to give a level of detail whereby an analysis will determine where the errors were made: in flight control, in CRM, or in both. The MPM also offers an evaluative framework. However, LOFT design could be done in the other direction.
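To make the branching structure concrete, here is a hypothetical sketch of how event sets and their embedded decision points might be represented. Every name and value below is invented for illustration; this is not part of the ODM or MPM specification.

    from dataclasses import dataclass, field

    @dataclass
    class DecisionPoint:
        description: str           # e.g., "destination Wx deteriorating"
        correct_action: str        # the action that arrests the rising risk
        # Branches keyed by crew response; values are follow-on event-set ids.
        branches: dict = field(default_factory=dict)

    @dataclass
    class EventSet:
        event_set_id: str
        phase: str                 # takeoff, cruise, descent, or landing
        risk_if_unmanaged: str     # zone reached if no/wrong decision is made
        decision_point: DecisionPoint | None = None

    scenario = [
        EventSet("ES1", "cruise", "moderate",
                 DecisionPoint("destination Wx deteriorating",
                               "brief and request an alternate",
                               {"correct": "ES2a", "missed_or_wrong": "ES2b"})),
        EventSet("ES2b", "descent", "high",
                 DecisionPoint("braking advisory plus tailwind",
                               "divert to the alternate",
                               {"correct": "ES3a", "missed_or_wrong": "ES3b"})),
    ]

Each missed or wrong decision routes the crew to a follow-on event set carrying a higher unmanaged risk, which is the rising-risk branching the design process above describes.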

LOFT Design: Another Approach

We have discussed selecting event sets that have, as it were, built-in decision points. Put another way, events/event sets can be selected that require decisions (and actions) to prevent risk from rising, that is, to prevent the aircraft's position in the ops envelope from approaching a corner or a boundary. As an example, the event set could include deteriorating Wx en route or at the destination airport, perhaps with braking advisories or crosswinds on approach/landing.

From there, the MPM would be used to develop the flight crew tasks and functions for the PF and PNF. An initial bifurcation could be made, with one path of event sets following the correct identification of rising risk and the attendant risk reduction actions, and a second path based on non-identification of rising risk. There would also be subpaths, for example, showing the correct identification of rising risk but incorrect response(s). The branching process can be repeated as needed. Thus, the ODM is the driver and the MPM is the method used to develop functionality.

However, LOFT can be developed in a different way, still using the MPM and ODM. A series of event sets (based on incident reports, "hangar talk," experiences, etc.) can be selected and linked, and the PF/PNF functions identified. These event sets we will term "expanded event sets" or "fully articulated event sets." By analyzing these sets, the decision points can be identified. In fact, "identified" is not the exact term; "selected" is more appropriate. This is because it seems clear that, in any flight, new conditions or changes in conditions (Wx, flight system problems, etc.) will result in changes in the aircraft's position both in the ops envelope and on the rising risk continuum.

With the initial set of fully articulated functions and actions developed via the MPM, changes are introduced using the ODM's boundary conditions as guidelines. That is to say, an initial set of boundary conditions will be specified and used as a basis for carefully selecting changes to them that result, if left unidentified and/or unchecked, in additive or cumulative interactions, such interactions driving the aircraft toward a corner or side of the ops envelope. Of course, this means that the risk has risen to moderate or high-moderate, even to high. The changes to the boundary conditions should be introduced at different points in the flight so that the risk does not rise suddenly. The rationale here is that one goal of the LOFT is to keep situation awareness high through the introduction of an ongoing series of changes, rather than a compressed set of events that leads to an immediate abnormal ops or emergency situation, with limited options for the flight crew. If the changes in the boundary conditions are introduced over the first 1–1½ hours of the LOFT session, their additive and/or cumulative interactions and impact will be sequenced so that the flight crew's ODM skills are tested. Thus, ODM skills are tested rather than skills at handling an overt and immediately apparent abnormal or emergency situation, which are often trained in other venues.

An aside: this is not to say that missed decision points, as well as incorrect decisions and actions, may not lead to an abnormal or emergency situation. If that occurs, then the LOFT can also demonstrate flight crew skills in the emergency arena. However, as said, flight crew training, including recurrent or special-item training, does provide for certain emergency training. As one example, some air carriers have recently instituted upset recovery training (recovering the aircraft from unusual or abnormal attitudes).

To resume: by carefully introducing boundary condition changes into the event sets, the risk can be caused to rise from additive or cumulative interactions. As before, when we indicated how to use the ODM-to-MPM LOFT design methodology, there will be a branching effect, contingent on decisions (made, unmade, correct, incorrect) and resultant actions (taken, not taken).

The LOFT scenario must be designed to include the various pathways, so that the flight simulator can be pre-programmed for the contingencies. It would seem that, optimally, the ODM-to-MPM and MPM-to-ODM methods would operate simultaneously or in an intertwined manner. It may fairly be said that the use of the MPM and the ODM is actually a necessary and sufficient condition of effective LOFT design.

At this point, we believe that we have presented the ODM in sufficient detail and with useful examples. The same can be said for the MPM. We have given references for the reader who wants more information on and exposition of either model. We have presented the framework for developing ODM/MPM-based LOFT scenario(s). The evaluation of the flight crew in the LOFT training session has been discussed. It is important to now clearly state that neither the ODM, the MPM, nor any LOFT developed using them need have an evaluative aspect. Further, if evaluation is to be an aspect of the LOFT session, it need not be a jeopardy situation. However, we still hold to our original view that LOFT should have a jeopardy component.

Training

Although we have not emphasized the training aspects of the MPM or the ODM, it is clear that there are necessary training considerations for both. However, the MPM needs little, if any, training in terms of the flight crew. The reason is that the CRM components are already included as part of either initial or recurrent training. The flight-control maneuver components are all included in flight training/type training, and many of the flight control aspects are used in recurrency training. Additionally, these flight control tasks/subtasks are all part of the handbooks used by pilots for each type of aircraft. Put another way, from the task through the subtask level of flight control, pilots are familiar with and have been trained in all of it. Of more importance, the performance of these flight control tasks, and the FAA and carrier standards to which they must be performed, are already known to the flight crew; they have learned and been tested on them on the ground and been evaluated on their ability to perform to the standard in the air. Therefore, for any critical task decomposition used in a LOFT, the flight crew is well aware of the subtasks required to perform the task.

Where, then, is any training needed for understanding and use of the MPM? It would seem that a single presentation and explanation of the MPM would suffice. There are the CRM and flight control behavior markers to consider. However, these are only of concern if the LOFT is to be evaluated for "jeopardy." If not, the markers and scoring scales can be distributed and explained; this process could be incorporated into the presentation and explanation of the MPM. Two hours would suffice.

Such is not the case with the ODM. This model would require dedicated training time. Again, there is a "however": the boundary conditions are well known to pilots. Although it is true that the flight crew may never have seen the way the ODM structures the boundaries, no training time is really needed for that aspect. The rising risk continuum and the concepts of interaction among boundary conditions/functions (with

resultants that can exceed the impact of the single factors involved) can all be easily trained in a 2–4-hour class, with pencil-and-paper exercises. At this point in time, the optimum use of the ODM for safety would be to automate some or all of it and make it a call-up part of a display. Perhaps the best concept would be to have the display come up when two or more boundary conditions, from either the same boundary side or contiguous sides, had become active. Such an endeavor, or any discussion of it, is far beyond the scope of this chapter.

REFERENCES

Burki-Cohen, J. and Go, T. H. et al. 2003. Simulator fidelity requirements for airline pilots training and evaluation continued: An update on motion requirements research. Proceedings of the Twelfth Annual International Symposium on Aviation Psychology, Dayton, OH.
Ewell, C. D. and Chidester, T. 1996. American Airlines converts CRM in favor of human factors and safety training. The Flightdeck, July/August 1996. Flight Department, American Airlines, DFW Airport. See also Aviation Week and Space Technology, September 6, 1996, p. 15.
Flexman, R. H. and Stark, E. A. 1987. Training simulators. In Handbook of Human Factors, G. Salvendy, Ed. New York: John Wiley & Sons, pp. 1012–1037.
Go, T. H. and Burki-Cohen, J. et al. 2003. The effects of enhanced hexapod motion on airline pilot recurrent training and evaluation. AIAA-2003-5678.
Hamman, W. R., Seamster, T. L., Lofaro, R. J. and Smith, K. M. 1992. The future of LOFT scenario design and validation. Proceedings of the Seventh International Symposium on Aviation Psychology. R. S. Jensen, Ed. Columbus, OH.
Keeney, R. L. and Raiffa, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Hoboken, NJ: John Wiley & Sons.
Lofaro, R. J. and Smith, K. M. 2003. The finalized operational decision-making (ODM) paradigm: Components and placement. Proceedings of the Twelfth International Symposium on Aviation Psychology, Dayton, OH.
Lofaro, R. J. and Smith, K. M. 2001. Operational decision making: Integrating new concepts into the paradigm. Proceedings of the Eleventh International Symposium on Aviation Psychology. R. S. Jensen, Ed. Columbus, OH.
Lofaro, R. J. and Smith, K. M. 2001. A paradigm for developing operational decision-making (ODM). Proceedings of the 2001 SAE World Aviation Congress (WAC) Conference.
Various articles in Aviation Week and Space Technology, Vol. 15, No. 3, July 17, 2000, pp. 58–63.
Lofaro, R. J. and Smith, K. M. 1999. Operational decision-making (ODM) and risk management: Rising risk, the critical mission factors and training. Proceedings of the Tenth International Symposium on Aviation Psychology. R. S. Jensen, Ed. Columbus, OH.
Lofaro, R. J. and Smith, K. M. 1998. Rising risk? Rising safety? The millennium and air travel. Special issue of the Transportation Law Journal, Vol. 25, No. 2. University of Denver Press, Denver, CO.
Lofaro, R. J. and Smith, K. M. 1993. The role of LOFT in CRM integration. Proceedings of the Seventh International Symposium on Aviation Psychology. R. S. Jensen, Ed. Columbus, OH.
Lofaro, R. J., Adams, R. J. and C. N. 1992. Workshop on Aeronautical Decision-Making (ADM). DOT/FAA/RD-92/14, Vols. I, II. National Technical Information Service: Springfield, VA, 22161.

Maurino, D. 1999. Crew resource management: A time for reflection. In Handbook of Aviation Human Factors. Daniel Garland, John Wise, and David V. Hopkin, Eds.
National Transportation Safety Board. 2007. Aviation Accident Report AAR-07/06. Retrieved January 3, 2015, from http://www.ntsb.gov/investigations/AccidentReports/Reports/AAR0706.pdf
Smith, K. M. and Hastie, R. 1992. Airworthiness as a design strategy. Proceedings of the Flight Simulator Air Safety Symposium, San Diego, CA.
Sprogis, H. 1997. Is the aviation industry expressing CRM failure? Proceedings of the Ninth International Symposium on Aviation Psychology. J. Rakovan and R. S. Jensen, Eds. Columbus, OH.
Sternberg, R. J. 1985. Implicit theories of intelligence, creativity, and wisdom. Journal of Personality and Social Psychology, Vol. 49, No. 3, 607–627. https://doi.org/10.1037/0022-3514.49.3.607

FEDERAL AVIATION ADMINISTRATION ADVISORY CIRCULARS (AC) AND REGULATIONS

AC 120-35B: Line Operation Simulations (9/6/91)
AC 120-40B: Airplane Simulator Qualification (7/29/91)
AC 120-45A: Airplane Flight Training Device Qualification (2/5/92)
AC 120-AQP: Advanced Qualification Program (8/9/91). Note: This AC has been revised and will be reissued in early 2004.
SFAR 58: Advanced Qualification Program (5/29/03). Note: This is an extension of the original SFAR of 1991.

Chapter 4

Integrating Effective Training and Research Objectives: Lessons from the Black Skies Series of Exercises

Christopher Best, Gregory Funke, Winston Bennett, Michael Tolston, Simon Hosking, and Robert Bolia

DOI: 10.1201/9781003401353-4

CONTENTS

Introduction
The Black Skies Exercises
Outcomes for Military Operators
    Performance Benefits and Transfer
    Auxiliary Benefits
Research on Team Coordination Dynamics
    Communication Dynamics
    Physiological Dynamics
    Combining Communication with Physiological Data
Operator Acceptance of Physiological Monitoring
Evaluation of Training Capability
Summary and Conclusion
Acknowledgements and Dedication
References

INTRODUCTION

At the height of the COVID-19 pandemic in 2020, the Royal Australian Air Force (RAAF) cancelled its premier live air-combat exercise, known as Exercise Pitch Black. Despite this, a subset of the operators who would have taken part in that exercise were still able to undertake high-end training as part of a synthetic counterpart activity, known as Exercise Virtual Pitch Black.

This synthetic exercise brought together air mobility and fast-jet elements with airborne and ground-based command-and-control via distributed simulation to conduct integrated planning and execution of complex mission scenarios (Hartigan, 2020). The ability to undertake complex training despite the cancellation of the live exercise represented a significant benefit arising from over a decade of research, development, and investment by the RAAF, Australia's Defence Science and Technology Group (DSTG), and their international partners in the US Air Force Research Laboratory (AFRL) into the tools and methods of synthetic training.

In this chapter, we describe the programme of research and development that laid the foundations for Virtual Pitch Black. In particular, we focus on a key component of that programme: the Black Skies series of research exercises (e.g., Shanahan, Best, Finch, Tracey, Vince, Hasenbosch, & Stott, 2009; Stephens, Crone, Temby, Best, & Simpkin, 2011; Best, Jia, & Simpkin, 2013; Francis, Best, & Yildiz, 2016). The Black Skies exercises provided a means by which RAAF operators could learn to work more effectively with others to achieve mission objectives and by which the RAAF as an organization could learn how to build and employ distributed simulation systems to augment live training. In addition, these exercises provided a rich and ecologically valid environment within which DSTG and AFRL scientists could undertake research to inform future capability development. These exercises led to a greater understanding of team learning and performance in operationally realistic settings, to the establishment of new training capability for the RAAF, and to the identification of requirements for future training capability development. The experience of the Black Skies series demonstrated that scientific and operational training objectives need not be mutually exclusive and provided a model for how the judicious integration of these objectives could serve to drive innovation in training.

THE BLACK SKIES EXERCISES

Once every two years, between 2008 and 2016, the Aerospace Division of DSTG hosted Exercise Black Skies (EBS) in the weeks preceding Exercise Pitch Black (PB), the latter being a biennial, multinational, live-flying exercise conducted over Australia's Northern Territory. EBS had three overarching objectives, which were captured in the motto: "Prepare, Evaluate, Demonstrate".

The first objective of EBS was to provide an opportunity for RAAF participants to prepare for PB. This was achieved by replicating many of the characteristics of PB during simulated EBS missions (e.g., airspace, order of battle, mission types, unit role assignments). The second objective was to use the realistic EBS mission scenarios to evaluate the impact of the tools and methods of synthetic training in an ecologically valid way. To achieve this, EBS incorporated investigations into various aspects of team performance as well as investigations of the transfer of performance benefits from the simulators to live PB missions. The third objective was to demonstrate the potential of synthetic training for air combat to the RAAF.

This was achieved partly by hosting visits by senior leaders and decision-makers responsible for RAAF training and capability development to observe the exercise, and partly by leveraging the outcomes from training transfer evaluations to communicate the real-world impact of the exercise on operator performance.

Each iteration of EBS ran for five consecutive days, with briefings and familiarization on the first day, followed by four days of mission scenarios. Mission scenarios generally lasted for around 90 minutes. Each mission was preceded by planning and briefing sessions and followed by facilitated after-action reviews (AARs). Simulation systems for EBS consisted of research simulators which, though high in functional fidelity (i.e., the simulator behaved in a manner that faithfully replicated operational systems), were generally of only moderate physical fidelity (i.e., the simulator hardware differed in some ways from that of operational equipment). Specific attention was paid to ensuring that the EBS training environment was relatively high in psychological fidelity (Kozlowski & DeShon, 2004) by ensuring that scenarios, work processes, and team structures were representative of the operational environment.

An important contributor to this was the exercise planning process, the beginning of which typically coincided with the Australian International Airshow, over 12 months prior to each iteration of EBS. Planning for EBS was centred on a series of three conferences, which brought together a diverse range of stakeholders, including representatives from the participating units, scientists and engineers from DSTG and their partner organizations, as well as contracted support staff including professional military training subject-matter experts (see McIntyre & Smith, 2013, for a description of the importance of this role) and software engineers. The goals of these conferences were to build a shared understanding of the training objectives of the participating units and the research objectives of the scientific and engineering staff, and to work collaboratively towards a design and narrative structure for the exercise scenarios that enabled both sets of objectives to be achieved to the greatest extent possible.

Over the course of the EBS series, military participants included both airborne and ground-based air-battle managers (ABMs), joint terminal attack controllers (JTACs), and fast-jet aircrew. The exercises delivered measurable performance benefits as well as valuable auxiliary benefits for these military participants. In addition, EBS delivered research outcomes to help inform and de-risk the development of future training capability. In the following sections, we describe how the activity led to positive outcomes of both kinds.

OUTCOMES FOR MILITARY OPERATORS

The utility of EBS for the military participants can be understood in terms of the positive impact the exercise had on participant performance and in terms of a number of auxiliary benefits that were observed. These two categories of outcomes are considered in turn below.

Performance Benefits and Transfer

Despite the fact that EBS was led by researchers and conducted within research facilities, the activity consistently delivered measurable performance benefits for the operators who took part, both in terms of their performance during the simulated exercise itself and during the subsequent live exercise.

Performance in EBS was operationalized as expert assessor ratings against two sets of criteria. The first was a set of behaviourally anchored rating scales relating to the teamwork dimensions of Communication, Information Exchange, Leadership/Initiative, and Supporting Behaviour (Smith-Jentsch, Zeisig, Acton, & McPherson, 1998). The second was a set of role-specific mission objectives. During EBS missions, subject-matter expert assessors rated the performance of the participating teams against both sets of criteria. After EBS, follow-up evaluations of the performance of a subset of the participating teams were conducted during the live exercise PB. To provide a basis for comparison, observers at the live exercise also evaluated the performance of similarly experienced "control" teams who had not taken part in EBS.

Difference scores were calculated to capture change in ratings of mission effectiveness and team coordination behaviours between the beginning and end of EBS, and between the EBS teams and the control teams during live PB missions. These differences were then expressed in terms of percentage of scale maximum, with positive values indicating performance improvements and negative values indicating performance decrements. These scores are summarized in Figure 4.1. The first column of Figure 4.1 shows the mean percentage difference in team coordination for all teams (n = 8) across the EBS series (error bars represent the standard deviation). The data show that there was an average increase in performance of around 20% on team coordination processes from the first to the last Black Skies mission.

FIGURE 4.1  Mean difference scores for Black Skies (synthetic) and Pitch Black (live) expressed as a percentage of scale maximum. Note: nEBS = number of EBS teams; nPB = number of PB teams.

The second column of Figure 4.1 shows the mean percentage difference in team mission effectiveness across all teams in the EBS series. These data show that there was an increase in performance, on average, of around 10% on mission effectiveness from the first to the last Black Skies mission. The third and fourth columns of Figure 4.1 show the mean percentage difference scores on team coordination and mission effectiveness between the EBS teams and the matched control teams across all observed live PB missions (n = 4). These data show that, on average, the teams that participated in EBS outperformed the control teams by around 20% of scale maximum on team coordination and around 10% of scale maximum on mission effectiveness. These data make it clear that participating teams benefited from EBS in terms of improvements in performance and that these improvements persisted during subsequent missions undertaken in the live environment.
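A minimal sketch of the difference-score calculation described above follows; the rating values and the 7-point scale are invented for illustration, as the chapter does not state the scale range.

    def pct_of_scale_max(first: float, last: float, scale_max: float) -> float:
        """Change in mean rating expressed as a percentage of scale maximum.

        Positive values indicate improvement; negative values, a decrement.
        """
        return 100.0 * (last - first) / scale_max

    # Illustrative team-coordination means for the first and last missions.
    first_mission, last_mission = 3.8, 5.2
    print(f"{pct_of_scale_max(first_mission, last_mission, 7.0):.0f}%")  # 20%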

Auxiliary Benefits

As well as the consistent performance benefits described above, EBS provided a number of auxiliary benefits for participants in the form of opportunities for augmenting standard training, refining team processes, and making the most of the subsequent live exercise. One such outcome was that the teams who participated in EBS typically departed the exercise with updated plans for how they would execute their missions during the subsequent live exercise. By providing an environment within which coordination processes could be worked out and plans for the live exercise could be tested and refined, EBS added value to the live training experience. Evidence of this benefit was reflected both in the feedback of participants and that of their senior leaders. For example, free-text feedback obtained from EBS participants at the conclusion of the exercise included comments such as: "We have refined our team dynamics and bonded during this exercise", "I found being able to learn from mistakes through actively applying fixes invaluable", and "The ability to try and test different ways of tackling problems provided good learning for all". These sentiments were reinforced during one iteration of PB, when a senior Air Force officer remarked that the performance of a team that had recently participated in EBS represented "the best Day 1 of Pitch Black" he had ever seen.

In addition to providing participating teams with the ability to test and refine their own plans and processes, observations during EBS often led to suggested changes being fed back into planning for the live exercise as a whole. These suggestions typically related to characteristics such as airspace boundaries, the placement of tanker or Airborne Early Warning (AEW) orbits or combat-air-patrol (CAP) points, or the timing of scenario events. Because of this feedback, it is likely that EBS had a positive impact on the quality of the live training experience for many more operators than just those who participated in EBS itself.

Large, complex exercises such as PB are often used as the context for conducting operator performance assessments that contribute to the award of advanced qualifications. On their own initiative during the third iteration of EBS, participating units began conducting such observations during the simulated missions of EBS.

This was achieved by having the operators who were under assessment observed in a one-on-one fashion by a qualified trainer during their missions (a practice that is referred to as "back-seating"). This represented yet another practical benefit for the operators and units that were involved in EBS and also provided the research team with increased confidence in the ecological validity of the EBS research environment.

A final practical benefit for the operators was observed during EBS 2016, when DSTG and AFRL researchers worked together to develop and provide advanced, semi-automated performance assessment and after-action review (AAR) systems for use by assessors during the exercise. On three occasions during that exercise, assessors were alerted by the prototype systems to events within the scenario that provided important learning opportunities and that they subsequently chose to focus on during the post-mission debrief, but which they stated they would have missed were it not for the automated alerts. This outcome provided good evidence for the utility of such systems for supporting complex training. But of more practical significance for the participating operators, it served to ensure that some important lessons that would not otherwise have been captured were identified, communicated, and learnt.

When combined with the performance benefits described earlier, it is clear that EBS delivered valuable outcomes for the military operators who took part. In addition, EBS served as a rich and ecologically valid context for undertaking research on training and team performance. In the sections that follow, we highlight some of the outcomes from that research and describe how they will inform the development of future training capability.

RESEARCH ON TEAM COORDINATION DYNAMICS

One of the key lessons learnt during the EBS series was that it takes a significant amount of effort on the part of a large number of expert staff to plan and execute complex training events using current tools and methods. There is therefore great potential benefit to be gained from research into technologies that reduce the time and the number of expert staff required to plan and generate exercise scenarios, adapt training experiences to trainee needs, observe and evaluate trainee performance, and provide feedback that leads to measurable performance improvements. A primary goal of EBS was to develop and evaluate novel tools and techniques to address these needs. The focus of this effort was on near-real-time operator and team state monitoring and, in particular, on non-linear methods for assessing changes in team state by analyzing dynamic patterns in data (Riley & Van Orden, 2005).

From the dynamical systems perspective, teams are continuously evolving and highly interdependent sets of nested components whose dynamics are shaped by fluid constraints that couple individuals and result in similarities in their multimodal responses (Gorman, Dunbar, Grimm, & Gipson, 2017). Put another way, observed patterns in physiological responses like heart rate, and behaviours like communication, from teams interacting to resolve task demands form dynamical trajectories that are a confluence of constraints, both internal and external.

Integrating Effective Training and Research Objectives

135

interactions in the form of dynamic dependencies, confines the temporal evolution – the dynamics – of the system and is reflected as changes in individual- and team-level physiological responses as well as overt behaviour. For example, during the complex scenarios often observed in military settings, teams must cycle between regular, practised behaviours and unplanned, adaptive responses to unforeseen perturbations (Ishak & Ballard, 2011). Regular behaviours occur when executing procedures that are well trained and integrated into team functioning, such as a leader pushing information about objectives and progress during typical mission phases, like marshalling and check-in in the air-battle management domain. These behaviours tend to be relatively highly patterned. In this case, team dynamics are stable and predictable, and thus have low entropy (a measure of complexity and predictability; Pincus & Singer, 1996; Richman & Moorman, 2000). This situation can be contrasted with adaptive responses to unforeseen events in which typical patterns may break down. For instance, during an aircraft emergency (e.g., an engine failure), the amount of communication traffic can quickly increase and unused communication pathways may be employed to convey the context-specific information needed to resolve an evolving set of problems. In this case, team dynamics are unplanned and emergent, which means they can fluctuate to provide local variation to meet global task demands (cf. Gorman et al., 2017). These fluctuations are breakdowns in constraints that restrict the complexity of the system in order to meet changing objectives. This reduction in constraints allows spontaneous reorganization of system dynamics to meet ongoing task demands, which increases adaptability and leads to less stability and predictability, thus resulting in higher degrees of entropy (Stephen, Boncoddo, Magnuson, & Dixon, 2009). In sum, the responses of teams to changing task demands and unforeseen events are expected to lead to measurable changes in the complexity of multimodal physiological and behavioural data collected from the teams, which in turn can be used to identify learning opportunities.

In EBS, we sought to leverage unobtrusive team-level responses in physiological and behavioural data to assess changes in team state with the goals of identifying potential learning opportunities and improving after-action reviews. We used several types of non-linear analyses to assess the temporal complexity of time series data observed from teams of air-battle managers (ABMs), including complexity analysis using sample entropy (Richman & Moorman, 2000; Strang et al., 2012) and measures of dynamic stability from recurrence quantification analysis (RQA; Marwan, Romano, Thiel, & Kurths, 2007; Webber & Zbilut, 1994). In the following sections, we highlight results from our efforts in terms of communication and physiological signals.
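To make the entropy measure concrete, the sketch below computes sample entropy for a one-dimensional series. This is our own minimal Python illustration of the published definition (Richman & Moorman, 2000), not code from the EBS toolchain; the function name and defaults (template length m = 2, tolerance r = 0.2 standard deviations) are conventional assumptions.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy: -ln(A/B), where B counts pairs of matching templates
    of length m and A counts matches of length m + 1. Lower values indicate
    more regular, predictable dynamics; higher values, more irregularity."""
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)  # tolerance scaled to the series' variability

    def count_matches(length):
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to all later templates (no self-matches).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))   # patterned: low entropy
irregular = rng.normal(size=500)                    # random: high entropy
print(sample_entropy(regular), sample_entropy(irregular))
```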

Communication Dynamics

Team communication is a direct indicator of team cognition and reflects the manner in which teams are organized (Cooke, Gorman, Myers, & Duran, 2013). Importantly, team communication, represented as a categorical time series indicating who is speaking and on what channel, has been shown to be sensitive to cognitive workload in teams performing an ABM simulation (Strang et al., 2012). In EBS 2014, a prototype tool
called the “Dynamic, Real-time Analysis of Distributed Interactive Simulation packets tool” (DRADIS) was introduced to analyse real-time communication patterns in categorical time series data for change points (i.e., times during which the parameters that specify the mean and variance in the data change) without disrupting normal team communication (see Figure 4.2). DRADIS analysed the temporal complexity of voice transmissions sent by ABMs during EBS missions in real time and sent alerts via a visual interface whenever entropy fell outside of 90% bootstrapped confidence intervals. The results were used to support subject-matter expert evaluation of team functioning during after-action reviews, and users reported that DRADIS provided valuable information and insight into team dynamics (Dukes, Funke, Strang, & Best, 2015).
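The alerting logic can be approximated as follows: compute entropy over a sliding window of the communication series and flag windows whose value falls outside a 90% bootstrapped confidence interval estimated from baseline windows. This is a hedged reconstruction of the general approach, not the DRADIS implementation; it reuses the sample_entropy function sketched above, and the window, step, and baseline sizes are illustrative assumptions.

```python
import numpy as np

def bootstrap_interval(values, n_boot=2000, alpha=0.10, seed=1):
    """Percentile bootstrap interval for the mean of baseline entropies."""
    rng = np.random.default_rng(seed)
    means = [rng.choice(values, size=len(values), replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

def entropy_alerts(series, window=120, step=30, baseline_windows=10):
    """Return (start_index, entropy) pairs whose windowed sample entropy
    falls outside the 90% interval built from the earliest windows."""
    starts = list(range(0, len(series) - window + 1, step))
    ents = np.array([sample_entropy(series[s:s + window]) for s in starts])
    lo, hi = bootstrap_interval(ents[:baseline_windows])
    return [(s, e) for s, e in zip(starts, ents) if not lo <= e <= hi]
```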

FIGURE 4.2  DRADIS display, showing different computed complexities (sample entropies), number of utterances, and summary from a 30-minute moving window.

Physiological Dynamics

In addition to communications, patterns of physiological data are also informative of team state and dynamics. For instance, changes in heart rate can reflect degrees of excitement and investment in a task (Wright & Gendolla, 2012), and similarity in the patterning of changes in heart rate is related to mutual investment in shared outcomes and trust between individuals (Konvalinka et al., 2011; Mitkidis, McGraw, Roepstorff, & Wallot, 2015; Tolston et al., 2018). Heart rate is also associated with different emotional states (Cacioppo, Berntson, Larson, Poehlmann, & Ito, 2000) and with cognitive workload (Charles & Nixon, 2019). Nevertheless, mapping heart rate to task performance outcomes can be difficult, even in the best-case scenario of a single participant in a tightly controlled experiment. With large teams of military operators performing in highly complex environments, calculating useful heart rate metrics becomes extremely challenging. To address this, we sought to identify metrics of team-level cohesion using multivariate recurrence quantification analysis (MVRQA, a non-linear tool for assessing structure in complex signals; Cao, Mees, & Judd, 1998; Proulx, Côté, & Parrott, 2009; Wallot, Roepstorff, & Mønster, 2016). MVRQA considers the heart rates of all team members as a single system and thus provides metrics of the complexity and stability of the dynamics of the team as a whole. Previous work has shown that MVRQA can be used to assess team physiological–behavioural coupling (Tolston et al., 2018), which in turn has been shown to index cohesion, workload, and stress (e.g., Strang, Funke, Russell, Dukes, & Middendorf, 2014).

In our analyses of the data from EBS 2014, we investigated whether MVRQA can scale to large teams (i.e., teams composed of up to at least 13 members) in highly complex situations (Tolston, Best, Funke, Menke, & Dukes, 2016). Thirteen RAAF military members took part in six complex, air-combat missions in a DSTG simulation facility in Melbourne, Australia over the course of four days. Each mission featured an unexpected event – called a “scenario inject” – to test team effectiveness. Heart rate data collected from team members were simultaneously assessed for evidence of physiological coupling using MVRQA within each mission. MVRQA measures were evaluated against synthesized surrogate signals for greater-than-chance level coupling; each surrogate signal was generated to have a fractal structure similar to that of the heart rate of an individual during the mission in question (cf. Strang et al., 2014). MVRQA measures were evaluated for sudden changes in estimated distribution parameters using an online Bayesian change point (BCP) detection algorithm (Adams & MacKay, 2007), and distances from the beginning of the inject to the nearest estimated change point were compared to chance levels.

The results showed that the mean MVRQA %Determinism – a measure that is inverse to the degree of randomness in data – of real teams was significantly higher than that of surrogate signals, meaning that there were dynamical dependencies in the data that were revealed by MVRQA. Additionally, following initiation of the scenario inject, BCP analyses of %Determinism detected a significant variation in coupling between team members within an average time of less than two minutes, a value significantly lower than values obtained from surrogate analyses. This means that MVRQA of team-level heart rate was linked to environmental factors. Importantly, the number of change points did not differ between teams and surrogates, meaning that the surrogate analyses succeeded in replicating the degree of non-stationarity in the data, thereby providing supporting evidence that the distribution of change points in the team data was linked temporally to the onset of the perturbations.

FIGURE 4.3  Heart rates from all air-battle managers (ABMs; left) are simultaneously assessed for dynamic patterns using multivariate recurrence quantification analysis (MVRQA). %Determinism, an MVRQA metric related to the degree of randomness in the multivariate system, is then indexed for sudden changes using a change-point detection algorithm. A large change point can be seen immediately around the planned training event (the “inject”). %Determinism is also clearly higher for the team of ABMs than it is for the average of surrogate systems.

These results show that interactions and dependencies of large teams in complex tasks can be meaningfully summarized by sensitive, low-dimensional values: a single metric indexing the degree of physiological coupling between up to 13 teammates showed sensitivity to disruptive mission events. This provides evidence for the utility of MVRQA in the assessment of coupling in even a large number of signals and demonstrates the applied potential of this technique for supporting team performance assessment and monitoring changes in team coordination processes in realistic settings.
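The core of the %Determinism metric can be illustrated with a simplified sketch: treat each member's normalized heart rate as one column of a multivariate series, mark pairs of time points whose multivariate states fall within a radius of one another as recurrent, and compute the share of recurrent points that form diagonal lines (repeated stretches of trajectory). This is a bare-bones illustration under assumed settings (no embedding, a heuristic radius), not the MVRQA pipeline used in EBS; surrogate testing would repeat the computation on synthetic signals with matched fractal structure and compare the distributions.

```python
import numpy as np

def percent_determinism(signals, radius=None, min_line=2):
    """%DET of a multivariate series (rows = time, columns = team members):
    the fraction of recurrent points lying on diagonal lines of length
    >= min_line. Higher %DET indicates more deterministic, repeating
    multivariate dynamics; random signals tend to score lower."""
    z = (signals - signals.mean(axis=0)) / signals.std(axis=0)
    # Pairwise Euclidean distances between multivariate states.
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    if radius is None:
        radius = 0.2 * d.max()            # heuristic recurrence threshold
    rec = d <= radius

    diag_points = total_points = 0
    for k in range(1, len(rec)):          # upper-triangle diagonals
        line = np.diagonal(rec, offset=k)
        total_points += int(line.sum())
        run = 0
        for v in list(line) + [False]:    # sentinel closes the last run
            if v:
                run += 1
            else:
                if run >= min_line:
                    diag_points += run
                run = 0
    return diag_points / total_points if total_points else 0.0

rng = np.random.default_rng(2)
t = np.linspace(0, 8 * np.pi, 400)
team = np.column_stack([np.sin(t + p) for p in (0.0, 0.3, 0.6)])
print(percent_determinism(team + 0.05 * rng.normal(size=team.shape)))  # high
print(percent_determinism(rng.normal(size=team.shape)))  # surrogate-like, low
```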

Combining Communication with Physiological Data

The analyses described above demonstrated that MVRQA can scale to large teams (i.e., teams of up to 13 members) in highly complex situations (Tolston et al., 2016) and that interactions and dependencies of large teams in complex tasks can be meaningfully summarized by sensitive, low-dimensional values. They also showed that changes in team communication patterns in complex training environments vary systematically around perturbations (Dukes et al., 2015; cf. Wiltshire, Butner, & Fiore, 2018). Together, these outcomes highlight the potential utility of univariate signals for summarizing the complex dynamics of physiological and behavioural responses of military teams and thereby detecting meaningful changes in team coordination dynamics. However, since these metrics relied on summary statistics of the entire system separated by modality, the approaches taken did not provide diagnostic information regarding which subcomponents of the multivariate system were most responsive to events (i.e., which signals and team members were most responsive to changes in task demands).

In more recent efforts, we have combined non-linear heart rate variability metrics from individuals and the whole team with communication patterns into a single
feature space and conducted multivariate analyses to determine which signals are likely to be most responsive to unanticipated events. As with the univariate analyses described above, we expected that there would be meaningful patterns that emerge in multivariate characterizations of team responses that map on to critical training events. Further, we expected that multivariate analyses would lend insight into how the teams adapted to those events. Our analyses showed that multivariate change-point analysis of MVRQA of physiological data and communication complexity in data from large teams of ABMs in complex scenarios can uncover patterns that vary in regular ways around critical mission events (Tolston et al., 2019). Importantly, the multivariate approach allowed detailed diagnostics of team responses to particular mission events.

To follow up on this analysis, we are currently evaluating how topological data analysis (TDA, a powerful way of identifying patterns in data; Carlsson, 2009; Singh, Mémoli, & Carlsson, 2007) can be used to assess multivariate team data to identify regular patterns in team coordination and physiological states. Our preliminary findings support the proposition that teams enter into stable trajectories that correspond to dynamical attractors and that teams under perturbation re-organize to process the perturbation and return to stable attractors, which forms cycles in the data (Tolston et al., 2019). These results provide evidence for the utility of combining MVRQA and TDA in the assessment of complex high-dimensional data in high-fidelity training environments.

The research described in this section demonstrates that real-time assessment of communications can aid subject-matter experts during after-action reviews and that assessment of heart rate data from large teams provides useful indices of team state. By focusing on changes in dynamical systems, we have shown that it is possible to detect when teams transition into a new state, which often happens around critical periods of the mission (cf. de Mooij et al., 2020). Being able to draw the attention of instructors and assessors to changes in team state, or to drive changes in the behaviour of adaptive training systems on the basis of such information, provides significant opportunities for the development of training capability.

For future work, we believe that identifying low-dimensional variables to quantify dynamics of the task environment, a critical source of constraints and perturbations for the team, is an important next step in both improving automated team interventions and identifying adaptive training opportunities. For instance, longitudinal metrics characterizing the spatial relations between aircraft could be combined with low-dimensional MVRQA metrics to create a time series indicator of environment-team coupling. In this case, a strongly coupled environment-team system would entail that the environment and team are evolving together, indicating a strong mutual influence that could point to effective or responsive teamwork. Alternatively, a weakly coupled or uncoupled environment-team system could mean that there are changes in the environment that the team has not yet responded to, or that there are changes in the dynamics of the team that reflect internal reorganizations or perturbations in the team itself. We believe that being able to measure such symmetric and asymmetric changes in the dynamics of a team and its environment will be critical for developing smart, real-time interventions from automated aids.
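A hedged sketch of the combined, multivariate change-point step: stack per-window features from the two modalities into one matrix and search it for distributional breaks. For brevity we use the open-source ruptures package and an offline kernel detector; the EBS analyses used an online Bayesian detector (Adams & MacKay, 2007), so this is only a stand-in, and all feature names and values below are invented for illustration.

```python
import numpy as np
import ruptures as rpt  # third-party package: pip install ruptures

# Hypothetical per-window features: one row per 30 s window; columns are
# communication entropy, team %Determinism, and mean heart rate (illustrative).
rng = np.random.default_rng(3)
pre = rng.normal([1.8, 0.7, 72.0], [0.1, 0.05, 2.0], size=(60, 3))
post = rng.normal([1.2, 0.5, 80.0], [0.1, 0.05, 2.0], size=(40, 3))  # inject
features = np.vstack([pre, post])

# Kernel-based change-point search over the joint feature space.
algo = rpt.Pelt(model="rbf", min_size=5).fit(features)
breakpoints = algo.predict(pen=10)  # indices where segments end
print(breakpoints)                  # expect a break near window 60
```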

OPERATOR ACCEPTANCE OF PHYSIOLOGICAL MONITORING

The potential applications of near-real-time monitoring of operator and team states such as workload, situation awareness, and fatigue are many, in both operational and training settings. In the training domain, such information could be used to inform performance evaluation, help tailor the behaviour of adaptive training systems, alert instructors to potentially important learning points, or identify critical events for feedback during after-action review. Research of the kind described in the preceding section is needed to develop and validate the approaches to data analysis that underpin such applications. However, the effective acquisition, interpretation, and utilization of the information arising from operator state monitoring technologies in applied settings also depend upon the acceptance of these technologies, both by those who are being monitored (e.g., trainees) and by those doing the monitoring (e.g., instructors, curriculum designers). Given this, it is important to understand the tendency of operators to either accept or reject such technologies, as well as the factors underlying this tendency.

Factors that may either increase or decrease acceptance of monitoring technologies have been identified in the research literature. Examples of the former include perceptions that the benefits of the technology outweigh any risks (Moran et al., 2013) or the belief that performance monitoring makes available data that individuals can use for their own purposes, such as keeping track of their own physical fitness (e.g., Heron & Smyth, 2010). Examples of the latter include fear of unwanted disclosure of health-related information (e.g., Ahamed, Talukder, & Kameas, 2007) and feelings of discomfort or anxiety associated with being the subject of evaluation by a superior, colleague, or even the monitoring system itself (e.g., Zeidner & Matthews, 2005).

During EBS14 and EBS16, four teams of C2 operators (27 operators in total) were asked to wear a physiological monitoring device for the duration of the exercise scenarios. Two of these teams were from the ground-based C2 environment and two were from the airborne environment. The device used was the Zephyr BioHarness 3 (Medtronic Zephyr, Boulder CO, USA). The Zephyr BioHarness is a lightweight physiological sensor designed to be worn against the wearer’s chest by means of a flexible synthetic strap. It records electrocardiographic (ECG), respiration, and accelerometry data at rates of 250, 100, and 25 Hz respectively, as well as summary statistics at a rate of 1 Hz. Data were recorded throughout each exercise scenario to the onboard memory of the recording module. At the end of each scenario, data were downloaded from each operator’s module to a central database for subsequent analysis.

At the conclusion of each exercise, participants were asked to complete a Device Comfort Questionnaire (DCQ). The first version of this instrument, comprising five subscales, was used during EBS14. Items on the first subscale, device ergonomics, related to fit factors (e.g., ease of donning/doffing). The second subscale, acceptance of physiological monitoring, included items related to operators’ comfort with being monitored. The last three subscales, namely endorsement of use during simulation training exercises, live training exercises, and real operations, asked respondents to evaluate the degree to which they would accept the use of monitoring technologies
for a variety of purposes (e.g., performance assessment, adjustment of task difficulty) across those three different contexts (see Menke et al., 2015, for a detailed description of this instrument).

An updated version of the DCQ comprising nine subscales was used in EBS16. The first subscale, physical and psychological comfort, combined the device ergonomics and acceptance of physiological monitoring subscales from the previous version of the DCQ. The next six subscales were constructed to distinguish between different contexts for using monitoring technologies, as well as between contexts in which data are interpreted and used by a human decision-maker (e.g., an instructor) versus a machine, such as an adaptive training system. These subscales were: (1) endorsement of use during simulation training exercises to support human decision-makers, (2) endorsement of use during simulation training exercises to support adaptive automation, (3) endorsement of use during live training exercises to support human decision-makers, (4) endorsement of use during live training exercises to support adaptive automation, (5) endorsement of use during real operations to support human decision-makers, and (6) endorsement of use during real operations to support adaptive automation. Items on the next subscale, comfort with ubiquitous monitoring, asked participants to indicate how comfortable they would be wearing physiological monitoring devices during off-duty activities, such as during sleep and leisure activities. The final subscale, comfort with assessment consequences, included items related to potentially unanticipated consequences of physio-behavioural monitoring, such as disclosure of medical information (see Funke et al., 2017, for details of this instrument).

Analyses conducted on the data from the first administration of the DCQ revealed surprisingly strong acceptance of the monitoring devices and their various use cases. This appeared to indicate that, from the operators’ perspective, the perceived benefits of monitoring outweighed any perceived risks. This was a surprising outcome, given that there remain significant limitations on the extent to which such devices and the data they make available can be considered consistently reliable and valid across a wide range of operational contexts and proposed applications (e.g., Christensen, Estepp, Wilson, & Russell, 2012; Hancock & Matthews, 2019; Matthews, Reinerman-Jones, Barber, & Abich, 2015). One possible implication of this outcome is that operators may be unfamiliar with the capabilities and limitations of current technologies. Based on this, the authors concluded that it would behove those working in the area to ensure that expectations are appropriately calibrated against the actual capabilities of the systems under development. Failure to do so could result in violated expectations, mistrust, and disuse of future capabilities (Menke et al., 2015).

Some potential applications of monitoring technologies involve providing information to human decision-makers, while others involve informing the behaviour of intelligent, adaptive systems. For its second administration, the structure of the DCQ was refined in order to investigate whether acceptance of monitoring technologies varied systematically between these two use cases. Analyses conducted on the data from the second administration of the DCQ suggested that this factor did indeed have an impact. While the operators’ responses were not as overwhelmingly
positive in the second administration as in the first, the subscale scores that departed from the scale midpoint in a statistically significant way (physical and psychological comfort, endorsement of use during simulation training exercises to support human decision-makers, and endorsement of use during live training exercises to support human decision-makers) did so universally in the positive direction. Furthermore, there was a statistically significant difference across subscales, depending upon who was named as being the user of the data. Specifically, operator ratings were significantly more positive on average for subscales that named a human decision-maker than for those that named an intelligent, adaptive system as the user of the data.

This difference in acceptance could have significant implications for the use of monitoring technologies to enable the provision of large-scale, complex training experiences in Australia and other countries with similarly sized militaries. As described above, a key constraint on the provision of such training is the availability of expert trainers to plan and oversee the execution of these activities and of operational personnel to generate scale and realistic learning opportunities by participating in exercise scenarios. Monitoring technologies can go some way towards extracting additional learning benefits from complex training events by, for example, drawing the attention of the relatively small number of available instructors to potentially important learning points that might otherwise be missed (indeed, specific examples of this were observed during EBS 2016). However, if these technologies are to support a significant increase in the frequency and regularity with which complex training can be provided, without a concomitant increase in the staff and other resources required, the development and broad adoption of intelligent, adaptive training systems is likely to play a role. These outcomes from EBS, therefore, serve to highlight an area in which further research is required to understand and overcome potential barriers to the adoption of new training technologies.
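The subscale analysis described above amounts to one-sample tests of mean ratings against the scale midpoint, plus paired comparisons between matched human- and automation-user subscales. A minimal sketch with fabricated ratings follows; the DCQ's actual anchors and item counts are not reproduced here, and a 7-point scale with midpoint 4 is an assumption.

```python
import numpy as np
from scipy import stats

MIDPOINT = 4.0  # assumed midpoint of a 7-point agreement scale
rng = np.random.default_rng(4)
subscales = {   # illustrative mean ratings, one value per operator (n = 27)
    "physical_psychological_comfort": rng.normal(5.2, 0.8, 27),
    "sim_training_human_decision_maker": rng.normal(5.0, 0.9, 27),
    "sim_training_adaptive_automation": rng.normal(4.3, 1.0, 27),
}

# Does each subscale depart significantly from the scale midpoint?
for name, scores in subscales.items():
    t, p = stats.ttest_1samp(scores, popmean=MIDPOINT)
    side = "above" if scores.mean() > MIDPOINT else "below"
    print(f"{name}: mean={scores.mean():.2f} ({side} midpoint), "
          f"t={t:.2f}, p={p:.4f}")

# Paired comparison: human decision-maker versus adaptive-automation use case.
t, p = stats.ttest_rel(subscales["sim_training_human_decision_maker"],
                       subscales["sim_training_adaptive_automation"])
print(f"human vs automation: t={t:.2f}, p={p:.4f}")
```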

EVALUATION OF TRAINING CAPABILITY

Networked simulators and associated support systems of the kind developed, demonstrated, and transitioned to RAAF during the EBS series have introduced the opportunity to more closely replicate conditions encountered in combat and to record performance parameters for analysis and feedback. Further, they hold the promise of gaining efficiencies in training through a more effective blending of simulation and live training. However, it is unlikely that the promise of these technologies can be fully realized unless their design, use, and evolution are informed by a thorough understanding of combat-mission training requirements.

Within the United States, a suite of tools and methods collectively known as the Mission Essential Competencies (MEC) approach has been successfully applied to define and track training requirements across a range of capabilities. MECs, along with associated supporting competencies, knowledge, skills, and developmental experiences, have been defined for every tactical platform in the United States Air Force (USAF), command-and-control platforms in both the USAF and the United States Navy (USN), as well as intelligence, surveillance, and reconnaissance (ground and airborne), close air support (air and ground operations), and multinational
mission sets in air-to-air, air-to-ground, command and control, peacekeeping support, and joint terminal attack control. Given the success of the implementation of MECs, many efforts have been refreshed as the capabilities of the systems or mission requirements have changed over time, and the results have been incorporated into training systems (see Bennett, Alliger, Colegrove, Garrity, & Beard, 2013, for a detailed description of the MEC processes and products).

The final exercise to carry the Black Skies name, EBS 2016, represented a transition of capability from DSTG to RAAF and marked the initial operating capability (IOC) for a facility that was then known as the Joint Air Warfare Battle Laboratory (JAWBL; currently the RAAF Distributed Training Centre) at RAAF Base Williamtown. The purpose of the JAWBL was to inform and de-risk requirements for more permanent simulation and experimentation capabilities that would become part of the RAAF Air Warfare Centre. The baseline systems installed into the JAWBL and the planning, staffing, training design, assessment, and feedback methods used in EBS16 were the products of iterative development and evaluation that took place within DSTG’s Melbourne facilities throughout the preceding decade (e.g., Crane et al., 2006). To achieve the objectives of JAWBL, it was necessary to build upon this foundation by (1) codifying RAAF combat-mission training requirements, (2) developing and trialling systems and methods for addressing those requirements, (3) evaluating the relative strengths and weaknesses of those systems and methods, and (4) tracking progress over time to target resources effectively.

To support this effort, DSTG and AFRL researchers partnered with RAAF to conduct a MEC-based analysis of JAWBL capability. As this would be the first time the MEC approach had been applied to a specifically Australian platform, a secondary objective was to evaluate the utility of the approach in the Australian context. The capability chosen for this analysis was the Air Defence Ground Environment (ADGE), operated by RAAF Number 41 Wing. The ADGE is a ground-based, tactical air-battle management (ABM) capability. ADGE operators had taken part in every iteration of EBS prior to 2016 and, as a result, their systems, missions, and roles were reasonably well understood. The ADGE MEC effort led to the identification of 9 ADGE mission-essential competencies, 24 supporting competencies, 63 knowledge elements, 60 skills, and 73 developmental experiences. These materials were used during EBS16 to characterize the status of JAWBL training systems at the time of IOC.

One aspect of the MEC approach that sets it apart from many other approaches to training analysis is its inclusion of “Developmental Experiences” as a driver of system requirements, scenario design, and training effectiveness evaluation. Experiences are defined within the MECs framework as developmental events during training and/or career activities that are necessary to learn knowledge, skills, or supporting competencies under operational conditions (Bennett et al., 2013). MEC experiences can be particularly useful in the context of capability development because they help to define what kinds of scenarios trainees should be exposed to – and therefore what kinds of stimuli and responses training systems should be able to support. They also support training evaluation by providing criteria that are straightforward for participants to understand and for decision-makers to interpret.

A “Training Experiences Survey” was constructed from the list of 73 ADGE mission-essential developmental experiences. Near the conclusion of EBS16, ADGE operators were asked to rate, based on what they had seen during the exercise, the extent to which they felt the existing JAWBL systems could support the effective provision of each experience. They were also asked to rate the importance of each experience for becoming fully combat-mission ready.

Analysis of these data showed that participants judged that JAWBL capabilities, as demonstrated during EBS16, supported the provision of a majority of developmental experiences to at least some extent. Summing across participants and items, more than three-quarters (77.6%) of effectiveness ratings indicated that JAWBL was “Somewhat Effective” or better. Almost two-thirds (62.3%) of these ratings indicated that JAWBL was “Quite Effective” or better. While these outcomes were broadly positive, it was also apparent from the existence of a number of ratings in the “Not at all Effective” and “Slightly Effective” categories that there was room for improvement. A closer examination of these responses was undertaken to identify opportunities for capability development. Of the 73 ADGE developmental experiences, there were a total of 15 that received a median rating less than “Somewhat Effective”. Additionally, there were a total of 26 experiences for which two or more ADGE participants agreed that existing JAWBL capabilities were less than “Somewhat Effective”. There was substantial overlap between these two lists of experiences, such that the former was a subset of the latter, with the exception of just one experience.

Examination of this list revealed three possible reasons why participants might have evaluated the JAWBL as relatively ineffective for providing some developmental experiences. First, the list contained a subset of experiences that are, by their nature, only marginally relevant for exercises of the kind represented by EBS16. Examples included “Operate in a deployed or field environment”, “Familiarization visits”, and “Being an exchange officer”. It was concluded that little could or should be done within JAWBL to address these particular shortcomings. Of the experiences that appeared to be better suited to EBS and JAWBL, two further sub-categories could be discerned. Some experiences were not nominated as training objectives by participating units during planning for EBS16 but nevertheless could have been provided if required. Examples included “Observe and respond to a civilian emergency”, “Operate under extreme fatigue, long hours, stress”, and “Experience a lost or unaccounted for aircraft”. Finally, the list contained experiences that, while well-suited to training environments like EBS in general terms, were indeed not possible within the JAWBL at the time the exercise was conducted due to the status of those particular simulation systems. Examples included “Operate in a GPS denied environment” and “Operations during degraded comms”.

To help interpret the findings presented above, the underlying reasons for participant evaluations were represented diagrammatically in the form of a “MEC Experience Stack”, as shown in Figure 4.4.

FIGURE 4.4  A MEC-based model to guide training design and system development. Mission-essential developmental experiences are categorized according to whether they could, in principle, be provided within a given training environment, and if so, whether there is evidence to indicate that they are provided effectively given current system status. Arrows indicate a developmental trajectory.

This model depicts the categories of developmental experiences described above as well as a path for experiences to move between categories over time as training capability develops, matures, and is demonstrated to be effective. The top layer of the stack (i.e., above the dashed line representing the effectiveness threshold) contains those experiences for which high effectiveness ratings have been obtained (Category A). The next layer down (i.e., immediately below the dashed line representing the effectiveness threshold) divides low-effectiveness experiences into those that could reasonably be considered to be within-scope for the training environment under consideration and those that should be considered out-of-scope (Category B). The bottom layer further differentiates amongst within-scope experiences, categorizing them as either supportable given current system status (Category C) or not currently supportable (Category D). A developmental trajectory for JAWBL or similar facilities is represented by the arrows in Figure 4.4. According to this model, progress is represented by the achievement of outcomes that serve to migrate learning experiences from Category D, through Category C, and eventually into Category A.

Building on this model and its underlying logic, a set of metrics was derived by which the status and progress of the JAWBL could be tracked. The first metric was a measure of the total possible training scope of the capability, defined as the proportion of the entire list of mission-essential learning experiences for a given training audience that could, in principle, be addressed – whether or not they are supported by existing systems. This measure helps to quantify the ideal of what can be expected from the training environment. It is a representation of what proportion of training objectives could be achieved for a given training audience within a facility of this kind in a so-called “perfect world”. The second metric was an estimate of the proportion of mission-essential developmental experiences that could currently be provided to a reasonable level of effectiveness, given the existing state of the capability. Unlike the previous metric, this is not an operationalization of some ideal state, but rather, a measure of current status. There are two categories of experiences in
the model that could in principle represent current capability: one for which positive effectiveness ratings have been obtained (Category A) and one for which such ratings have not yet been obtained, but which nevertheless could be supported if they were written into mission scenarios (Category C). A more conservative approach to characterizing existing capability, yielding an alternative version of this metric, takes into account only those experiences for which supporting data have been obtained and excludes those experiences that could be supported, but which have not yet been demonstrated.

The MEC products and processes, along with the model and metrics described above, provided a relatively straightforward way of operationalizing, understanding, tracking, and communicating JAWBL status, as well as focusing capability development and future training and research efforts. In this way, these tools and methods hold significant promise for ensuring that participating operators gain benefits from activities such as EBS and also that resources directed at expanding and evolving training capabilities are targeted effectively.
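The two status metrics fall directly out of the category counts in the MEC Experience Stack. A small sketch with invented counts follows; the actual distribution of the 73 ADGE experiences across Categories A-D is not reported above, so the numbers are purely illustrative.

```python
# Each mission-essential developmental experience carries one MEC Experience
# Stack category: A = demonstrated effective, B = out-of-scope,
# C = in-scope and supportable but not yet demonstrated,
# D = in-scope but not currently supportable. Counts are illustrative.
categories = {"A": 47, "B": 9, "C": 10, "D": 7}   # sums to 73 experiences
total = sum(categories.values())

# Metric 1: total possible training scope (all in-scope experiences, A+C+D),
# i.e., what the facility could address in a "perfect world".
scope = (categories["A"] + categories["C"] + categories["D"]) / total

# Metric 2: current capability (A+C), with a conservative variant (A only)
# that counts only experiences whose effectiveness has been demonstrated.
current = (categories["A"] + categories["C"]) / total
demonstrated = categories["A"] / total

print(f"possible scope: {scope:.1%}, current: {current:.1%}, "
      f"demonstrated: {demonstrated:.1%}")
```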

SUMMARY AND CONCLUSION

The Black Skies series of exercises provided demonstrably effective training as well as a range of valuable auxiliary benefits for the military operators who participated. By providing concrete outcomes for the personnel involved, these exercises served to foster engagement by decision-makers at all levels within RAAF and, over time, helped build the case for investment in new training capabilities. In addition, the realistic combat missions simulated during EBS provided an ecologically valid context within which to investigate team learning and performance and to inform the development of emerging training tools and methods.

The EBS motto of “Prepare, Evaluate, Demonstrate” served to characterize the multi-faceted objectives of the exercises and to focus exercise planning and execution on balancing the needs of a variety of stakeholders. Through thorough planning, deep stakeholder engagement, and close collaboration between experts with a diverse range of skillsets, the EBS series demonstrated that it is possible – and indeed beneficial – to pursue scientific and operational training objectives in tandem. In doing so, we believe these exercises serve as a model for how the judicious integration of operational and research objectives can drive innovation in training.

ACKNOWLEDGEMENTS AND DEDICATION

The authors acknowledge the significant contributions made to the EBS series by the participating members of the Royal Australian Air Force and the Australian Army; researchers from DSTG Aerospace Division’s Human Factors and Air Operations Simulation Groups, Defence Technology Agency New Zealand, and Defence Science and Technology Laboratory UK; and industry partners from Aptima, Ball Aerospace, Milskil, and Simulation Solutions Australasia.

In memory of Michael Skinner and Ron Best, whose enthusiasm for the work described here was a source of great support.

REFERENCES

Adams, R. P., & MacKay, D. J. (2007). Bayesian online changepoint detection. arXiv Preprint.
Ahamed, S. I., Talukder, N., & Kameas, A. D. (2007, September). Towards privacy protection in pervasive healthcare. Paper presented at the 3rd IET International Conference on Intelligent Environments (IE 07), Ulm, Germany.
Bennett, W., Alliger, G. M., Colegrove, C. M., Garrity, M. J., & Beard, R. M. (2013). Mission essential competencies: A novel approach to proficiency-based live, virtual, and constructive readiness training and assessment. In C. Best, G. Galanis, J. Kerry, & R. Sottilare (Eds.), Fundamental Issues in Defense Training and Simulation (pp. 47–62). Aldershot: Ashgate.
Best, C., Jia, D., & Simpkin, G. (2013). Air force synthetic training effectiveness research in the Australian context. Proceedings of the NATO STO MSG 111 Multi-Workshop, Sydney, October 2013.
Cacioppo, J. T., Berntson, G. G., Larson, J. T., Poehlmann, K. M., & Ito, T. A. (2000). The psychophysiology of emotion. In M. Lewis & J. M. Haviland-Jones (Eds.), Handbook of Emotions (2nd ed., pp. 173–191). New York: Guilford Press.
Cao, L., Mees, A., & Judd, K. (1998). Dynamics from multivariate time series. Physica D: Nonlinear Phenomena, 121(1), 75–88.
Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2), 255–308.
Charles, R. L., & Nixon, J. (2019). Measuring mental workload using physiological measures: A systematic review. Applied Ergonomics, 74, 221–232.
Christensen, J. C., Estepp, J. R., Wilson, G. F., & Russell, C. A. (2012). The effects of day-to-day variability of physiological data on operator functional state classification. Neuroimage, 59, 57–63.
Cooke, N. J., Gorman, J. C., Myers, C. W., & Duran, J. L. (2013). Interactive team cognition. Cognitive Science, 37(2), 255–285.
Crane, P., Skinner, M., Best, C., Burchat, E., Gehr, S. E., Grabovac, M., Pongracic, H., Robbie, A., & Zamba, M. (2006). Exercise Pacific Link: Coalition distributed mission training using low-cost communications. Proceedings of the 2006 Australasian Simulation Technology and Training Conference (SimTecT), Melbourne, May 2006.
De Mooij, S. M. M., Blanken, T. F., Grasman, R. P. P. P., Ramautar, J. R., Van Someren, E. J. W., & van der Maas, H. L. J. (2020). Dynamics of sleep: Exploring critical transitions and early warning signals. Computer Methods and Programs in Biomedicine, 193, 105448. https://doi.org/10/ggsptj
Dukes, A. W., Funke, G. J., Strang, A. J., & Best, C. J. (2015). DRADIS—Real Time Communication Analysis. Poster presented at the 2015 International Symposium on Aviation Psychology, Dayton, OH.
Francis, C., Best, C., & Yildiz, J. (2016). Improving air force operator performance through synthetic mission rehearsal. Proceedings of the SISO Simulation Innovation Workshop, Orlando, FL, September 2016 [Invited Paper].
Funke, G., Best, C., Menke, L., & Strang, A. J. (2017). Warfighter acceptance of future physio-behavioral monitoring and augmentation: Update. Proceedings of the Annual Meeting of the Human Factors and Ergonomics Society, 61, 1151–1155.
Gorman, J. C., Dunbar, T. A., Grimm, D., & Gipson, C. L. (2017). Understanding and modeling teams as dynamical systems. Frontiers in Psychology, 8, 1053.
Hancock, P. A., & Matthews, G. (2019). Workload and performance: Associations, insensitivities, and dissociations. Human Factors, 61, 374–392.
Hartigan, B. (2020, July 16). Exercise Virtual Pitch Black delivers complex training [Article]. Contact Magazine. Retrieved from https://www.contactairlandandsea.com/2020/07/16/exercise-virtual-pitch-black-delivers-complex-training/
Heron, K. E., & Smyth, J. M. (2010). Ecological momentary interventions: Incorporating mobile technology into psychosocial and health behaviour treatments. British Journal of Health Psychology, 15, 1–39.
Ishak, A. W., & Ballard, D. I. (2011). Time to re-group: A typology and nested phase model for action teams. Small Group Research, 43(1), 3–29.
Konvalinka, I., Xygalatas, D., Bulbulia, J., Schjødt, U., Jegindø, E.-M., Wallot, S., Van Orden, G., & Roepstorff, A. (2011). Synchronized arousal between performers and related spectators in a fire-walking ritual. Proceedings of the National Academy of Sciences, 108(20), 8514–8519.
Kozlowski, S. W., & DeShon, R. P. (2004). A psychological fidelity approach to simulation-based training: Theory, research and principles. In S. G. Schiflett, L. R. Elliott, E. Salas, & M. D. Coovert (Eds.), Scaled Worlds: Development, Validation and Applications (pp. 75–99). Aldershot: Ashgate.
Marwan, N., Romano, M. C., Thiel, M., & Kurths, J. (2007). Recurrence plots for the analysis of complex systems. Physics Reports, 438(5), 237–329. https://doi.org/10.1016/j.physrep.2006.11.001
Matthews, G., Reinerman-Jones, L. E., Barber, D. J., & Abich, J. (2015). The psychometrics of mental workload: Multiple measures are sensitive but divergent. Human Factors, 57, 125–143.
McIntyre, H. M., & Smith, E. (2013). Key tenets of collective training. In C. Best, G. Galanis, J. Kerry, & R. Sottilare (Eds.), Fundamental Issues in Defense Training and Simulation (pp. 125–134). Aldershot: Ashgate.
Menke, L., Best, C., Funke, G., & Strang, A. (2015). Warfighter acceptance of future physiological monitoring and augmentation: A coalition study. Proceedings of the Annual Meeting of the Human Factors and Ergonomics Society, 59, 125–129.
Mitkidis, P., McGraw, J. J., Roepstorff, A., & Wallot, S. (2015). Building trust: Heart rate synchrony and arousal during joint action increased by public goods game. Physiology & Behavior, 149, 101–106.
Moran, S., Jaeger, N., Schnädelbach, H., & Glover, K. (2013, June). Using adaptive architecture to probe attitudes towards ubiquitous monitoring. Proceedings of the IEEE International Symposium on Technology and Society (ISTAS), Toronto, ON, Canada.
Pincus, S., & Singer, B. H. (1996). Randomness and degrees of irregularity. Proceedings of the National Academy of Sciences, 93(5), 2083–2088.
Proulx, R., Côté, P., & Parrott, L. (2009). Multivariate recurrence plots for visualizing and quantifying the dynamics of spatially extended ecosystems. Ecological Complexity, 6(1), 37–47.
Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039–H2049.
Riley, M. A., & Van Orden, G. C. (2005). Tutorials in Contemporary Nonlinear Methods. National Science Foundation. https://www.nsf.gov/pubs/2005/nsf05057/nmbs/nmbs.pdf
Shanahan, C., Best, C., Finch, M., Tracey, E., Vince, J., Hasenbosch, S., & Stott, A. (2009). Exercise Black Skies 2008: Enhancing Live Training Through Virtual Preparation. Part One: An Evaluation of Training Effectiveness. DSTO-RR-0344. Melbourne: Defence Science and Technology Organisation.
Singh, G., Mémoli, F., & Carlsson, G. (2007). Topological methods for the analysis of high dimensional data sets and 3D object recognition. Eurographics Symposium on Point-Based Graphics, 22, 91–100.
Smith-Jentsch, K. A., Zeisig, R., Acton, B., & McPherson, J. (1998). Team dimensional training: A strategy for guided team self-correction. In J. A. Cannon-Bowers & E. Salas (Eds.), Making Decisions Under Stress: Implications for Individual and Team Training (pp. 271–297). Washington, DC: APA.
Stephen, D. G., Boncoddo, R. A., Magnuson, J. S., & Dixon, J. A. (2009). The dynamics of insight: Mathematical discovery as a phase transition. Memory & Cognition, 37(8), 1132–1149.
Stephens, A., Crone, D., Temby, P., Best, C., & Simpkin, G. (2011). Using synthetic environments to enhance close-air support training. Proceedings of the Simulation Technology & Training Conference (SimTecT), Melbourne, May 2011.
Strang, A. J., Funke, G. J., Russell, S. M., Dukes, A. W., & Middendorf, M. S. (2014). Physio-behavioral coupling in a cooperative team task: Contributors and relations. Journal of Experimental Psychology: Human Perception and Performance, 40(1), 145–158.
Strang, A. J., Horwood, S., Best, C., Funke, G. J., Knott, B. A., & Russell, S. M. (2012). Examining temporal regularity in categorical team communication using sample entropy. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 56(1), 473–477.
Tolston, M. T., Best, C. J., Funke, G. J., Menke, L., & Dukes, A. W. (2016). Monitoring Large Teams for Changes in Physiological Coupling Using Multivariate Recurrence Quantification Analysis. Poster presented at the 8th International Conference on Applied Human Factors and Ergonomics, Orlando, FL.
Tolston, M. T., Best, C. J., Miller, B., Rice, B., Francis, C., & Funke, G. J. (2019). Responses of Teams of Air Battle Managers to Perturbations in High-Fidelity Training Scenarios. Talk presented at the 20th International Symposium on Aviation Psychology, Dayton, OH.
Tolston, M. T., Funke, G. J., Alarcon, G. M., Miller, B., Bowers, M. A., Gruenwald, C., & Capiola, A. (2018). Have a heart: Predictability of trust in an autonomous agent teammate through team-level measures of heart rate synchrony and arousal. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 62, 714–715.
Wallot, S., Roepstorff, A., & Mønster, D. (2016). Multidimensional Recurrence Quantification Analysis (MdRQA) for the analysis of multidimensional time-series: A software implementation in MATLAB and its application to group-level data in joint action. Frontiers in Psychology, 7, 1835.
Webber, C., & Zbilut, J. P. (1994). Dynamical assessment of physiological systems and states using recurrence plot strategies. Journal of Applied Physiology, 76(2), 965–973.
Wiltshire, T. J., Butner, J. E., & Fiore, S. M. (2018). Problem-solving phase transitions during team collaboration. Cognitive Science, 42(1), 129–167.
Wright, R. A., & Gendolla, G. H. E. (2012). How Motivation Affects Cardiovascular Response: Mechanisms and Applications. https://doi.org/10.1037/13090-000
Zeidner, M., & Matthews, G. (2005). Evaluation anxiety: Current theory and research. In A. J. Elliot & C. S. Dweck (Eds.), Handbook of Competence and Motivation (pp. 141–163). New York: The Guilford Press.

5 Extended Reality in Training Environments: A Human Factors Trend Analysis

Salim A. Mouloua, Gerald Matthews, John French, and Mustapha Mouloua

CONTENTS

Background
Methodological Approach
Results
Study Designs over Time
Simulation Domains over Time
Simulation Cluster Areas over Time
    Military Simulation
    Aerospace Simulation
    Driving Simulation
    Healthcare Simulation
    Manufacturing Simulation
    Entertainment Simulation
    Simulation Methods and Design
    General Simulation Areas
Military Cluster Areas over Time
Author Affiliations over Time
Military Author Affiliations over Time
Funding Agencies over Time
Military and Government Funding over Time
XR System Types over Time
XR Systems’ Breakdown over Time
Physiological Recording Systems over Time
Conclusions
    Major Conclusions and Recommendations
References

DOI: 10.1201/9781003401353-5


BACKGROUND

The proliferation of wearable technologies that include virtual reality (VR), augmented reality (AR), and mixed reality (MR) has gained considerable attention in recent years. VR involves total visual immersion in a representation of the real world, whereas AR involves overlaying graphics and text on visuals of the real world. MR is a combination of both, in which real and virtual worlds can interact. Much of this work started in the 1960s and 1970s, with a main focus on gaming, followed by training in domains such as aerospace, military, manufacturing, and academia.

Recent reports have estimated that the global market for AR and VR technologies will reach $209.2 billion by the end of 2022, and global VR video gaming revenues were estimated at $22.9 billion in 2022 (Petrov, 2022). These numbers are forecast to continue to grow in the next few years due to several factors, such as price affordability, hardware size, internet speed, technology acceptance, and experience (Insider Intelligence, 2016). The AR device market continues to grow about four times faster than the VR market (Market Analysis Report, 2021). MR technologies, such as Microsoft’s HoloLens, are just emerging but are expected to grow similarly in the coming years. The MR experience, for example, makes it possible to visit a museum many miles distant that actually exists – but is also on virtual display.

Interest in and use of these technologies reached record peaks in 2020–2022, likely due, in part, to the recent Covid-19 pandemic. The pandemic provided both opportunities and challenges for VR designers to think about extending the reality of our daily work tasks and activities to remote and virtual platforms. These included Zoom and other internet meeting platforms, social media, education, healthcare, real estate, manufacturing, online shopping, entertainment, and training. The adoption of virtual reality in particular was greater during the lockdown period (Shirer & Soohoo, 2020). The Covid-19 pandemic significantly prompted the adoption of virtual reality and augmented reality technologies as businesses turned to remote work (Vardomatski, 2021). Similarly, it is estimated that 25% of internet users (70.2 million people) will be VR users by 2023 (Petrov, 2022). As a result, these users will be afforded various opportunities in their professional, social, work, and entertainment environments, education and training, and online shopping, to name a few. However, VR and AR technology adoption may also lead to some human factors and engineering issues (e.g., motion sickness, lack of specific content, hardware and software compatibility, user-interface design, technology acceptance and user experience, price affordability, and safety; Petrov, 2022).

One particular technology that has recently gained momentum among these virtual reality technologies is extended reality, which is also the main focus of our current research. AR is generally believed to evoke lower levels of bodily and perceptual discomfort than VR, but direct evidence is lacking (Descheneaux et al., 2020). Extended reality (XR) is an umbrella term that combines immersive technologies like virtual reality with technologies that expand our physical world by adding virtual elements to it
such as augmented reality or mixed reality information (Kiger, 2020). All of these technologies can be discussed as a single entity using the term XR. The XR market is expected to reach $209 billion by 2022 (Marr, 2019). This expansion makes XR development attractive to a large number of technological applications and means that the work, training, shopping, and entertainment (gaming, movies, etc.,) industries could be vastly different from today within a brief period of time. Industries utilizing XR should closely ally themselves with the Human Factors sciences, in order to streamline their development as well as endow more usability and effectiveness in their target applications. Interestingly, it is rare that engineering technologies fully mature without such close proximity to ergonomic technologies. The role of XR devices in simulation and training is becoming critical to a host of technical areas because they are far less expensive and less dangerous than realworld exposure and training (Allen et al., 1998a, 1998b; Howard & Gutworth, 2020). For example, telesurgery, remote learning, military training, driving simulation, aerospace, manufacturing, and other research-related and even clinical treatment disciplines benefit from XR. Persons who have experienced or suffered from posttraumatic stress disorder (PTSD) due to their exposure on the battlefield can now be treated using VR technologies (Bedwell et al., 2018; Wong & Beidel, 2013; Bohil et al., 2011). Such technologies can also be used for training in search and rescue missions, emergency responses to catastrophic disasters, and reducing the symptoms of simulation sickness in motion-inducing environments (Allen et al., 1998a;1998b); Mouloua et al., 2009, Smither et al., 2008; Mouloua et al., 2004; Mouloua et al., 2005a and 2005b). Similarly, physicians and surgeons can now utilize these devices in order to teach medical students skills including various types of surgery (Vincenzi et al., 2009; Scerbo et al., 2012; 2013; Atallah, 2021; Velazquez-Pimentel et al., 2021). Previous research has shown that XR systems can be used to deliver effective training for developing emotional, cognitive, and physical skills (Howard, 2018; Irish, 2013; Jensen & Koradsen, 2018). For example, some studies have reported that VR training reduced social anxiety (Anderson et al., 2013; Parsons & Rizzo, 2008). However, the findings from a recent meta-analysis on virtual training programs for social skill development also highlight the challenges facing these programs (Howard & Gutworth, 2020). The results of the meta-analysis confirmed that VR training programs are, on average, more successful than alternative methods for enhancing social skills. However, factors believed to support the effectiveness of training such as immersion and gamification did not, in fact, appear to be beneficial. These findings point to the need for further theory development to guide training system development and evaluation (Howard & Gutworth, 2020). Simulating human interaction with virtual agents also introduces unique challenges that are not present in human–machine interaction (Matthews et al., 2021a), nor real human–human interaction. Healthcare facilities, school systems, and various industries have in large part transitioned to virtual work through distributed online meetings, in order to continue their operations and enhance team engagement. This is certainly a new “extension of reality,” but it is outside the scope of the present chapter. 
We mention these emerging technologies only briefly, as they have provided numerous opportunities and benefits during the confinement period leading up to the writing of this chapter.

Specifically, if virtual meetings are considered low-fidelity virtual environments, we expect their explosive proliferation to eclipse that of XR technologies going forward. In commercial settings, the growth of XR has already taken this direction, with games and applications built for meeting others in a "social" setting, such as VRChat. Furthermore, with increasing connectedness between various types of virtual "environments" – whether planes or webpages, virtual meetings or virtual worlds – we expect a gravitational pull toward "virtuality" across industries that might form a network reminiscent of, or beyond the scope of, the present-day internet. For example, Meta (Facebook) sees extensive commercial and leisure opportunities for its VR Metaverse. This frontier may well be an amalgamation of two- and three-dimensional cyberworlds, and it remains to be seen how few and far between the transition points will be among those likely proprietary virtual realities. The goal of the present research was to trace the historical trend in the development of such technologies as they apply to a wide variety of human–machine systems. This chapter also examines some of the driving forces behind the surge of these technologies and their expansion into other domains of application.

METHODOLOGICAL APPROACH

In order to elucidate the temporal trends in XR research, we conducted a trend analysis of simulation-based articles by evaluating the frequency of their appearance in the Proceedings of the Human Factors and Ergonomics Society (HFES) Annual Meeting over the last 16 years (2005–2020). The goal of this study was to identify the research gaps and human factors issues associated with the use of XR methods reported in the HFES Proceedings, as well as differences between the methods in their quantities and trajectories. The reason for choosing the HFES Proceedings is twofold. The first reason is to observe trends in human factors subfields (e.g., medical human factors or aviation human factors) over time, such as the distribution of funding sources or research cluster areas across articles. This allows for charting increases and decreases in military funding (as opposed to industry funding) and in specific research cluster areas (such as simulator sickness or surgeon training) in XR research as a whole. Furthermore, we can determine "gaps" in research focus at a macro scale, as well as who is funding the broader literature at any point in time. The second reason is to contribute to a broader systematic examination of the research vectors in human factors simulation studies. Toward that goal, this same approach has been used in assessments of the subfields of unmanned aerial vehicles (Mouloua et al., 2018), cybersecurity (Mouloua et al., 2019), healthcare (Descheneaux et al., 2011; Stowers & Mouloua, 2018; Mouloua et al., 2021), and aviation systems (Ludvigsen et al., 2015). The present study focuses on trends in XR methods across broader human factors subfields in order to address which simulation methods are gaining popularity and which are becoming outdated. We define a factor here as any qualitative nominal variable with varying characteristics that could differentiate one article from another. For example, an article on pilot performance could be characterized in terms of its type of design (experimental versus theoretical), domain (aerospace versus surface transportation), cluster area
(pilot training, unmanned aerial vehicles, and situation awareness), funding source (military versus industry funding), and system type (head-mounted versus desktop simulator, VR versus AR versus MR). Furthermore, we can qualitatively compare levels of a factor (e.g., AR versus MR) at the population scale (tallies across all groups of articles). One limitation of this approach is our focus on the "big picture," as we were unable to tease out certain differences in factors such as system type at the sub-population level (e.g., between driving and aerospace). As our focus was XR in general and within the military setting, we elaborate more on military research than on the other sub-populations. Table 5.1 lists the acronyms for the XR technologies discussed in this chapter.

Our mission was to classify articles using a standardized and transparent approach. To this end, we used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method developed by Moher and colleagues (2009) to detail our inspection and inclusion of articles in this trend analysis. An outline of PRISMA is shown in Figure 5.1 below. In the PRISMA method, records are (1) initially identified, (2) screened, (3) assessed for eligibility, and (4) included in the final analysis. An article could be excluded at the second or third step, the former on face value (topical irrelevance) and the latter on more nuanced criteria (e.g., an extended abstract rather than a full article, or a scope too general to encompass XR). This process thus funnels a large number of articles into a smaller, highly relevant subset.

Articles were obtained from the Sage Journals website, the primary publisher of the HFES Proceedings. Search boundaries were implemented in order to determine which articles would be included in the results. Our search within the HFES Proceedings was restricted to three keywords, with the requirement that a keyword be present somewhere in the article for initial identification. The three specific keywords for our investigation were "virtual reality," "augmented reality," and "mixed reality." While queries can be limited to article titles or abstracts, we conducted a comprehensive search of all extended reality articles within the HFES Proceedings from 2005 to 2020.

TABLE 5.1
Definitions of Acronyms

Acronym    Definition
XR         Extended reality
VR         Virtual reality
AR         Augmented reality
VE         Virtual environment
MR         Mixed reality
HUD        Head-up display
HMD        Head-mounted display

FIGURE 5.1  PRISMA methodology for the present trend analysis (initial n = 1012, final n = 530).

Importantly, the keyword "extended reality" was not included because a priori searches determined that it is very new compared to the other terms (and thus unsuited to a multi-year trend analysis) and that XR articles delineate the specific technology being employed. Notably, augmented reality and mixed reality research sprouted from decades of efforts in virtual reality; there is therefore little orthogonality between the keywords in practice (it is arguably both rare and theoretically alarming when commentary on virtual reality is absent from an augmented reality article). We visually inspected all articles and manually rejected irrelevant articles from our sample, as opposed to relying on automated search criteria alone. The keyword method provided a high probability of identifying studies for inclusion ("hits") but also carried the risk of "false positives," i.e., admitting studies that mentioned virtual reality but did not actually investigate it. As such, we used manual inspection to filter out those false positives. We chose this

procedure in order to analyze the entire population of XR articles in the Proceedings across these years and to avoid sampling error to the best of our ability. We used a year-by-year search of the three aforementioned keywords, which initially yielded 1012 articles (see Figure 5.1). Of these, 881 were identified with the keyword "virtual reality," while 131 were identified through the keywords "augmented reality" and "mixed reality." We manually removed 125 duplicate articles across the keyword searches, leaving 887 articles in total. We then screened those articles for topical relevance and excluded 342 articles that were not relevant to the scope of XR, leaving 545 articles that we assessed for eligibility in the final trend analysis. Upon manual inspection, 15 were excluded for the following reasons: too general (3), panel discussion (5), or poorly defined methods (7). The final number of articles included in the trend analysis was 530, the findings of which we outline and discuss in the following sections.
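For readers who wish to replicate or extend this analysis, the funnel can be expressed compactly in code. The following Python sketch is illustrative only: the predicate functions (is_duplicate, is_relevant, is_eligible) are hypothetical stand-ins for the manual coding decisions described above, and only the stage counts come from the text.

```python
# Minimal sketch of the four-stage PRISMA funnel, assuming hypothetical
# predicates in place of the manual judgments described in the text.

def prisma_funnel(records, is_duplicate, is_relevant, is_eligible):
    """Filter keyword-search records through the PRISMA stages."""
    identified = list(records)                                # (1) identification
    deduped = [r for r in identified if not is_duplicate(r)]  # duplicate removal
    screened = [r for r in deduped if is_relevant(r)]         # (2) screening
    eligible = [r for r in screened if is_eligible(r)]        # (3) eligibility
    return eligible                                           # (4) inclusion

# Consistency check on the stage counts reported above:
# 1012 identified - 125 duplicates = 887; 887 - 342 off-topic = 545;
# 545 - 15 ineligible = 530 articles in the final analysis.
assert 1012 - 125 == 887 and 887 - 342 == 545 and 545 - 15 == 530
```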

RESULTS

We tabulated our 530 eligible articles in a keyword-by-year table and organized them in two different combinations (see Figures 5.2 and 5.3), in order to delineate trends for XR as a whole, for VR versus AR versus MR, and for general differences among XR technologies based on display type (HMDs versus desktop/handheld simulators). In Figure 5.2, VR was collapsed across display types to emphasize the different types of XR; the figure shows that virtual reality receives the most research attention compared to mixed and augmented reality. In Figure 5.3, VR was split into head-mounted displays (HMDs) and desktop and handheld simulators to account for different types of systems; it shows that the VR research in Figure 5.2 mostly involves simulators and display types other than HMDs. However, in the last four years, the amount of research reported on

FIGURE 5.2  HFES articles by keyword over time.

FIGURE 5.3  HFES articles by keyword over time, with VR split by display type. N = 530.

HMDs has grown dramatically, even exceeding the level of interest in simulators in the past two years. Furthermore, we see that the popularity of augmented reality has actually eclipsed that of traditional VR simulators in the past year.
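The tabulation behind Figures 5.2 and 5.3 amounts to a keyword-by-year cross-tabulation, with VR optionally split by display type. A brief pandas sketch illustrates the idea; the articles table and its column names are hypothetical, not the dataset itself.

```python
import pandas as pd

# Hypothetical article-level records: 'keyword' is the matched search term,
# 'system' distinguishes HMDs from desktop/handheld simulators.
articles = pd.DataFrame({
    "year":    [2005, 2005, 2018, 2019, 2020],
    "keyword": ["virtual reality", "virtual reality", "virtual reality",
                "augmented reality", "mixed reality"],
    "system":  ["simulator", "HMD", "HMD", "HMD", "HMD"],
})

# Figure 5.2 style: counts per keyword per year, VR collapsed across displays.
by_keyword = pd.crosstab(articles["year"], articles["keyword"])

# Figure 5.3 style: split VR into "VR HMD" / "VR simulator" before counting.
label = articles["keyword"].where(
    articles["keyword"] != "virtual reality",   # keep AR/MR labels as-is
    "VR " + articles["system"],                 # relabel VR rows by display type
)
by_display = pd.crosstab(articles["year"], label)
print(by_keyword, by_display, sep="\n\n")
```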

STUDY DESIGNS OVER TIME

Articles were categorized as one of five types of study design: experimental, meta-/trend analysis, literature review, translational, or theoretical. Articles were deemed experimental if an experiment took place using actual manipulations of controlled conditions, as assessed from the reported criteria in the methods section; if a methods section did not exist, the article was excluded for poor standards. Meta-analyses and trend analyses were articles that directly compared the results of multiple studies, tabulated results or criteria from different studies, or qualitatively indicated directional trends in the literature. Literature reviews comprised articles commenting on the state of the literature with minimal to no theoretical contributions. Translational articles were those focusing on the development of methods, those examining usability with participants but without a true experimental paradigm (e.g., pilot and product testing), and those with both theoretical and practical contributions that did not squarely fit either the theoretical or the experimental category. Articles were deemed theoretical if they proposed new theories, suggested or modified methods not currently in use, and were not principally experimental or translational in nature. It is important to note that applied articles were mostly experimental and so were tabulated in that category. However, as we did not uniquely assess the "applied" component of articles alone, it is difficult to draw conclusions on whether earlier lab-based HF research truly supported the development of reality-based applications. For example, experimental articles may not necessarily be directed toward practical applications or operational

FIGURE 5.4  HFES articles by study design over time. N = 530.

samples. As the number of "applied" articles dwarfs that of theoretical articles, we are unsure whether the technology is truly mature or adequately supported by initial research efforts. The breakdown of articles over time is shown in Figure 5.4, which indicates that experimental research in XR peaked in 2006 and again in 2019, fluctuating considerably between those years. This study design type comprised the vast majority of XR articles, and many of the dips in experimental research are matched by increases in translational research. Translational research in XR was most popular between 2010 and 2014 and has since remained fairly low, although it has been making a return in the past few years. Together, experimental and translational research largely define the state of simulation in human factors: practical and empirically driven. There are very few meta-analyses and trend analyses on XR, and theoretical studies and literature reviews have neither increased nor decreased.

SIMULATION DOMAINS OVER TIME

While XR comprises traditional simulation methods such as virtual environments, it also incorporates methods such as AR that can involve little to no simulation. In this sense, AR might not equate to "simulation" as we usually perceive it, but rather to a simple enhancement of one's environment. However, we noticed a "slippery slope" in delineating AR and even some MR articles as non-simulations. For comparison's sake, we categorized all of these articles under the umbrella of "simulation," ranging from full simulations of the environment (VR), to combined real and virtual environments (MR), to "simulated" additions to the real environment (AR). Some of the AR research is therefore not simulation per se, but removing those articles would ablate the "real" portion of the real–virtual continuum entirely, and we wanted to preserve it for a richer picture of what constitutes XR. Ultimately, reality itself is the control condition for XR, no matter the method employed, and gradual virtualizations of that reality are simulations of a kind.

FIGURE 5.5  HFES articles by simulation research domains over time. N = 530.

We organized articles into seven broad domains of simulation research (see Figure 5.5), indicating the subfield within human factors of which the article was part. Articles were categorized as "military," "aerospace," "driving," "healthcare," "manufacturing," "entertainment," or "methods & design." These domains were chosen to encompass the general scope of simulation in human factors, that is, what simulation can be applied to. Military simulation articles were those relating to military endeavors in army, naval, and air force operations, as well as command and control, and studies related to combat or to soldier, officer, or team training. Aerospace simulation articles included simulations in aviation related to unmanned aerial vehicles (UAVs), pilot training, commercial flight, and spaceflight, and any articles specifically outside the scope of military aviation. Driving simulation articles were those related to either driving simulators or unmanned ground vehicles (UGVs), specifically outside the scope of military driving. Healthcare simulation articles were those pertaining to the training of physicians, nurses, and surgeons using manikin and simulation technologies. Manufacturing simulation articles involved industrial inspection, process control, and microworld simulation, as well as training in industrial settings. Entertainment simulation articles were related to gaming (whether recreational or serious), media, and consumer-oriented prospects. Simulation methods and design articles were those specifically focused on simulator development, as well as the design of and methods behind simulation at a broader scale. Surprisingly, a great number of articles fell into this group (i.e., articles not principally focused on healthcare, aerospace, etc.). However, articles that focused on any of the prior content areas and were also methods-oriented were placed in their specific content domain (e.g., the aerospace domain with methods & design clusters). We further break down simulation domains into specific cluster areas in the following section.

Examination of Figure 5.5 indicates that simulation methods and design research was the most populous XR domain within HFES, with articles peaking in 2006 and 2019. It is important to reiterate that methods and design articles did not focus on a specific content area (such as driving or healthcare) and so related to basic simulation development and methods alone. Interestingly, these articles appear to match the trends in experimental research discussed previously (see Figure 5.4), showing that the main driving force behind the trends in experimental XR studies is research focusing on the development of simulation techniques and design criteria. Furthermore, military simulation is the second largest domain in XR, peaking between the years 2008 and 2013. Since then, military XR research has dropped steadily and is now at a lower point than in 2005, perhaps indicating decreased interest in HFES by authors in this domain. The third largest simulation domain was driving research, which peaked in 2006 and has since remained mostly consistent (neither decreasing nor increasing very much). The domain of healthcare simulation has remained stable over this time course and has slightly grown. Interestingly, aerospace simulation research has been steadily declining since its peak in 2005. Articles in entertainment simulation have been steadily increasing and peaked in 2019. As for manufacturing simulation, articles peaked in 2014 and have since remained at about the same level.

SIMULATION CLUSTER AREAS OVER TIME

In order to specify areas of interest within our simulation domains, we generated 39 cluster areas comprising topics within and between those domains. This was done to capture the full scope of research in XR, and these cluster areas revealed trends in more specific research concepts within our subfields. All articles were categorized into three to five cluster areas that defined the approach used (e.g., soldier training, situation awareness, simulation design and development), much as specific keywords are used to highlight the areas of interest within a given article. We manually placed all articles within such areas in order to understand the trends in more specific topics in HFES, as well as to demonstrate their relative presence compared to one another. This is shown in Figure 5.6. As the data are highly dimensional and non-unique across articles, we collapsed them across time for ease of interpretation. We also caution the reader not to treat this as an absolute indicator of XR topic areas, but as a general, relative level of authorial interest in these research topics.
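Because each article carries three to five cluster labels, the tallies in Figure 5.6 are multi-label and sum to 1434 rather than to the 530 articles. A small Python sketch of this counting scheme follows; the records shown are hypothetical illustrations of the structure, not actual data.

```python
from collections import Counter

# Hypothetical records: each article is hand-coded with 3-5 cluster areas,
# so tallies count labels, not articles (N = 1434 labels over 530 articles).
articles = [
    {"domain": "military",  "clusters": ["soldier training", "team training",
                                         "workload assessment"]},
    {"domain": "aerospace", "clusters": ["pilot training", "UAVs",
                                         "situation awareness"]},
]

cluster_counts = Counter(c for a in articles for c in a["clusters"])
print(cluster_counts.most_common())  # relative authorial interest, as in Figure 5.6
```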

Military Simulation

Within military simulation, Figure 5.6 shows that the most popular research areas were workload assessment, team training, and soldier training. Workload assessment peaked in 2010 and has since slowly declined in the number of articles published. Team training and soldier training also peaked in 2010 and have since declined.

FIGURE 5.6  Number of simulation cluster areas collapsed across time. N = 1434.

Aerospace Simulation

The most popular cluster areas in aerospace simulation research were pilot training and UAVs (see Figure 5.6), while research on air traffic control and on spaceflight and habitats was scarce. Research in pilot training has been the most stable, while research on UAVs peaked between 2009 and 2011 and has fallen since.

Driving Simulation

Within driving simulation, the most researched area was driver perception and behavior, while driver distraction and unmanned ground vehicles have trailed behind. To date, there are few articles on driver vigilance within the context of XR compared to other research areas. Thus, research has not realized the potential of XR to investigate pressing contemporary safety issues, including distraction, the impacts of automation and assistive devices, and the operation of connected vehicles.

Healthcare Simulation

The most researched areas in healthcare simulation were surgeon training and assistive technologies, with surgeon training peaking in 2007 and decreasing

to almost no articles in recent years. Assistive technologies, however, have been slowly growing over the years, and the past three years have seen an all-time high, with a similar trend for physician training. Research on telesurgery over the same period was, in comparison, very limited. The most appreciably growing area of these four clusters is assistive technologies, ranging from assistive tools used by doctors (such as prosthetics or image enhancements) to those used by special populations such as elderly and disabled groups (such as hearing aids or closed captioning).

Manufacturing Simulation

The most researched cluster area for this topic was manufacturing simulation itself (as it relates specifically to assembly and product development), followed by industrial inspection and search, and by network simulation. Manufacturing simulation studies peaked in 2014 and again in the last year. Neither network simulation nor industrial inspection and search shows an appreciable trend, and the number of articles in each is quite low.

Entertainment Simulation

The most researched area in entertainment simulation was recreational gaming and media, closely trailed by serious gaming. Interest in recreational gaming and media has oscillated somewhat over the years but reached a high point over the past two years. Serious gaming comprises those articles specifically advancing the utility of gaming environments for purposes such as training; it has remained relatively consistent compared to recreational gaming and media.

Simulation Methods and Design

By far, the most researched area in simulation methods and design has been the actual design and development of simulations, followed by research in simulation fidelity and presence. Simulation design and development peaked in the last year and has grown robustly in popularity; it consistently dwarfs the research in all other cluster areas on an annual basis as well as in the larger picture, making it the most popular area of XR-related research identified through our trend analysis. The second most researched cluster area is simulation fidelity and presence, and these articles also peaked in the last year. Device usability and testing peaked in 2010 and especially in the last two years. Research on simulator sickness peaked in 2005 and has been dropping off since. The education and learning cluster area peaked in 2010 and 2020, showing an oscillating pattern of popularity over the years. Research on virtual agents peaked in 2012 and has mostly stayed consistent since. Macroworld simulations were researched up until 2014 and have not seen any further research since.

General Simulation Areas

Perception and error judgment was the most researched cluster area within this domain, peaking in the last two years. Research on navigation and wayfinding has remained consistent. The cluster area of individual differences peaked in 2015 and has also remained stable. Likewise, simulation research on stress, resources, and fatigue has remained consistent. Simulation research on telerobotics has remained low, and adaptive automation in a simulation context has remained a scarce research topic over the years.

MILITARY CLUSTER AREAS OVER TIME

Cluster areas related to military simulation showed substantial research activity. From the military simulation domain, we included these cluster areas: "threat detection," "soldier training," "team training," "search and rescue," "situation awareness," "workload assessment," "vigilance," and "unmanned underwater vehicles." We also incorporated cluster areas from other domains that were identified within those military simulation articles (but fell outside our initial military-specific cluster classification): "pilot training," "navigation and wayfinding," "stress, resources and fatigue," "individual differences," "unmanned ground vehicles," "unmanned aerial vehicles," "simulation design and development," and "simulation fidelity and presence." This was done because we wanted to capture broader trends within military simulation research itself, and because those articles in fact frequently focused on cluster areas that were not specific to the military simulation domain.

Within our broader military cluster areas, the most researched area was simulation design and development, as seen in Figure 5.7, with research peaking in 2019. The trends show that simulation design and development within military research has been growing steadily over the years. The second most researched area was soldier training, with articles peaking in 2010 and 2013; since then, simulation articles on soldier training have dropped drastically. Team training research has risen and fallen over the years with no clear trend, whereas threat detection research rose sharply in 2019. Situation awareness in military simulation research has remained consistent over the years. With respect to military research on simulation fidelity and presence, articles peaked in 2010 and have remained very low over the past few years. Furthermore, military simulation research on unmanned aerial vehicles peaked in 2009 and has dropped in a similar fashion. Articles focusing on pilot training peaked from 2011 to 2012 and have also declined since. Navigation and wayfinding, search and rescue, individual differences, vigilance, unmanned ground vehicles, and unmanned underwater vehicles were the least researched cluster areas and do not demonstrate noticeable trends over the years. This is intriguing, as these areas are some of the most active in general human factors research, yet they do not frequently appear in military simulation articles in the HFES Proceedings. We also observed that the largest peak for military simulation research was in 2013; since then, both the diversity of articles and overall interest in this research have dropped.

FIGURE 5.7  Number of military simulation cluster areas collapsed across time. N = 370.

AUTHOR AFFILIATIONS OVER TIME

We identified authors' affiliations within the articles in terms of their sector of employment. To do so, we did not count each author's affiliation individually, but rather the dichotomized presence of given employment sectors within each article. Thus, if an article had three academic and two industry authors, we counted academia once and industry once as the article's author affiliations. We chose this approach in order to focus less on the individual authors and more on each employment sector's contribution to the articles. These counts are therefore not demographic indicators of the authors themselves, but of the sectors represented in each article. The affiliations by which we categorized articles were academia, military, government, and industry. As shown in Figure 5.8, academics have been the largest contributors to and driving force behind XR research; their contribution grew rapidly but seems to have peaked in 2019. Industry was the second largest contributor to XR research, peaking in 2011. Interestingly, military researchers were third in contribution to XR research, with research peaking in 2010 and arguably remaining stable or decreasing in the past ten years. Lastly, contributions from government employees show no clear-cut trends but clearly lagged behind the other sectors.
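The dichotomized-presence rule can be made concrete with a short sketch; the sector lists below are hypothetical, and the point is simply that each article contributes at most one count per sector, however many of its authors share that sector.

```python
from collections import Counter

# Hypothetical author-sector lists, one list per article.
articles_authors = [
    ["academia", "academia", "academia", "industry", "industry"],  # counts as 1 + 1
    ["academia", "military"],
    ["government"],
]

sector_presence = Counter(
    sector
    for authors in articles_authors
    for sector in set(authors)  # dichotomize: unique sectors within an article
)
print(sector_presence)
# Counter({'academia': 2, 'industry': 1, 'military': 1, 'government': 1})
```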

FIGURE 5.8  HFES author sector affiliations in XR research over time. Evaluated as each sector's presence in an article (see text).

The trends in Figure 5.8 indicate an increasing academic interest in XR while the military, government, and industry sectors have begun lagging behind. This is surprising given the military’s long-known focus on simulation, but below we have broken down trends in military author affiliations by branch.

MILITARY AUTHOR AFFILIATIONS OVER TIME

While we know about military contributions to XR research in general, we wanted to elucidate how each branch contributes and how those contributions trend over time. The largest contributor to XR research in the military is the Army, followed by the Air Force, the Navy, and international and other contributors such as foreign defense agencies and domestic military groups outside of the main branches. While we did assess trends in military author affiliations, the sample sizes were deemed too small to reveal meaningful variations over time. From our analyses, Army research in XR peaked in 2010 but has steadily dropped in the past few years. Air Force research also peaked in 2010 and has slowly declined as well. These observations line up with the increased attention to soldier training, simulation fidelity and presence, and military aviation research in 2010 reported above. The Navy's research efforts in XR peaked between 2010 and 2011 and have fallen since. The same trend is present for international and other military organization authors, with research peaking in 2010 and dropping in the following years.

FUNDING AGENCIES OVER TIME

We categorized articles that displayed funding sources according to the funding agencies' sectors, tabulating articles into the same sectors used in our author affiliation analysis: academia, military, government, and industry funding sources.

FIGURE 5.9  HFES funding sources for XR research over time. N = 272.

For this analysis, we counted the raw distribution of funding sources present in articles (a brief sketch of this counting rule appears at the end of this section). This was highly revealing of economic contributions and of agencies' interest in conducting research. As shown in Figure 5.9, funding for XR articles is driven most by contributions from government sources, closely followed by military agencies. This is interesting given the breakdown of author affiliations above: most articles are published by academics but funded by government and military agencies. The trends in government-funded research shown in Figure 5.9 reveal a peak in 2010 followed by a steady decline until 2020, when government funding of XR research skyrocketed. Furthermore, military-funded research in XR showed a large peak in 2011, followed by declines until another smaller peak in 2019. Military funding synchronizes quite well with the previously mentioned peaks in military author affiliations across all agencies from 2010 to 2011, which marks the peak years of military investment in XR in terms of publication and economic interest. Industry funding in XR research has mostly remained low and without clear trends, with interest slightly increasing in 2020. With regard to academic funding, research peaked in 2008 and has since slightly declined. These trends demonstrate that the money in XR research is coming from military and government agencies, as opposed to internal academic or corporate sources. It would be intriguing to see exactly what kinds of efforts government and military agencies are investing in within XR research, but this is outside the scope of the present trend analysis. However, given the surge of interest in augmented reality and virtual reality HMDs in 2020, we suspect that government and military sources are prioritizing these specific avenues of XR going forward (see Descheneaux et al., 2020). Given that these articles are mostly driven by peaks in simulation methods and design in the past two years, it is likely that these agencies are funding precisely XR HMD development and methodological enhancements. In addition, those specific research efforts are being conducted by academics for the

most part. However, it remains to be seen whether this developmental boom surrounding second generation HMDs will persist or will lead the way to the emergence of third generation XR headsets we are not yet privy to, as suggested by the approximately five-year life cycle of first generation VR/AR/MR HMDs in the literature. If the latter ends up being the case, we speculate that those third generation HMDs might emerge in the HFES literature by 2025. This would be contingent on the continual engineering of these XR systems in tandem with feedback from experts in the human factors sciences, guiding more usable, efficient, and effective systems for the academic and the consumer going forward.
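As a complement to the dichotomized author-affiliation counts sketched earlier, the funding tabulation, under our reading of "raw distribution," counts every funding source present in an article. A minimal sketch, with hypothetical records:

```python
from collections import Counter

# Raw tabulation: every funding source listed in an article is counted,
# so an article acknowledging two government grants contributes two counts.
articles_funding = [
    ["government", "government", "military"],
    ["military"],
    ["industry"],
]

raw_counts = Counter(src for funders in articles_funding for src in funders)
print(raw_counts)  # Counter({'government': 2, 'military': 2, 'industry': 1})
```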

MILITARY AND GOVERNMENT FUNDING OVER TIME

We further broke down funding agencies' contributions into military and government sources, as these were our two largest sectors of XR research funding. Elucidating these trends shows which military branches and government agencies contribute the most to funding, as well as where contributions are lacking. However, as with the military author affiliations above, we elected not to include graphical trends here due to the small sample sizes. In accordance with our expectations from the author affiliations noted above, military funding in XR was driven most by contributions from the US Army, closely followed by the US Navy. Examining the distributions over time shows that Army funding peaked in 2009 and 2019 and has remained mostly stable over time. Intriguingly, both Navy and Air Force funding peaked massively in 2011 and have since dropped to virtually no funding in the past five years. It is odd that contributions from these branches to XR research dropped off so sharply, given the wealth and diversity of relevant military cluster areas being studied by (largely academic) researchers at this point.

XR SYSTEM TYPES OVER TIME

Our search on extended reality systems involved virtual reality, augmented reality, and mixed reality. We wanted to profile the differences in the trends of each system type over time, as we could then elucidate which systems are becoming more popular by their growth relative to one another (as seen in Figure 5.10). Given how closely tied these terms are to each other, we expected to see either (I) similar trends in their growth or (II) complementary trends, such that declines in one term would be compensated for by growth in another. The data appear to show both patterns: we can see (I) concurrent peaks in 2009 and 2019 for both VR head-mounted displays and VR simulators (non-head-mounted displays), and (II) the largest peaks in VR simulator research (2010–2012) are accompanied by relatively low amounts of VR HMD research. Overall, VR simulators were the largest contributing XR systems researched in the literature, peaking in 2008, from 2010 to 2012, and again in 2017 and 2019. Figure 5.10 shows that research on XR simulators rose steadily until the mid-2010s, when it dropped sharply to less than half its peak contribution.

FIGURE 5.10  HFES XR system types in research over time. N = 689.

However, after a couple of years, XR simulators rebounded and have consistently remained the main system used for XR research, and more specifically for VR research as well. This is in contrast to the VR HMD "boom" that much of the literature postulated would completely overtake VR simulator systems. In essence, VR simulators remain the conventional method of examining virtual environments, even with the rise of wearable VR technology. That said, research on VR HMDs has indeed risen immensely over the years, more so than for any other system type, and we project that it may rise even higher in the future. VR HMD research reached the same volume as VR simulator research in 2018 and grew to a peak in 2019 that nearly matches that of VR simulators. Interestingly, the first peaks in VR HMD research occurred from 2005 to 2006 and in 2008. Following those years, research plummeted to a lull, then began to rise in 2015, reaching its maximum in 2019. These trends indicate that research on VR HMDs is growing rapidly and is far more predictable than that of other system types. Meanwhile, research on AR and MR simulators peaked in 2006 (at 3 and 5 articles, respectively) and has hovered around 0–2 articles per year, although for AR/MR HMDs specifically, the past two years have shown growth beyond that of the simulators. From inspection of the articles, there seems to be a bottleneck in the transition from theory to application in XR. Broadly, the theory oversteps what can actually be done, and this is reflected in the keyword search (what is spoken of) versus the system types (what is actually examined). AR/MR research has indeed risen in recent years in terms of system types, but even though its popularity in the conversation (keyword search) has risen noticeably, it does not yet appear to be operationally meaningful. Based on these findings, it seems that the "XR revolution" is at present still primarily concerned with VR and is more of a buzzword in practice.

XR SYSTEMS' BREAKDOWN OVER TIME

We also broke down these XR system types into every generic model used across all articles in this trend analysis. Observing the breakdown of XR systems over time reveals some intriguing trends, though their depiction is convoluted. As pictured below (see Figure 5.11), the types of VR HMD systems in use begin to change starting in 2015, relative to the HMD systems used up until 2013. Re-referencing these data to Figure 5.10 paints a clearer picture: a general shifting of emphasis from VR HMDs to VR simulators and back to VR HMDs. Broadly, this suggests the transition from first generation VR HMDs (such as Rockwell Collins, Kaiser ProView, Virtual Research Systems, and custom or ambiguous HMDs) to second generation VR HMDs (such as the Oculus Rift and HTC Vive). Since XR simulator research begins to decline in 2013 as second generation VR HMD research begins to ramp up, we suggest that the increase in VR simulator system

FIGURE 5.11  Evolution of first and second generation XR systems over time in research use. N = 689. Red consists of AR/MR systems, while blue consists solely of VR systems. Gray consists of generation-invariant systems that could not be distinguished. Critically, darker colors are first generation systems, and lighter colors are second generation systems.

research was driven by limitations in the characteristics of first generation HMDs. Perhaps those systems did not offer promising fidelity (whether physical or psychological), flexibility of analytical techniques, or usability, but such confirmations are beyond the technical scope of this trend analysis. If this is indeed the case, it would explain the diminishing use of first generation systems beginning in 2009, as well as the uptick in articles on second generation VR HMDs in the late 2010s (beyond the popularity first generation systems achieved), the latter coming only after the steep decline in the use of VR simulator systems in the early to mid-2010s. However, it is important to note that in this case we might expect to see more lab-based research, such as translational and developmental work, increasing, and those trends are not present here. It is likely that we were unable to discriminate this with the methodology we employed, which could be of notable importance. Upon inspection of Figure 5.11, we can see that within HFES the usage of second generation XR systems began in 2015 and has risen noticeably since then. The decline of first generation systems is also very evident: from 2012 to 2014 we see early AR/MR systems fade out entirely, and early VR systems reached their minima between 2013 and 2015, when the rise of second generation VR systems became evident. Essentially, these trends appear to indicate the decline of first generation VR HMD systems alongside relative improvement in VR simulator systems, followed by the decline of those VR simulator systems alongside relative improvements in second generation VR HMD systems. One notion, however, is clear: research output on VR HMD and simulator systems is currently at a high, barring the last year of limited research in the COVID-19 era. For the reader, we pose two questions:

• What will third generation VR HMDs look like?
• How soon will they be here?

PHYSIOLOGICAL RECORDING SYSTEMS OVER TIME

In our trend analysis, we also highlighted the use of various systems for physiological data recording in XR research (see Figure 5.12), in order to capture which psychobiological methods are being used in the human factors literature. We excluded systems noted as physically incompatible with VR headsets, as well as desk-mounted systems that captured open-face eye gaze activity. Importantly, some newer VR HMD systems have integrated eye-tracking capabilities, whereas standard eye-trackers have traditionally been incompatible with closed-face VR HMDs. Head-tracking systems appear to be the most widely used tools in XR research, peaking in 2006, 2008, and 2019. Research dips in the years in between, which appears to match the trends in VR HMDs discussed previously, indicating that the use of these systems depends on having HMDs in the first place. Indeed,

FIGURE 5.12  Physiological measures used in HFES XR research. N = 225.

head-tracking systems are often based on head-mounted gyroscopes integrated into these HMD systems, and we observed exactly that in our analysis of these articles. Body-tracking follows behind, peaking in 2013 and then drastically in 2019. Unlike the head-tracking systems normally present in VR HMDs, these body-tracking systems can involve video, patch-light, or motion-tracking from cameras or sensors completely detached from the head. Interestingly, usage of both of these systems spiked in 2019, and the literature shows that many researchers were utilizing the two methods in tandem. Following closely behind, eye-tracking systems peaked in 2014, with low but stable trends otherwise. Beyond these, the use of neuroimaging systems with XR techniques has remained fairly low, with electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS), and transcranial Doppler sonography (TCD) comprising only 18 articles out of the entire XR sample investigated. Likewise, neural stimulation methods such as transcranial direct/alternating current stimulation (tDCS/tACS) represent a virtually unresearched topic in XR. On the more peripheral physiological side, electromyography (EMG), galvanic skin response (GSR), and electrocardiography (ECG) account for 33 articles in total, nearly double the relative interest in central neurophysiological measures. One notion is clear here: there appears to be minimal growth in these neurophysiological techniques in combination with XR research. Given the numerous studies on the effects of XR on the body and on perceptual differences relative to reality, the scarcity of neurophysiological research should prompt researchers to investigate this nexus further. A practical deterrent may be the challenge of psychophysiological recording while the person is physically moving; researchers have yet to capitalize on specialized systems for ambulatory measurement.

CONCLUSIONS

The present study aimed to systematically examine trends in the population of extended reality research published in the Proceedings of the Human Factors and Ergonomics Society's Annual Meeting from 2005 to 2020. We collected a total of 530 unique articles, identified using keywords relevant to XR. Results were tabulated, analyzed, and graphed based on study design, simulation domain, simulation cluster area, military cluster area, authorial affiliation, funding agency, country of research, XR system type, XR system breakdown, and physiological recording systems. Our findings indicated that XR research has been fluctuating but slowly growing, and that system types display different trends. Diagnosing the current state of VR and extended reality research in human factors reveals an intriguing future for researchers to tackle. From the relative presence of research topics, funding agencies, authorial affiliations, system types, and other qualitative characteristics, we have illustrated the current gaps in the XR literature as it pertains to the field of human factors. We have presented the major contributors to XR research, as well as the lack of participation in the authorship and funding arenas by certain sectors. We hope that the current trend analysis and its findings are useful to XR researchers and developers across various areas of application, as well as to the broader human factors research community. The goal of this research was to highlight the benefits of these systems for human performance and system assessment, simulation and training, and safety.

Major Conclusions and Recommendations

The following recommendations for developing and promoting the use of VR/XR technologies are a logical outgrowth of the data collected, analyzed, and presented in this chapter. They may serve to educate non-HFE professionals from other fields, who may find them useful for adopting emerging VR devices and technologies and for utilizing them efficiently and safely.

1. While XR research represents a global effort, the US provides 82% of the publications in HFES. Canada supplies 5% of those articles, and Sweden is third with 2%. By this token, researchers in the US should look to collaborate more with international researchers, as a large XR hub is present within the country whose work could be generalized further.

2. HMDs comprise only 25% of the publications on XR systems. The spikes in research seem to align with the popularity of first and second generation systems, with a noticeable lull in the years between. Have we truly addressed the problems raised by first generation HMDs, or have we glossed over them with rosy expectations of XR's future?

3. Because there is a limited amount of physiological research as it relates to XR, researchers should consider employing physiological methods to validate within-system and between-system comparisons.

4. Of the three military branches, the Army contributes the most funding to XR research; funding from the Navy and Air Force has decreased in recent years. This finding is surprising given the increasing sophistication of the control interfaces for both unmanned systems and manned aircraft such as the F-35, and the consequent scope for VR/AR training (e.g., Pawlyk, 2021).

5. Academic studies make up the vast majority of XR research. Military and industrial research may be less readily available to the public (e.g., classified and proprietary research), but as it stands, authors in these sectors could contribute more to the broader scientific literature. This review also points to the need for more theory-based research: XR provides a novel arena for human interactions with technology and with other people, and human factors theory development is needed to guide further research and application.

6. Simulator research is on the decline, and HMD research is on the rise. However, those numbers overwhelmingly reflect VR studies.

7. Individual difference factors in XR have been neglected. Novel personality and ability factors are emerging as predictors of behavioral and cognitive-affective responses to new technology, including XR (Matthews et al., 2021b). However, systematic research is lacking, and the small Ns of many studies militate against identifying reliable individual differences. Research is needed to predict who will engage with and utilize XR systems most effectively, and to develop methods for personalizing training and operational environments.

8. When we manually inspected articles for system types, we lost much of the AR and MR research that was supposedly being conducted recently (according to the keyword search). Thus, much of the research mentioning AR and MR actually consists of VR studies. The concepts of AR and MR are being overstated relative to the actual literature on them, and this is a serious problem for the future of XR as a whole.

9. Based on this trend analysis, research on augmented and mixed reality is not increasing as rapidly as the conversation surrounding it. Perhaps it is time to re-evaluate whether sufficient effort is being put into funding and conducting these types of studies, as opposed to traditional VR studies that reference AR and MR in a low-effort fashion.

10. There might be an impending replication crisis surrounding simulator sickness. With the rise of HMDs come problems with fully immersive and mobile systems, yet simulator sickness research has been scarce in recent years. Critically, what differences do second generation systems confer compared to first generation systems? To understand the XR technologies of today, we must address this question empirically.

11. For XR to grow in the near future, HF/E practitioners must investigate AR and MR in their own contexts, with a greater priority than VR or separate from it. Currently, AR and MR cannot stand on their own two legs, and we must prop them up if we wish to understand the potential benefits they may confer in the field.

In summary, this chapter presented a preliminary snapshot of the various HF/E aspects of VR/XR/MR technologies using a time-series design, focusing on content relevance and domains of application, among other factors. Although these technological innovations have revolutionized progress in learning, military and medical applications, patient healthcare and clinical intervention, entertainment, and process control and manufacturing, they are still not readily and widely used in other domains. This stems from some of the human factors issues related to user experience, training requirements, system design, performance and workload, safety, and cost-effectiveness. It is nevertheless noteworthy that these technologies will continue to grow quickly in the educational, learning, and entertainment domains. The forecast of this growth parallels the increase in inflation rates and the socioeconomic disturbances created by rising gas prices, geopolitical conflicts, wars, and shrinking economies, which may pressure students, commuters, and teachers to stay at home and select remote options for learning and work activities, cultural and entertainment environments, sporting events, real estate viewing, shopping, and healthcare. Similarly, as VR systems continue to be used for training and rehabilitation, the scope for remote clinical applications will increase. Examples include therapy for gait and balance in patients who suffer from Parkinson's disease (Canning et al., 2020); clinical intervention and training for individuals who suffer from various social phobias or clinical disorders, such as fear of flying (Maltby et al., 2002); trauma management with VR therapy (Owens & Beidel, 2015; Beidel et al., 2019); and neuroendocrine or vascular procedures for trainees, among others. Finally, we hope that this chapter will serve as a research guide and resource for students, researchers, and practitioners alike who have interests in the wide application of these technologies. We also hope that our database of articles will continue to grow so that we can examine other emerging trends in the near future, and we intend to make this database widely available to students, researchers, practitioners, and developers for their access and use. Similarly, we hope that other researchers will extend this research by including other publication outlets in which VR and XR technologies are published. This research was limited to the HFES Proceedings, the flagship publication outlet of the Human Factors and Ergonomics Society; our choice was motivated by the cross-disciplinary nature of HF/E topics, as well as the sheer number of technical groups and diverse domains of application that exist within the Society.

REFERENCES

Allen, R., Hitt, J., Zavod, M., Bowen, S., Guest, M., & Mouloua, M. (1998a). Observations and recommendations of a simulated airport emergency. Proceedings of the 42nd Annual Meeting of the Human Factors and Ergonomics Society. Santa Monica, CA: Human Factors and Ergonomics Society.

Allen, R., Guest, M., Bowen, S., Hitt, J., Zavod, M., & Mouloua, M. (1998b). Airport emergency response crew training: A virtual solution. Proceedings of the 42nd Annual Meeting of the Human Factors and Ergonomics Society. Santa Monica, CA: Human Factors and Ergonomics Society.

Anderson, P. L., Price, M., Edwards, S. M., Obasaju, M. A., Schmertz, S. K., Zimand, E., & Calamaras, M. R. (2013). Virtual reality exposure therapy for social anxiety disorder: A randomized controlled trial. Journal of Consulting and Clinical Psychology, 81(5), 751. https://doi.org/10.1037/a0033559

Atallah, S. (Ed.). (2021). Digital Surgery. Cham: Springer. https://doi.org/10.1007/978-3-030-49100-0_14

Bedwell, J. S., Bohil, C. J., Neider, M. B., Gramlich, M. A., Neer, S. M., O'Donnell, J. P., & Beidel, D. C. (2018). Neurophysiological response to olfactory stimuli in combat veterans with posttraumatic stress disorder. The Journal of Nervous and Mental Disease, 206(6), 423–428. https://doi.org/10.1097/NMD.0000000000000818

Beidel, D. C., Frueh, B. C., Neer, S. M., Bowers, C. A., Trachik, B., Uhde, T. W., & Grubaugh, A. (2019). Trauma management therapy with virtual-reality augmented exposure therapy for combat-related PTSD: A randomized controlled trial. Journal of Anxiety Disorders, 61, 64–74.

Bohil, C. J., Alicea, B., & Biocca, F. A. (2011). Virtual reality in neuroscience research and therapy. Nature Reviews Neuroscience, 12(12), 752.

Canning, C. G., Allen, N. E., Nackaerts, E., et al. (2020). Virtual reality in research and rehabilitation of gait and balance in Parkinson disease. Nature Reviews Neurology, 16, 409–425. https://doi.org/10.1038/s41582-020-0370-2

Descheneaux, C., McNeil, M., Mouloua, M., & Alicia, T. (2011). Diagnosing HCI trends of the last decade in the medical community. Proceedings of the 55th Annual Meeting of the Human Factors and Ergonomics Society, 55, 1985–1989. https://doi.org/10.1177/1071181311551414

Descheneaux, C., Wohleber, R., Harris, S., Matthews, G., Boland, W., Maraj, C., Moss, J., & Krum, D. (2020). Implementation and Assessment Challenges for Virtual and Augmented Reality Displays within the Army Synthetic Training Environment. Report for United States Army Futures Command (AFC), Combat Capabilities Development Center (CCDC), Simulation and Training Technology Center (STTC), Orlando, FL.

Howard, M. C. (2018). Virtual reality interventions for personal development: A meta-analysis of hardware and software. Human-Computer Interaction, 34(3), 1–35.

Howard, M. C., & Gutworth, M. D. (2020). A meta-analysis of virtual reality training programs for social skill development. Computers & Education, 144, 103707.

Insider Intelligence. (2016, August 22). The virtual and augmented reality market will reach $162 billion by 2020. Business Insider. https://www.businessinsider.com/virtual-and-augmented-reality-markets-will-reach-162-billion-by-2020-2016-8

Irish, J. E. (2013). Can I sit here? A review of the literature supporting the use of single-user virtual environments to help adolescents with autism learn appropriate social communication skills. Computers in Human Behavior, 29(5), A17–A24.

Jensen, L., & Konradsen, F. (2018). A review of the use of virtual reality head-mounted displays in education and training. Education and Information Technologies, 23(4), 1515–1529.

Kiger, P. J. (2020, January 6). What is extended reality? The Franklin Institute. https://www.fi.edu/tech/what-is-extended-reality

Ludvigsen, J., Mouloua, M., & Hancock, P. (2015). Human factors/ergonomics contributions to aerospace systems, 1980–2012. Ergonomics in Design: The Quarterly of Human Factors Applications, 23(4), 20–22.

Extended Reality in Training Environments

177

Maltby, N., Kirsch, I., Mayers, M., & Allen, G. J. (2002). Virtual reality exposure therapy for the treatment of fear of flying: A controlled investigation. Journal of Consulting and Clinical Psychology, 70(5), 1112–1118. https://doi​.org​/10​.1037​/0022​- 006X​.70​.5​.1112 Marr, B. (2019, August 12). What is extended reality technology? A simple explanation for anyone. Enterprise Tech, 12:23 AM. https://www​.forbes​.com ​/sites​/ bernardmarr​/2019​ /08​/12 ​/what​-is ​- extended​-reality​-technology​-a ​-simple ​- explanation​-for​-anyone/​?sh​ =22441de47249 Matthews, G., Panganiban, A. R., Lin, J., Long, M., & Schwing, M. (2021a). Super-machines or sub-humans: Mental models and trust in intelligent autonomous systems. In C. S. Nam & J. B. Lyons (Eds.), Trust in Human-Robot Interaction (pp. 59–82). Cambridge, MA: Academic Press. Matthews, G., Hancock, P. A., Lin, J., Panganiban, A. R., Reinerman-Jones, L. E., Szalma, J. L., & Wohleber, R. W. (2021b). Evolution and revolution: Personality research for the coming world of robots, artificial intelligence, and autonomous systems. Personality and Individual Differences, 169, 109969. Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & PRISMA Group*. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Annals of internal medicine, 151(4), 264–269. Mouloua, M., Smither, J., Kennedy, R. C., Kennedy, R. S., Compton, D., & Drexler, J. (2004). Visually-induced motion sickness: An experimental investigation. Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting, 48, 2623–2626. https:// doi​.org​/10​.1177​/154193120404802304 Mouloua, M., Smither, J., Kennedy, R. C., Drexler, J., Compton, D., & Kennedy, R. S. (2005a). Visually-induced motion sickness: Effects of adaptation training. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting, 49, 2263–2267. https:// doi​.org​/10​.1177​/154193120504902610 Mouloua, M., Smither, J., Kennedy, R. C., Kennedy, R. S., Compton, D., & Drexler, J. (2005b). Training effects in a sickness inducing environment. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting, 49, 2206–2210. https://doi​.org​/10​.1177​ /154193120504902519 Mouloua, M., Smither, J., & Kennedy, R. S. (2009). Space adaptation syndrome and perceptual training. In D. Vincenzi, J. Wise, M. Mouloua, & P. A. Hancock, (Eds.), Human Factors in Simulation and Training (pp. 239–255). Boca Raton, FL: CRS Press (Taylor & Francis Group). Mouloua, S. A., Ball, R. V., Ferraro, J. C., & Mouloua, M. (2021). The history of human factors in healthcare: From its emergence 50 years ago to COVID-19. Proceedings of the 10th International Symposium on Human Factors and Ergonomics in Health Care, 10(1), 165–169. Mouloua, S. A., Ferraro, J., Mouloua, M., Matthews, G., & Copeland, R. (2019). Trend analysis of cybersecurity research published in HFES proceedings from 1980 to 2018. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 63(1), 1600–1604. Mouloua, S. A., Ferraro, J., Mouloua, M., & Hancock, P. A. (2018). Trend analysis of Unmanned Aerial Vehicles (UAVs) research published in the HFES proceedings. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 62(1), 1067–1071. Owens, M. E., & Beidel, D. C. (2015). Can virtual reality effectively elicit distress associated with social anxiety disorder? Journal of Psychopathology and Behavioral Assessment, 37, 296–305. Parsons, T. D., & Rizzo, A. A. (2008). 
Affective outcomes of virtual reality exposure therapy for anxiety and specific phobias: A meta-analysis. Journal of Behavior Therapy and Experimental Psychiatry, 39(3), 250–261.

178

Human Factors in Simulation and Training

Pawlyk, O. (2021). The air force’s virtual reality fighter training is working best for 5th-gen pilots. Retrieved from https://www​.military​.com ​/daily​-news​/2021​/03​/26​/air​-forces​ -virtual​-reality​-fighter​-training​-working​-best​-5th​-gen​-pilots​.html Petrov, C. (2022, June 3). 45  Virtual Reality Statistics That Will Rock the Market in 2022. TechJury. Retrieved from https://techjury​.net​/ blog​/virtual​-reality​-statistics/​#gref Scerbo, M. W., Stefanidis, D., Britt, R. C., Davis, S. S., & Stefanidis, D. (2013). A spatial secondary task for measuring laparoscopic mental workload. Simulation in Health Care, 7 558. Scerbo, M. W., Kennedy, R. A., Montano, M., Britt, R. C., & Davis, S. S. (2012). A spatial secondary task for measuring laparoscopic mental workload: Differences in surgical experience. Proceedings of the Human Factors and Ergonomics Society, 57(1), 728–732. Shirer, M., & Soohoo, S. (2020).Worldwide spending on augmented and virtual reality forecast to deliver strong growth through 2024, According to a new IDC spending guide- IDC [(accessed on 12 January 2021)]. Smither, J. A., Mouloua, M., & Kennedy, R. S. (2008). Reducing symptoms of visually induced motion sickness through perceptual training. The International Journal of Aviation Psychology, 18, 326–339. Stowers, K., & Mouloua, M. (2018). Human computer interaction trends in healthcare: an update. First Published June 29, 2018, Research Article. https://doi​.org​/10​.1177​ /2327857918071019 Article information. Vardomatski, S. (2021, September 14). Augmented and Virtual Reality After Covid-19. Forbes Technology Council. Retrieved from https://www​.forbes​.com ​/sites​/forbestechcouncil​ /2021​/09​/14​/augmented​-and​-virtual​-reality​-after​-covid​-19/​?sh​= 6b755aa82d97 Velazquez-Pimentel, D., Hurkxkens, T., & Nehme, J. (2021). A virtual reality for the digital surgeon. In S. Atallah (Ed.), Digital Surgery. Cham: Springer. https://doi​.org​/10​.1007​ /978​-3​- 030​- 49100​- 0​_14 Vincenzi, D., Wise, J., Mouloua, M., & Hancock, P. A. (Eds.). (2009). Human Factors in Simulation and Training. Boca Raton, FL: CRS Press(Taylor & Francis Group). Wong, N., & Beidel, D. C. (2013). Virtual environments in clinical psychology research. In J. S. Comer & P. C. Kendall (Eds.), The Oxford Handbook of Research Strategies for Clinical Psychology (pp. 87–100). Oxford: Oxford University Press.

6 Mitigation of Motion Sickness Symptoms by Adaptive Perceptual Learning: Implications for Space and Cyber Environments

Mustapha Mouloua, John French, Janan A. Smither, and Robert S. Kennedy

CONTENTS

Motion Symptoms
Mitigation
Individual Differences in Adaptation
The Long-Term Retention, Conditioning, and Transfer of Adaptation
Long-Lasting Adaptation
Generalizability
Adaptive Perceptual Learning (APL) Training
Pre-Adaptation on VR
Pre-Adaptation on Vection Drum
Pre-Adaptation on VIMS
Conclusion
Acknowledgments
In Honorem
References

MOTION SYMPTOMS

Motion sickness is an odd term for a constellation of symptoms that result from a reflexive autonomic phenomenon (Muth, 2006). For example, it is not a true sickness but a normal response to unusual motion (Money, 1970). Nor does motion sickness require physical motion to produce symptoms, as in visually induced motion sickness, or VIMS (Kennedy et al., 2010).

It is not a prolonged illness but ends shortly after the unusual motion is ended, and it seems to serve no homeostatic, biological purpose (Reason, 1978). The principal and most easily recognized symptoms are nausea and emesis, but a set of 16 common symptoms has been identified (Kennedy et al., 1993) in a standard test of motion sickness symptoms called the simulator sickness questionnaire (SSQ). Included in the SSQ, and a symptom that is often overlooked, is a prolonged fatigue that is so common it has been named the Sopite syndrome (Graybiel & Knepton, 1976); it is sometimes the only symptom associated with unusual motion.

Symptoms of motion sickness arise from the unique visual-vestibular perspectives that always accompany new forms of locomotion, from sea travel to camels, automobiles to airplanes, and now spacecraft and virtual motion through cyber environments (Money, 1972; Benson, 1978). One of the earliest references to motion sickness is preserved in the name of its primary symptom, nausea, a word derived from the Greek "naus," for ship, hence a symptom associated with travel at sea; the observation is often attributed to Hippocrates (Golding, 2016). Many military campaigns throughout history have been won or lost because motion symptoms reduced the effectiveness of the sailors or troops. Napoleon, for example, was thought to have given up his plans for the conquest of North Africa when his troops became "camel sick" from loping along on these ships of the desert (Huppert et al., 2017). Similarly, the defeat of the Spanish Armada in 1588 is often attributed to the seasickness of the Spanish commander and, doubtless, many of his troops.

There has been renewed interest in space research in recent years with the advent of commercial space operations, numerous countries sending probes to Mars, and the likelihood of permanent colonies on the moon within a decade. A challenge to the efficiency and safety of space operations is space motion sickness, reported to afflict about one-half of all astronauts during the initial 24–72 hours of orbital flight (Homick et al., 1984; Ishii, 1993; Nguyen, 1996; Reschke et al., 1998; Thornton et al., 1987). On returning to earth, the astronauts must readapt to terrestrial conditions. Parker et al. (1985) were able to document this readaptation and to link it with altered eye movements. These results, both in space and after return to earth, involve learned perceptual eye movements that compensate for altered inertial environments, providing relief through adaptation.

Space motion sickness (SMS) symptoms resemble those from other forms of motion sickness (Money et al., 1984), particularly those reported in visual rearrangement studies (Kottenhoff, 1957; Welch, 1978, 2000a) and in ground-based flight simulators (Kennedy et al., 1984b). Cue conflict or neural mismatch theory (Reason, 1970) suggests that this constellation of symptoms is triggered by an uncoupling of expected sensory stimuli (Kennedy et al., 1984a; Oman, 1991; Parker et al., 1985). In other words, disparity between and within the visual, vestibular, and somatic messages can lead to conflict, and conflict leads to symptoms (Guedry, 1965; Benson, 1978). Thus, as one initially moves about in the weightless environment, the sensory channels provide atypical information about spatial orientation and bodily movement, and this sensory conflict leads to nausea and motion sickness

(Ishii, 1993). For example, under normal terrestrial conditions, tilting the head to one side causes the otolith to roll (shear) sideways, a stimulus that is interpreted as "head tilt." Rolling of the otolith to the front or back, in contrast, occurs only under linear acceleration in normogravity and is so interpreted. Under orbital conditions, in microgravity, however, head tilt does not produce a sideways rolling of the otolith, and this can lead to sensory conflict and subsequent nausea. This otolith tilt-translation hypothesis is a likely source of SMS.

Another new form of motion results from head-mounted displays, particularly those created in virtual reality environments (VE), and is termed cybersickness, a form of VIMS. Cybersickness has become much more prevalent as the use of VE has increased in military training, where it can reduce the costs and dangers of real-world training in military preparedness. VR environments also open new ways to examine medical or biological phenomena, for example, by traversing from the macroscopic to the microscopic or by following cells as they course throughout the organism. The risk of cybersickness seems to be a use-limiting factor in the rapid acceptance of this new type of training (Stanney, 2002). Cybersickness is also thought to result from sensory conflict, a mismatch between the perception of virtual motion and the absence of motion signals from the other senses. This perceived self-motion in VE is a neurovestibular illusion called vection (Fischer & Kornmüller, 1930; Nooij et al., 2017).

Many studies, over many decades, have explored vection and its relationship with VIMS in experiments using a slowly rotating drum that contains a pattern of stripes. The subject sits stationary at the center of the drum and observes the moving pattern. Viewing this moving pattern elicits eye movements consisting of a motion to track the stimulus and a return saccade, which together are referred to as optokinetic nystagmus (OKN) (Bender & Shanzer, 1983; Kovalev et al., 2020; see also Hu et al., 1997; Hu et al., 1989). Generating motion sickness without physical motion, only illusory motion, in the OKN drum was first demonstrated by Bárány in 1907 (as cited in Bender & Shanzer, 1983). It is still a research tool much in use today to study VIMS, vection, and the relationship between the two. The OKN drum vection effect has recently been demonstrated in a similar VR cyber vection environment (French et al., 2023).

MITIGATION

The deeper one looks at the biological explanations of motion sickness, the more confusing the picture becomes. There is no firm theoretical foundation for research, as there are at least three to five competing theories about the causes of motion sickness, with fervent proponents on all sides (Bertolini & Straumann, 2016; Bos et al., 2008; Warwick-Evans et al., 1998). Another oddity about motion sickness is that there is no mitigation strategy effective against all symptoms for all individuals (Golding & Gresty, 2015), although this has been the goal of centuries of motion countermeasures.

Mitigation strategies have proliferated throughout the centuries. Remedies like ginger root and acupressure are ancient (White, 2007; Bertolucci & DiDario, 1995) and have dubious effectiveness beyond a placebo effect. More modern but

also purported treatments involve vibrotactile stimulation of the vestibular "area" or transcranial electrical stimulation, both of which aim to "rearrange" or "overwhelm" the signals producing nausea (Heaney et al., 2018; Weech et al., 2020). Pharmaceuticals such as the anticholinergics are effective but often produce debilitating side effects, including amnesia, drowsiness, and blurred vision (Dahl et al., 1984; Leung & Hon, 2019). Clearly, a more effective countermeasure, devoid of deleterious side effects, is needed.

While motion sickness and anxiety are two different phenomena, they share some symptoms in common. For example, sickening situations can elicit anxiety. This may explain why some useful anti-nausea medications are also anxiolytics, such as the phenothiazines (Thorazine, Compazine, Phenergan), or sedating, such as the antihistamines. Some more complex behavioral mitigation strategies for motion sickness also seek to decrease stress or anxiety as the basis of the treatment. Individuals might be aware of their sensitivity to motion and become anxious or embarrassed about suffering the symptoms. A classically conditioned anticipatory response to the sickening situation may develop that includes anxiety, and anticipatory symptoms could be produced at the mere sight or smell of the anticipated event.

Forms of counterconditioning have been used successfully as complex behavioral treatments for anxieties and phobias for over 40 years (Avila et al., 1999; Lane, 2009) and, for example, for airsickness (Cheung & Hofer, 2005). These techniques first train the individual in muscle relaxation and regular, deep breathing paired with pleasant visual imagery (Sang et al., 2005). Then they increasingly approximate the anxiety-producing stimulus. Over several days of replacing responses to the stressful stimuli with less stressful behaviors, the symptoms can be counterconditioned or replaced with relaxation (Yen et al., 2005; Dobie, 2019; Koch et al., 2018). In another study, desensitization, biofeedback training, and confidence-building exercises over 10 sessions in a VIMS environment were shown to produce significant resistance to exposure to a subsequent VIMS environment (Dobie et al., 1987).

The US military has a successful desensitization program involving a variety of motion stimuli that employs this kind of counterconditioning. The US Navy treatment, which includes biofeedback, relaxation, exposure to incremental Coriolis forces, and flying, has reduced airsickness by 85% (Rogers & Van Syoc, 2011). Military services worldwide have developed extensive programs that attempt to desensitize motion sickness responses in susceptible personnel, with success rates reported to be greater than 85% (Benson, 1999; Rogers & Van Syoc, 2011; Lucertina et al., 2013). However, these programs can take many weeks to complete and involve complicated combinations of deep breathing, muscle relaxation training, and biofeedback (Banks et al., 1992; Jones et al., 1985; Giles & Lochridge, 1985; Mert & Bles, 2007).

A newer form of counterconditioning is Cognitive Behavioral Therapy (CBT). It is distinguished from the traditional forms in that, rather than imagining pleasant visual imagery, other thoughts are used, such as thinking about numbers or poetry, to distract the individual from the stressful stimuli. The success of these counterconditioning treatments in stress reduction implies that they could be successful in mitigating the stress or anticipation of sickness, which in turn may reduce

these secondary symptoms. We were not able to find evidence that CBT has been tried with motion sickness, but at least one study has discussed how such studies might be done (Dobie & May, 1994).

One of the most effective mitigation strategies for motion sickness is the same as it was in ancient times: adaptation through repeated exposure to the sickness-producing stimulus (Golding, 2016). The simplest and most time-tested countermeasure for most of those afflicted by motion sickness symptoms is a gradual adaptation to the new demands on perception, as one gets one's "sea legs" for the new motion. Reason (1978) called this the "vis medicatrix naturae," or the healing power of nature. On ships, in space, and in other constant-exposure situations, adaptation and relief of symptoms can occur within a few days (Golding, 2006; Herr & Paloski, 2006) in most individuals. However, the "treatment" requires many hours of discomfort while adaptation occurs, with no escape from the exposure. It implies that something is being learned, consciously or unconsciously, that reduces the discomfort over time.

Habituation and adaptation are similar terms describing voluntary or involuntary mechanisms that allow us to pay less attention, and to respond less neurophysiologically, to distracting stimuli in our environment (Baker et al., 2010). There are subtle differences among theories of habituation, but habituation and adaptation refer to central and peripheral mechanisms, respectively, by which we physically respond less and less to distracting stimuli.

Adaptation to motion environments may parallel some early experiments on optically adapting to distortions in the visual field (Mack, 1967; Redding, 1973a; Welch, 1978). It has been well established since the late 1960s that wearing prismatic or mirrored goggles (for example, continuously for days) elicits a remarkable adaptation in which visually based motion errors decline continuously and behavior returns to normal (Redding, 1973a, 1973b; Welch et al., 1974). People have ridden bicycles or even flown airplanes successfully after a few days of wearing prismatic goggles. One conclusion from these and other such experiments is that the visual-vestibular and kinesthetic senses are able to recalibrate quickly when actively interacting with a visually disturbed environment, so long as the environment remains constant over time (Welch, 2000b). The same conclusion has been drawn for virtual environments (Stanney et al., 1998; Welch, 2000a).

Likewise, virtually all space travelers adapt to the unusual motion conditions that produce motion sickness during flight, although this adaptation may not be complete for several days (Ishii, 1993; Reschke et al., 1998; Welch, 2000a). Reports of astronauts and mission specialists with more than one flight suggest that adaptation also occurs across flights, whereby symptoms experienced on subsequent missions appear to be less severe than on the first flight (Parker et al., 1985). Procedures for speeding up the rate of adaptation would be useful. Pre-adapting space travelers to sensory conflicts before embarkation, to immunize them against space adaptation syndrome, might be even better. Vanderploeg, Stewart, and Davis (1985) have shown that of 22 space travelers who had an opportunity for more than one flight, 11 were sick in various degrees on their first flight, and 11 were not. Of the 11 who were not sick, all were symptom-free on their second exposure. Of the 11 who were sick on their first flight, 9 experienced symptoms on their second flight (although to a lesser

extent) and 2 did not. Even though the time between space flights was protracted, the adaptation obtained on the first flight appeared to carry over to the second. The data from these 22 repeated-measures subjects imply high reliability for adaptation to space sickness; the calculated reliability for this outcome is greater than r = 0.82 (a worked reconstruction of this figure appears below). Considering this level of criterion reliability, the inability to predict who will become sick and who will not (Money et al., 1984) suggests that not all of the relevant factors are included in the prediction.

The mitigation strategy discussed in this chapter, Adaptive Perceptual Learning (APL) training, assumes that there is substantial transfer of training when one adapts, even incompletely, to one form of motion event, and that this transfer will aid in adapting to another; the more similar the environments, the more successful this strategy should be. This approach assumes that all forms of motion sickness represent a disorientation of the visual-vestibular perception pathways induced by unusual perceived motion, in real or virtual environments, with or without physical movement. Therefore, the countermeasures for one should apply to the other. This could mean that pre-adapting to a less stressful stimulus might make it easier and faster to adapt to a more stressful situation in the near future. Pre-adapting astronauts to visual/vestibular conflicts before embarkation might immunize them against space sickness. In cyberworlds, proceeding slowly through nauseogenic virtual landscapes, or skipping over them at first, may make it easier to face that landscape the next time. Examining the evidence for this proposal is the subject of this chapter.
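The chapter does not state how the r = 0.82 reliability was computed. One reading consistent with the counts above is a phi coefficient over the 2 × 2 table of sickness outcomes on the two flights; the following is a hedged reconstruction under that assumption.

```latex
% Sick/not-sick on first vs. second flight, from the counts above:
%                 second: sick   second: well
% first: sick         a = 9          b = 2
% first: well         c = 0          d = 11
\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}
     = \frac{9 \times 11 - 2 \times 0}{\sqrt{11 \times 11 \times 9 \times 13}}
     \approx \frac{99}{119.0} \approx 0.83
```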

INDIVIDUAL DIFFERENCES IN ADAPTATION

It is probable that different individuals adapt at different rates (Welch, 2000a). Evidence for this view is clear-cut in the motor learning literature (Jones, 1970; Kennedy et al., 1980), but it is less obvious in the perceptual learning literature. What we do not know are the rules by which adaptation transfers across environments and whether some individuals possess more of this ability than others. The work of Graybiel and Lackner (1983) comes closest to what we mean by transfer of adaptation. They showed changes in performance of two types: (a) individuals adapt similarly to three different provocative measures of motion sickness (Lackner & Graybiel, 1984), and (b) individuals' rates of acquiring and losing adaptation are consistent across different situations (Graybiel & Lackner, 1983).

Human beings vary in both the speed and magnitude of their adaptation to a given perceptually distorted environment (Welch, 1978, 2000a, 2000b). Furthermore, although there are very few studies relevant to this issue, it appears that these individual differences are reliable over time. Test-retest correlations average about .75 for adaptation to prismatic displacement (Redding, 1973a; Welch et al., 1974) and optical tilt (Redding, 1973b; Mack, 1967), whereas a split-half reliability coefficient of .83 has been obtained for adaptation to head movement-induced illusory motion of the visual field while wearing right–left reversing goggles (Kottenhoff, 1957).

Crawshaw and Craske (1976) correlated prism adaptation in two different experimental situations. Their study may be flawed in that they used the terminal level of performance as an index of adaptation. Their correlation values between the two

conditions were .17 and .19. One might argue that a better index would be the acceleration of the acquisition curve (its slope) in the two conditions. Conceivably, rapid adapters would adapt more quickly in both, regardless of where their terminal-level performances ended up; terminal level is only partly an index of adaptation. One might also argue that the reliability of the measure of adaptation is imperfect, which would further reduce the overall correlations. It is recognized that several other perceptual events and situations result in adaptation (for example, delayed auditory feedback; Katz & Lackner, 1977) and that similar principles would apply from a perceptual integration standpoint.

Thus, there appear to be "quick adapters" and "slow adapters" (and those in between), at least in Welch's (1978) data. A question of interest is what, if any, personal characteristics correlate with (and therefore are predictive of) these different "adaptive styles." In brief, the answer is that very few of the more obvious characteristics have been found to predict adaptability. Gender and age do not seem to be related to adaptation rate (Welch, 1978). Neither do many of the well-known "paper-and-pencil test" measures of personality. For example, Welch (1978) reported a study in which level of adaptation (as measured in several different ways) to prism-displaced vision failed to correlate with scores on the California Psychological Inventory, the Trait Anxiety Scale, the Achievement Anxiety Test, the Tennessee Self-Concept Scale, the Internal–External Locus of Reinforcement Test, and the Extroversion Scale. On the other hand, Kottenhoff (1957) did obtain a correlation of .72 between degree of introversion/extroversion and level of adaptation to loss of visual position constancy while wearing right–left reversing goggles. Introverted subjects experienced an increase in illusory motion, whereas the more extroverted subjects showed either no change or a decrease (i.e., adaptation). It is of more than passing interest that extroversion as measured by forms of the Maudsley Personality Inventory has a test-retest reliability of only r = .60 (Kennedy, 1972), so that if one were to correct Kottenhoff's correlation for attenuation (Guilford, 1954), more than 95% of the variance in his predictor would be accounted for (a worked correction appears at the end of this section).

With the exception of Kottenhoff's (1957) experiment, adaptability to distorted environments has not been predictable from general personality characteristics. A more fruitful strategy for detecting such correlates may be to perform a microanalysis of the specific perceptual and perceptual-motor behaviors required by a particular distorted environment. Thus, for example, the common laboratory situation of reaching for targets while wearing light-displacing prism goggles involves, among other things, the ability to accurately fixate the visual target (i.e., where it appears to be through the goggles), to accurately guide the hand to the target, to accurately gauge the initial prism-induced reaching error, and to correct that error. It has been shown (Warren & Platt, 1974; Welch, unpublished data) that people differ reliably in their ability to do each of these subtasks and, more importantly, that these differences are correlated with subsequent adaptation. For example, Warren and Platt (1974) found that people who have good control of their eyes (i.e., are able to fixate the visual target very accurately) but relatively poor control over their reaching responses reveal little visual adaptation to prismatic displacement and

commensurately greater proprioceptive adaptation. Just the reverse proportions of these two types of adaptation were obtained for subjects with poor eye control but good hand control.

Both Welch (unpublished data) and Warren and Platt (1974) have reported that the instruction to point at a prism-displaced target is not interpreted in the same way by all subjects. Specifically, some subjects take this to mean that, after the initial prism-induced error, they should quite deliberately point to where they know the target to be physically located. Frequently, these so-called "object pointers" aim for a location that actually appears to them to be off to one side of the displaced image and, by so doing, very quickly succeed in pointing accurately, sometimes as soon as the second trial. Other subjects continue to point to where the target appears to be located, only gradually correcting their errors. It is perhaps significant that these "image pointers" also fail to show as large a post-exposure negative aftereffect as the "object pointers," indicating that they have achieved less substantial adaptation.
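As promised above, here is the correction for attenuation implied by the Kottenhoff discussion. The source does not state which reliability was assumed for the adaptation measure; using the split-half value of .83 reported earlier in this section reproduces the ">95% of the variance" claim, so that assumption is adopted here.

```latex
% Correction for attenuation (Guilford, 1954):
%   r_xy = .72  observed extroversion-adaptation correlation (Kottenhoff, 1957)
%   r_xx = .60  test-retest reliability of the extroversion measure
%   r_yy = .83  split-half reliability of the adaptation measure (assumed)
\hat{r} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
        = \frac{.72}{\sqrt{.60 \times .83}}
        \approx 1.0,
\qquad \hat{r}^{\,2} \approx 1.0 > .95
% (Corrected correlations can nominally exceed 1 when reliabilities are
% underestimated; the value is capped at 1.0.)
```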

THE LONG-TERM RETENTION, CONDITIONING, AND TRANSFER OF ADAPTATION

A number of studies have demonstrated a close relationship between adaptation to perceptual rearrangement and traditional situational learning (Welch, 1978). Of present interest is the evidence concerning the degree to which adaptation (a) exhibits transfer to new situations (stimulus and/or response generalization), (b) is retained for relatively long periods of time, (c) reveals "savings" on subsequent "relearning" sessions, (d) is subject to discriminative conditioning, and (e) can be maintained for two (or more) different distorted environments at the same time. Although we are aware that the adaptive responses measured in these studies may not be identical to those occurring in space sickness, the research discussed in this section suggests the issues, kinds of tasks, training regimes, and measures that should prove useful in providing astronauts and other space travelers with some degree of generalized "inoculation" against the perceptual and perceptual-motor disruptions caused by the micro- and macrogravity environments to which they are exposed.

In principle, adaptation, like learning, can be measured immediately or some time afterward. Second, it can be measured by means of the same tasks and stimulus conditions in which it was acquired or by other situations to which it may (or may not) generalize. The two present concerns are (a) evidence for the existence of long-term adaptation effects and (b) an assessment of the degree to which adaptation generalizes to other tasks and types of perceptual rearrangement.

LONG-LASTING ADAPTATION

The traditional assumption in experiments on adaptation to visual rearrangement has been that, because such adaptation involves the contradiction of a lifetime of normal visual and visuomotor experience, it must necessarily be fragile, short-lived, and easily abolished by the reinstitution of undistorted vision. This assumption has

led most investigators to test for adaptation as quickly as possible after the exposure to visual rearrangement has occurred. There are, however, a number of observations that belie the notion that adaptation is a short-lived phenomenon.

First, it has been casually observed, in a number of experiments involving two or more adaptation sessions spread over a period of time, that on sessions subsequent to the first one, subjects manifest some initial (albeit partial) adaptation as soon as the distorting goggles are in place. This observation has been reported for adaptation to prismatic displacement in terms of (a) visuomotor aftereffects and reduction of effects (Hein, 1972; Klapp et al., 1974; Lackner & Lobovits, 1977; Welch et al., 1974; Wooster, 1923), (b) shifts in visual direction (Welch et al., 1974), and (c) modifications of felt eye position (McLaughlin & Webster, 1967). It has also been reported for visual adaptation to optical curvature (Festinger et al., 1967; Slotnick, 1969). In every instance, the effect was unexpected and mentioned only as a secondary finding of the experiment.

Similar to the preceding phenomenon is the fact that, in several experiments, adaptation has been found to increase in strength over a series of adaptation sessions, each separated by an extended period of normal vision (Kinney et al., 1970a; McGonigle & Flook, 1978; Peterson & Peterson, 1938; Snyder & Snyder, 1957). Observations of a more controlled nature, concerning the ease with which some people are able to shift from one perceptual environment to another, come from studies of underwater perception. Luria and Kinney (1970) and Luria et al. (1967) have shown that professional divers experience much less initial face mask-induced visual distortion when entering the water, and less visual aftereffect when leaving it, than is true for inexperienced divers.

The most extensive study of long-term adaptation has been carried out by Jones and Holding (1975). They used pattern-contingent color aftereffects and showed that adaptation magnitude declines only with testing. By using a series of post-adaptation time delays, they were able to show that significant adaptation effects could be observed for months after a single 15-minute adaptation period. Harris (1980) has suggested that other types of adaptation may also last for extended periods. Savreau (1979) has demonstrated that motion-contingent color effects last at least a week, and Wolfe (1985) has demonstrated that over 4 minutes of adaptation leads to a long-lasting tilt aftereffect. Boynton and Das (1966) report a related event.

The presence of partial adaptation many months after the original exposure to the visual rearrangement, and the ability to maintain adaptation to two different perceptual situations simultaneously, may be interpreted in several ways. Some investigators (e.g., Held, 1968; Klapp et al., 1974) have assumed that these effects represent the persistence of adaptation (i.e., incomplete decay). This seems unlikely, however, since there is typically fully sufficient interpolated normal visual and visuomotor experience to completely abolish the adaptive shift. One likely alternative is that adaptation (partial or complete) can be conditioned to the situational cues associated with it and is elicited whenever the observer is once again in the presence of these cues. A second, although not mutually exclusive, possibility is that after extensive experience subjects develop a perceptual or perceptual-motor flexibility by which

they can easily shift from an unadapted to an adapted state, and vice versa, or between two different states of adaptation, as soon as they identify the particular environment in which they have been placed. In short, they have learned to adapt, an ability that might be referred to as an "adaptation set," analogous to the more familiar but almost forgotten "learning sets" (e.g., Harlow, 1959). Harlow referred to learning sets as a way in which primates and other highly intelligent organisms "learned to learn" (Harlow, 1949). Intelligent creatures, he argued, have the ability to pick out patterns or procedures, consciously or not, that improve their ability to recognize or respond to similar patterns in the future.

GENERALIZABILITY

Traditional studies of adaptation to perceptual rearrangement have used tasks that are rather similar, if not identical, to those practiced during the adaptation period (Welch, 1978). Consequently, little is known about the degree to which adaptation might transfer to tasks that are very different from those encountered during adaptation. Likewise, there is scant information concerning whether one's adaptation to one form of perceptual rearrangement will either transfer to, or predict, one's adaptability to another form of rearrangement.

Melamed et al. (1979) discuss the fact that "the total prism shift in target pointing is equal to the algebraic sum of the shifts in the other two measures" (i.e., visual shift and proprioceptive shift, the equation being TP = VS + PS). Although a two-component linear additive model of prism adaptation is attractive, one wonders whether the effect of spacing the subject's responses during the exposure period may not be a factor.

Regarding the first of these two aspects of generalization, Kinney et al. (1970b), using prismatic displacement, combined three exposure activities with the same three tasks used as pre- and posttests of adaptation. The tasks were (a) placing a small chess-piece marker on a square within a checkerboard grid, (b) reaching under a transparent table for a target, and (c) rapidly spearing a bull's-eye with a wooden dowel. Every subject was measured on all three tasks during the pre- and post-exposure periods but engaged in only one of them during 5 minutes of prismatic exposure. The greatest amount of adaptation (about 65% of the total possible compensation) occurred for the trained task. The nontrained tasks for a given exposure condition revealed generalized adaptation, but with some decrement.

Redding (1973a, 1973b, 1975a, 1975b) has examined the second issue of generalization: whether adaptation to one form of distortion will influence adaptation to another form. He found that when subjects were confronted (in a single session) with a visual field that was simultaneously prismatically displaced and tilted, adaptation to each of these distortions occurred at the same rate as when subjects were adapted to each separately. Furthermore, the magnitude of subjects' adaptation to one distortion was not correlated with the magnitude of their adaptation to the other. Thus, it would appear that displacement and tilt adaptation are independent processes that do not transfer to one another. Because the perception of visual location and of orientation may be based on qualitatively different processes, the preceding failure of transfer may not be too surprising. Alternatively, perhaps for transfer to occur from one type

of adaptation to another, it is necessary to implement a much more extensive training regime on each, perhaps alternating between the two types of distortion.

Jell et al. (1985) reported a reduction in human optokinetic after-nystagmus in one direction or the other, depending on the exposure history of the subject. It is well known that optokinetic after-nystagmus can be reduced by damage to or destruction of the labyrinth and by lesions in parts of the parahypoglossal nuclei or pretectum. The 1985 Jell, Ireland, and LaFortune study also revealed changes, but the authors do not comment on whether the change was due to lowered arousal or a mere drop-off in the values of cumulative eye displacement, duration, or slow-phase nystagmus. The authors conclude that this is simple habituation, and they used cumulative displacement as their most sensitive parameter. They suggest that "psychological habituation" (Collins, 1974) may have been a factor.

In a 1987 study (Kennedy et al.), subjects were adapted to Purkinje stimulation (Benson & Bodin, 1966), involving approximately 0.5 minutes of bodily rotation followed by a head turn about an axis orthogonal to that of the preceding rotation. This situation produces dizziness, illusory visual motion, and difficulty walking. The experience is similar to the effects of Coriolis stimulation, except that with the latter the head movements take place during rotation rather than afterward. It was first hypothesized that repeated exposure would cause a decline in the experienced effects of the Purkinje stimulation. The question was then to see whether this adaptation would transfer to a situation of so-called pseudo-Coriolis stimulation (Dichgans & Brandt, 1973) in which, instead of the subject being rotated, the surrounding visual field is turned while the subject moves his head. The Kennedy et al. (1987) study was designed to evaluate whether adaptation acquired in one stimulus condition involving unusual vestibular stimulation would transfer to another condition where similar, but not identical, conflicting inputs were presented. The amount of transfer was significant, and somewhat unexpected, because in previous studies the hallmark had been the specificity of adaptation (Guedry, 1965). The training condition entailed bizarre stimulation of the cupula-endolymph system from the post-rotatory effects (the Purkinje stimulus). The adaptation to this bizarre stimulation transferred to a condition in which the stimuli to the canals and otoliths were the same as would occur with no physical rotation present. This fact implies that the transferred adaptation was not merely some form of suppression or fatigue at the sensory level but a higher-order modification within the central nervous system, which is possibly the source of its generalizability. Dobie and colleagues (Dobie & May, 1990; Dobie et al., 1990) successfully replicated the Kennedy study, finding that subjects exposed to bodily rotation exhibited increased tolerance to visually induced self-vection (VISV). However, exposure to VISV did not result in greater tolerance to bodily rotation.

Harm and Parker (1994) examined the relationship between perceptual reports obtained during a space mission and in preflight adaptation trainer (PAT) devices. Perceptual reports from the astronauts indicated that the PAT device had features similar to those encountered in microgravity. The reports also suggested that these similarities reduced some of the symptoms of space motion sickness during space flight. Welch et al. (1998) examined the possibility that the human vestibulo-ocular reflex (VOR) is subject to dual

adaptation (the ability to adapt more completely after repeated exposure to sensory rearrangement) and adaptive generalization (the ability to adapt more easily to new sensory rearrangement because of prior dual adaptation training). These researchers showed both adaptation and dual adaptation of the VOR, but no adaptive generalization when tested with a target/head gain of 1.0.

Clearly, there is little research concerning the generalizability of adaptation to perceptual rearrangement as it applies to space motion sickness. More studies are needed on this issue, particularly (given the present concern) with long-term generalization.

ADAPTIVE PERCEPTUAL LEARNING (APL) TRAINING

The plasticity of the central nervous system permits humans to adapt to temporary ecological changes. These short-term accommodations may be considered under the general rubric of "adaptation to the environment." Welch (2000b) concludes that

    human beings (and perhaps mammals in general) are able to adjust their behavior, and to a much lesser extent their visual perception, to any sensory rearrangement to which they are actively exposed, given that this rearrangement remains essentially constant over time.

Several texts (Welch, 1978; Dolezal, 1982) and reviews (Harris, 1965; Kennedy, 1970; Lackner & Dizio, 1998; Held, 1965; Welch, 2000a, 2000b) make important points but are silent concerning the implications for adapting to space sickness. A workshop (McCauley, 1984) and other literature (Kennedy & Frank, 1986; Kennedy et al., 2001) have found this line of investigation useful in understanding simulator sickness, and the same point has been made for virtual environment technology (Stanney et al., 1998; Welch, 2000a). For space sickness, it is important to know whether any studies have shown transfer of adaptation from one environment to another, not merely adaptation to a single environment. The literature studying the transfer of adaptation between two conditions is scant (Welch, 2000a, 2000b), but some studies are available (Dobie & May, 1990; Dobie et al., 1987; Dobie et al., 1990; Fineberg, 1977; Fried, 1962; Fregly & Kennedy, 1965; Goodenough & Tinker, 1931; Graybiel & Lackner, 1983; Harm & Parker, 1993, 1994; Taub & Goldberg, 1973; Welch et al., 1998). We would argue that since humans are adaptable, the effects of almost any environmental stressor on performance and physiology will change over time; adaptation will ensue and will follow certain rules. If these rules were known, predictions could be made.

Space sickness develops in conditions in which nauseogenic stimuli are present for a long period. The perceptual situation of an astronaut or pilot exposed to unusual gravitational-inertial forces (including zero and subzero gravity) for some period has been compared in many ways to that found in experiments involving perceptual rearrangement, such as optically induced displacement, curvature, tilt, or right–left reversal (Welch, 1978, 2000a). In both instances, the observer is confronted with a variety of inter- and intrasensory conflicts that initially disrupt perception and

behavior, and may cause nausea (Dolezal, 1982). Likewise, in both situations people reveal an ability to adapt to these imposed conflicts, as manifested in a reduction or elimination of the initial disruptive responses. Thus, overcoming motion sickness, correcting performance, and regaining normal perception when one is subjected to unusual gravitational forces may involve many of the same processes as adaptation to perceptual rearrangement in general. The similarity between the processes of overcoming space sickness and of overcoming experimentally imposed perceptual rearrangement provides the motivation for the present approach to perceptual learning.

Based on software we developed to rapidly reconfigure virtual reality (VR) devices in our laboratory (Kennedy et al., 2001), we have obtained evidence that changing specific aspects of the VR device (gain, polarity, head tracking, phase relation, and transport delay) produces systematic and replicable changes in the incidence and severity of motion sickness symptomatology. In other words, with software modification we have been able to establish quantifiable dose-response relationships that elicit graded motion sickness responses among our participants, making VR a research tool for studying the perceptual rearrangement problem.

The present study used this software to rapidly reconfigure a VR device in order to develop a paradigm for reducing the symptoms of space motion sickness through perceptual training. Either graded motion sickness was induced through systematic distortion of the relevant characteristics of the VR device, or repeated exposure to self-propelled rotation trials was used until adaptation was attained. The generalization of this adaptation was then tested with an optokinetic nystagmus drum. More specifically, we created a pseudo-Coriolis condition in the VR device by reversing head-tracking polarity, using an adaptation protocol in which symptoms were kept at a manageable level. Subsequently, we attempted to transfer this adaptation to a pseudo-Coriolis condition induced by a vection drum rotating at 120° per second. Through this process, we set out to demonstrate the feasibility of this Phase I research: transferring perceptual adaptation acquired in one environment to provide relief of symptoms in environments not yet experienced.
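The chapter does not describe the software interface itself, so the following is only a minimal sketch of how the five manipulated display properties might be parameterized; every name in it is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VRRearrangement:
    """Hypothetical parameter set for a reconfigurable VR display,
    mirroring the five device properties manipulated in the text."""
    gain: float = 1.0                # scene motion per unit head motion (1.0 = veridical)
    polarity: int = +1               # +1 = normal head tracking, -1 = reversed
    head_tracking: bool = True       # whether head movement drives the view at all
    phase_lag_deg: float = 0.0       # phase relation between head and scene motion
    transport_delay_ms: float = 0.0  # added end-to-end display latency

# The pseudo-Coriolis condition described above: head-tracking polarity
# reversed, all other parameters left veridical.
pseudo_coriolis = VRRearrangement(polarity=-1)
```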

PRE-ADAPTATION ON VR

Twenty adults (10 males and 10 females) between the ages of 18 and 34 were tested on the effect of pre-adaptation training on virtual reality and vection drum exposure. The training consisted of a simulated rotary stimulation (SRS) procedure in which participants were asked to raise their right hand above their head, grasp their right earlobe with their left hand, bend at the waist, and spin in a clockwise direction under self-propelled conditions. The participants spun 10 times in 30 seconds (a rate of 20 RPM), and this constituted a trial. One or more moderators were always available to support unsteady performers. After standing, participants were asked to rate their dizziness and to walk a seven-foot line on the floor; their steps were counted until they stepped away from the line.

The SSQ is a self-report checklist consisting of 27 symptoms that are rated by the participant for degree of severity on a 4-point Likert-type scale (Kennedy et al., 1993).
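For readers unfamiliar with SSQ scoring, the sketch below illustrates the standard 16-item scoring of Kennedy et al. (1993), in which overlapping symptom subsets feed three weighted subscales and a Total score. The 27-item checklist used in this study presumably extends that instrument; its exact item set and weights are not given in the chapter.

```python
# Illustrative scoring for the standard 16-item SSQ (Kennedy et al., 1993).
# Each symptom is rated 0-3; subscale raw sums are multiplied by published
# weights. Item-to-subscale assignments overlap by design.

NAUSEA = ["general discomfort", "increased salivation", "sweating", "nausea",
          "difficulty concentrating", "stomach awareness", "burping"]
OCULOMOTOR = ["general discomfort", "fatigue", "headache", "eyestrain",
              "difficulty focusing", "difficulty concentrating", "blurred vision"]
DISORIENTATION = ["difficulty focusing", "nausea", "fullness of head",
                  "blurred vision", "dizzy (eyes open)", "dizzy (eyes closed)",
                  "vertigo"]

def ssq_scores(ratings: dict) -> dict:
    """ratings maps symptom name -> severity (0-3); missing items count as 0."""
    n = sum(ratings.get(item, 0) for item in NAUSEA)
    o = sum(ratings.get(item, 0) for item in OCULOMOTOR)
    d = sum(ratings.get(item, 0) for item in DISORIENTATION)
    return {"Nausea": n * 9.54, "Oculomotor": o * 7.58,
            "Disorientation": d * 13.92, "Total": (n + o + d) * 3.74}
```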

Participants were asked to complete an SSQ before exposure to the virtual reality device. The SSQ was then administered following VE exposure and following OKN drum exposure. For the experimental group, the study was conducted in five sessions over 5 days. On the first 4 days, participants in the experimental group experienced five trials of the SRS, lasting about 2 hours. In the fifth session on the final day (the only day in the control case), control and experimental subjects were exposed to the VE and to pseudo-Coriolis stimulation in the OKN rotating drum (control subjects experienced one SRS to establish their baseline). Following each task in the study (SRS, VR, and OKN drum), the participants were given an hour of post-testing (SSQ, past pointing, and posture tasks) at 0-, 30-, and 60-minute intervals.

Following completion of the informed consent, each participant was given three questionnaires to evaluate his or her eligibility to participate in the research: (1) the Research Participant Information Questionnaire, (2) the Simulator Sickness Questionnaire, and (3) the Motion History Questionnaire. Once participants were deemed eligible, they performed the pre-exposure tests: postural stability, past pointing, and the vestibulo-ocular reflex (VOR) test. (N.B., the apparatus-based tests [posture, past pointing, and VOR] and the paper-and-pencil questionnaires are not taxing to the participant, nor conducive to discomfort, and reports of their usage abound in the scientific literature. Norms are available for these tests from more than 1,600 cases.)

After participants exited the virtual environment, they were asked to complete the post-exposure simulator sickness questionnaire (SSQ), which served as our main dependent variable, followed by the more objective posture and past pointing tests. Participants were required to remain at the test site for at least 60 minutes following the virtual reality exposure to ensure that any effects of the exposure had dissipated. During this time, the SSQ was administered at 15-minute intervals, and the posture and past pointing tasks were administered at 30 and 60 minutes following VE exposure. The postural stability and past pointing performances were compared to pre-exposure scores to verify that they were not noticeably different. Additionally, subjects were asked about their physical condition. Those who requested to remain were allowed to stay at the experimental site until adverse feelings subsided. If the researcher determined that participants might need further time to recuperate, they were advised to remain at the experimental site until the symptoms subsided, or they had to have a means of transportation away from the site other than driving themselves. Before being permitted to leave, participants could not be experiencing any characteristic symptoms of motion sickness (reported on the post-SSQ) or postural disequilibrium. Participants remained in the laboratory until all symptoms had subsided.

In session two, participants from both groups were asked to enter the OKN drum and be seated facing forward. They were then instructed on how and when to use the response key in the drum and asked to close their eyes until the experiment began. The participants were then told to open their eyes and gaze directly at the rotating inner surface of the drum until a perception of circular

self-motion (CV) was experienced. Once CV was experienced, participants signaled its presence by pressing the handheld button. Next, while the drum continued to turn, participants were asked to tilt their head 45° toward the left shoulder and to rate their dizziness. This pseudo-Coriolis stimulation has been shown to induce motion sickness (Dichgans & Brandt, 1973). Each participant then turned his or her head upright and made another rating. This procedure was repeated for the right shoulder and again upright before the drum was stopped (total time about 2 minutes). The drum rotated at a velocity of 120° per second, a rate we knew would produce a substantial pseudo-Coriolis experience with a head tilt to 45° in 1 second and a return to upright in 1 second. This procedure allowed us to repeat the sequence enough times in a 30-minute session that adaptation ensued without losing a substantial number of subjects to emesis.

After participants exited the OKN drum, they were asked to complete the post-SSQ, which served as our main dependent variable, followed by the more objective posture, past pointing, and VOR tests. Participants were required to remain at the test site for at least 60 minutes following the OKN drum exposure to ensure that any effects of the exposure had dissipated. During this time, the SSQ was administered at 15-minute intervals, and the posture, past pointing, and VOR tests were administered at 30 and 60 minutes following the OKN drum exposure. If necessary, additional tasks (games involving eye–hand coordination) were given to aid the participant in readapting to the natural environment. The postural stability and past pointing performances were compared to pre-exposure (pre-SSQ) scores to verify that they were not noticeably different. Subjects were tested prior to being released to go home.

As Figure 6.1 shows, dizziness ratings were higher among the control group than the experimental group, showing transfer of adaptation into the virtual reality condition as a function of prior simulated self-propelled rotary stimulation exposure.

FIGURE 6.1  Mean dizziness post-VR exposure.

Similarly, the analysis showed the same effect of adaptation training on VR as reported on the simulation sickness questionnaire (SSQ) following VR exposure. As Figure 6.2 shows, higher simulation sickness ratings were reported by the control group (mean = 44.25) than by the experimental group (mean = 9.72), and these values compare favorably to scores from subjects exposed to space and sea sickness, where similarly high values are obtained. The experimental group in this study exhibited scores that resemble or are lower than the scores of experimental pilots exposed to flight simulation, and the control subjects exhibited higher scores than that group.

FIGURE 6.2  Mean SSQ score post-VE exposure.

PRE-ADAPTATION ON VECTION DRUM
The MANOVA showed a significant effect of adaptation training on vection drum exposure, indicating that the experimental group, who had prior training with simulated rotary stimulation and VE exposure, reported lower ratings of dizziness (mean = 1.63) than the control group, who did not (mean = 3.92). As Figure 6.3 shows, dizziness ratings were higher among the control group than the experimental group, showing adaptation in the vection drum (OKN) condition as a function of simulated rotary stimulation and VE exposure. Similarly, a MANOVA yielded a significant effect of adaptation training on simulator sickness (SSQ) following the vection (OKN) drum exposure. As Figure 6.4 shows, higher simulator sickness ratings were reported by the control group (mean = 61.71) than the experimental group (mean = 17.20).
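For readers who wish to reproduce this style of analysis, a minimal sketch of a two-group MANOVA on the dizziness and SSQ measures is shown below (Python with pandas and statsmodels; the column names and values are invented placeholders, not the study's data):

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Toy data frame standing in for the study's measures (values invented for illustration).
df = pd.DataFrame({
    "group":     ["control"] * 4 + ["experimental"] * 4,
    "dizziness": [4.1, 3.8, 3.9, 3.9, 1.7, 1.5, 1.8, 1.5],
    "ssq_total": [62.0, 60.5, 63.1, 61.2, 17.5, 16.8, 17.9, 16.6],
})

# One multivariate test across both dependent variables as a function of group.
fit = MANOVA.from_formula("dizziness + ssq_total ~ group", data=df)
print(fit.mv_test())  # Wilks' lambda, Pillai's trace, etc.
```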


FIGURE 6.3  Mean dizziness rating post-vection exposure.

FIGURE 6.4  Mean SSQ score post-vection exposure.

PRE-ADAPTATION ON VIMS
These studies indicate that pre-exposure to milder conditions that can elicit motion sickness can reduce the symptoms experienced on a second, more intense exposure. Others have found similar effects. For example, pre-exposure to a shorter-duration OKN event at fewer degrees per second (dps) reduces symptoms on a subsequent, longer exposure at a greater dps (Hu et al., 1991b). Clément et al. (2007) required provocative forward and backward pitch movements during exposure to a constant-velocity rotational environment over 5 days.
They found that symptoms were reduced with each exposure. In another study, exposure to a rotating optokinetic drum reduced subjective and physiological measures of sickness with subsequent exposures (Hu et al., 1991a). Following training on a novel visuospatial task, Smyth et al. (2021) were able to reduce the incidence of carsickness in a driving task. Finally, pre-exposure to a variety of visual-spatial orientations in a VR-based space navigation task, compared with a non-varying orientation, improved orientation in a subsequent VR environment (Stroud et al., 2005). Data collected for a separate study (French et al., 2023) were re-examined to determine whether APL training occurred. In that study, a VR version of the black-and-white (B&W) stripe pattern of a traditional OKN drum was compared to a traditional drum for differences in SSQ symptoms. There were no differences between VR OKN and drum OKN, but both showed significant effects compared to pre-OKN SSQ scores. This indicates that VR OKN produces the same symptom severity as traditional drum OKN and allows VR OKN to be used as a replacement for drum OKN. This is beneficial because more people can be tested at a time with VR OKN than with the drum, and VR is more mobile and less expensive to use. Figure 6.5 shows the two OKN induction methods, VR and drum.

FIGURE 6.5  (a) Participant wearing electro-oculogram (EOG) electrodes sitting in the drum OKN device while the B&W stripes rotate around them. (b) The image on the screen is a 2D representation of the 3D HMD view.


FIGURE 6.6  SSQ results for the four dimensions of the SSQ. Box plots show median scores ± maximum and minimum scores after 10 minutes in the OKN device on the first and second exposures.

To prevent order effects, half the participants in this study received VR OKN first and the other half received drum OKN first. Three days later, the order was reversed: those who received VR first got drum OKN, and those who received drum first got VR OKN. This allowed us to treat the first OKN exposure, whether VR or drum, as the APL pre-exposure training. The second OKN exposure would indicate whether APL worked (SSQ scores should decrease on the second OKN) or not (SSQ scores would remain the same, as they did in the comparison study between VR and drum OKN, or increase from the novel OKN exposure). Figure 6.6 shows box plot (median ± min and max SSQ) results for all four SSQ dimensions. The Nausea and Total dimensions were significantly decreased (p < 0.032) on the second OKN exposure according to a one-tailed Wilcoxon matched-pairs signed-rank test. The other two dimensions, Disorientation and Oculomotor, can be seen in the figure to change in the same direction, but not significantly, from the first OKN to the second. These results are comparable to the SRS, VR, and drum results described earlier and support an APL interpretation of the results.
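A sketch of the paired test reported above is given below (Python with SciPy; the arrays are invented placeholders for the first- and second-exposure SSQ Total scores of the same participants, not the study's data):

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder paired SSQ Total scores, one pair per participant (values invented).
first_okn  = np.array([45.0, 30.2, 52.4, 38.1, 41.3, 29.9, 47.5, 33.6])
second_okn = np.array([32.1, 28.4, 40.0, 30.5, 35.2, 25.1, 39.8, 30.0])

# One-tailed Wilcoxon matched-pairs signed-rank test: did scores decrease
# on the second exposure (i.e., first > second)?
stat, p = wilcoxon(first_okn, second_okn, alternative="greater")
print(f"W = {stat:.1f}, one-tailed p = {p:.4f}")
```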

CONCLUSION
These kinds of results have been emerging sporadically for many years, but no formal description of a technique like Adaptive Perceptual Learning (APL) training has been put forward. We propose the term "APL" to help organize and focus the approach to studying motion sickness mitigation methods. More studies need to explore this phenomenon, which may break the logjam in progress toward understanding what causes motion sickness in general and what applied treatments are available. APL represents a simple pre-treatment with less severe environments to help reduce symptoms experienced in subsequent, more severe environments. It should be remembered that this is not a new idea. The term "perceptual learning" was originally defined by Eleanor Gibson (1963). She identified three requirements for perceptual learning to have occurred:
the learning must be perceptual in nature rather than, say, consciously learned; it must be long lasting; and it must be the result of practice or prior experience (Gibson, 1963). Rather than the idea being new, we are suggesting that the lack of coordinated research efforts into the technique may be due to the fact that the procedure has not had a name. Scientists could not discuss implications and tests of the idea with a clear sense of what body of literature, like that presented in this chapter, was being referenced. Unlike the previous findings of Guedry (1965), the present study reconfirmed our previous finding that the adaptation transfer is not task specific and can be extended to tasks that are not identical. Moreover, the present findings are also consistent with previous results by Dobie et al. (1990), who reported increased tolerance to visually induced self-vection as a function of bodily rotation exposure. Although it is clear from the present findings that exposure to bodily rotation is beneficial for reducing motion-related dizziness symptoms, it is not well understood whether this adaptation phenomenon can manifest itself in both directions. Previous studies have not extensively examined this bidirectional effect of perceptual learning and adaptation in distorted environments. One reason may be that there is more "reafference" (von Holst, 1968) in the self-propelled condition and less in the more passive VR and vection conditions. This difference should be examined in future research. Similarly, the study data reported above partially support the results of Harm and Parker (1994) and Welch et al. (1998), who previously reported both adaptation and dual adaptation of the VOR but failed to obtain adaptive generalization. Our findings suggest that the transferred adaptation may be a higher-order modification within the central nervous system, which in turn may account for its generalizability. The results also point to the need for further examination of individual differences in the rate of adaptation. Some people may be more prone to simulation sickness than others; therefore, identifying the traits for "adaptability" would have several practical implications for adapting to space sickness and other situations entailing perceptual adjustment. Notably, an adaptation training program in the form of virtual environment or vection drum exposure may help alleviate several of the motion symptoms found in other visual-vestibular conflict environments. For example, Vanderploeg, Stewart, and Davis (1985) previously reported that of the 22 space travelers who had an opportunity for more than one flight, 11 were sick in varying degrees on their first flight and 11 were not, and were subsequently symptom-free on their second exposure. These findings clearly indicate the need for pre-adaptation training of those who are prone to simulation sickness-type symptoms for a variety of applications, including NASA astronauts. These findings are consistent with previous findings by Kennedy et al. (1987), who similarly reported that prior adaptation to simulated rotary or Purkinje stimulation transferred to pseudo-Coriolis, as demonstrated by the large difference in reported dizziness between the control and experimental conditions. In that study, Kennedy and his associates used only a self-propelled turning test and transferred to a vection drum condition.
These findings suggest that training in the form of simulated self-propelled rotary stimulation and virtual environment exposure helps reduce the level of sensory rearrangement experienced in certain simulation sickness-related tasks.


The visual and behavioral adaptation to a distorted environment described above, in which one can quickly adapt to prismatic glasses or rearranged visual fields, is remarkable in that it occurs quickly, resetting a lifetime of normal visual experience in just a few days. Many of those experiencing the discomfort of cybersickness terminate their experience without slowly trying to adapt to it. The evidence shows that the adaptation, once established, is a long-term phenomenon (Klapp et al., 1974; Lackner & Lobovits, 1977) and may strengthen the longer it is continued (McGonigle & Flook, 1978). There is good evidence of cross-adaptation in that exposure to one visual condition significantly reduced the adaptation rate in another, similar condition (Kennedy et al., 1987). The implications are that (a) habituation to nauseogenic stimuli is possible and rapid, although with a strong individual component, and (b) habituation may generalize to a similar but not identical environment. With regard to the last point, one could conceivably train repeatedly on a milder version of the virtual stimulus that is causing the nausea and rapidly adapt to a comparable but more difficult stimulus. Although provocative tests of motion sickness reveal a generally positive correlation with "zero-G"-induced sickness from parabolic flights (Reschke et al., 1984), there is a substantial amount of unexplained variance. Kennedy (1970) and Reason and Graybiel (1972) suggest that adaptability is a strong predictor of susceptibility to motion sickness. Specifically, rather than looking for overriding personality characteristics as potential correlates of individual differences in motion sickness, emphasis could be placed on a careful assessment of people's "perceptual adaptation traits," performances, and idiosyncratic behavioral tendencies. For example, the number and/or extent of incidental head movements that a person makes during a baseline (non-motion) period might correlate with the amount of motion sickness experienced in a subsequent motion environment. Another potential correlate of motion sickness response might be the observer's characteristic absolute and difference thresholds for visual and vestibular motion, each measured separately in a non-motion environment. Perhaps people who are especially sensitive to unconflicted visual and/or vestibular motion will respond more dramatically or quickly, and/or adapt more gradually, to a motion environment in which these two senses are placed into conflict than will individuals with higher thresholds. The provocative vestibular tests employed by various scientists (Lackner & Graybiel, 1984; Lentz, 1984; Lentz & Guedry, 1978; Oman et al., 1984; Reschke et al., 1984) generally entail the assessment of motion sickness symptomatology, including vomiting, following a strong, abrupt (usually less than 30 minutes), and relatively unpracticed stimulus. Yet, in their current form, these tests do not assess after-reactions, adaptive capacity, or adaptive retention (Lentz & Guedry, 1978). However, we know that adaptation occurs, and if there are individual differences in adaptability, perhaps the combination of both provocative and adaptation testing would improve our ability to predict who will develop symptoms in space and in cyber applications. This should be the future direction for research that seeks a simple, effective, non-pharmaceutical approach to motion sickness mitigation.
In summary, our results showed that APL pre-adaptation training in the form of simulated rotary Purkinje stimulation produces reduced levels of simulation sickness in both the virtual and vection (OKN) drum environments.
The significant differences in dizziness, nausea, oculomotor, and other related simulation sickness symptoms found between the control and experimental groups are a clear indication of perceptual adaptation. These results are consistent with the relatively enduring adaptation to prism-displaced vision demonstrated even months after the initial prism exposure. It is not well understood whether the effects of adaptation training can be sustained and maintained over a prolonged period of time. In addition, the transfer of adaptation from one situation of visual-vestibular conflict to another warrants further investigation. Also, the dichotomy of the simulated rotary stimulation task (Coriolis versus pseudo-Coriolis) is an important dimension in examining individual differences in adaptation to perceptual rearrangement and space sickness. Adaptive learning effects were also shown in that exposure to one form of OKN-VIMS induction was protective against OKN-VIMS exposure in another. For astronauts and others who will be exposed to unusual motion environments in the near future, these results argue that pre-training with milder forms of the expected motion anomalies would offer protection against other, more severe forms. Further research is needed to address these issues through a series of interlocking empirical experiments. We hope the organizing principle will be aided by collecting these procedures under the name of APL training.

ACKNOWLEDGMENTS
We would like to thank Paul Huchens, Dan Compton, and Cecelia Grizzard for the data collection and technical assistance. We would also like to thank Drs. Norm Lane and Bob Jones for their insightful comments during the course of this research. This research was supported by NASA contract NAS2-02016. Charles DaRoshia was the technical monitor.

IN HONOREM
This chapter is dedicated in loving memory to Robert S. Kennedy. One look at the number of citations bearing his name in this chapter, and in virtually any work on the neurovestibular effects of motion sickness, will clearly reveal the debt the entire field owes to RSK. He will be missed, but his contributions should inspire similar virtues in those he trained. We will always be grateful for his kindness, patience, and willingness to mentor his generation and the next generation of scientists interested in the phenomenon of VIMS. The number of us whom he touched with his insight and guidance is worthy of this thank you: "gratias ago tibi carus amicus" (thank you, dear friend).

REFERENCES
Avila, C., Antònia, M., Generós, P., & Ignacio Ibáñez-Ribes, O. M. (1999). Anxiety and counter-conditioning: The role of the behavioral inhibition system in the ability to associate aversive stimuli with future rewards. Personality and Individual Differences, 27(6), 1167–1179.


Baker, A., Mystkowski, J., Culver, N., Yi, R., Mortazavi, A., & Craske, M. G. (2010). Does habituation matter? Emotional processing theory and exposure therapy for acrophobia. Behaviour Research and Therapy, 48(11), 1139–1143. https://doi.org/10.1016/j.brat.2010.07.009
Banks, R. D., Salisbury, D. A., & Ceresia, P. J. (1992). The Canadian Forces airsickness rehabilitation program, 1981–1991. Aviation, Space, and Environmental Medicine, 63(12), 1098–1101. PMID: 1360796.
Bender, M. B., & Shanzer, S. (1983). History of optokinetic nystagmus. Neuro-ophthalmology, 3(2), 73–88.
Benson, A. J. (1978). Motion sickness. In Dhenin, G., & Ernsting, J. (Eds.), Aviation Medicine: Physiology and Human Factors (pp. 468–493). London: British Crown Copyright.
Benson, A. J. (1999). Spatial disorientation: Common illusions. In Ernsting, J., Nicholson, A. N., & Rainford, D. J. (Eds.), Aviation Medicine (3rd ed., p. 445). London: Butterworths.
Benson, A. J., & Bodin, M. A. (1966). Interaction of linear and angular accelerations on vestibular receptors in man. Aerospace Medicine, 37, 144–154.
Bertolini, G., & Straumann, D. (2016). Moving in a moving world: A review on vestibular motion sickness. Frontiers in Neurology, 7. https://www.frontiersin.org/article/10.3389/fneur.2016.00014
Bertolucci, L. E., & DiDario, B. (1995). Efficacy of a portable acustimulation device in controlling seasickness. Aviation, Space, and Environmental Medicine, 66(12), 1155–1158.
Bos, J. E., Bles, W., & Groen, E. L. (2008). A theory on visually induced motion sickness. Displays, 29(2), 47–57.
Boynton, R. M., & Das, S. R. (1966). Visual adaptation: Increased efficiency resulting from spectrally distributed mixtures of stimuli. Science, 154, 1581–1583.
Cheung, B., & Hofer, K. (2005). Desensitization to strong vestibular stimuli improves tolerance to simulated aircraft motion. Aviation, Space, and Environmental Medicine, 76(12), 1099–1104.
Clément, G., Deguine, O., Bourg, M., & Pavy-LeTraon, A. (2007). Effects of vestibular training on motion sickness, nystagmus, and subjective vertical. Journal of Vestibular Research, 17(5–6), 227–237. PMID: 18626134.
Collins, E. (1974). Habituation of vestibular responses and visual stimulation. In Kornhuber, H. H. (Ed.), Handbook of Sensory Physiology (Vol. VI/2: Vestibular System, pp. 369–386). Berlin, Heidelberg, New York: Springer-Verlag.
Crawshaw, M., & Craske, B. (1976). Oculomotor adaptation to prisms: Complete transfer between eyes. British Journal of Psychology, 67(4), 475–478.
Dahl, E., Offer-Ohlsen, D., Lillevold, P. E., & Sandvik, L. (1984). Transdermal scopolamine, oral meclizine, and placebo in motion sickness. Clinical Pharmacology & Therapeutics, 36(1), 116–120.
Dichgans, J., & Brandt, T. (1973). Optokinetic motion sickness as pseudo-Coriolis effects induced by moving visual stimuli. Acta Otolaryngologica, 76, 339–348.
Dobie, T. (2019). Motion Sickness: A Motion Adaptation Syndrome (Chapter 6). Springer International Publishing. https://doi.org/10.1007/978-3-319-97493-4
Dobie, T. G., & May, J. G. (1990). Generalization of tolerance to motion environments. Aviation, Space, and Environmental Medicine, 61(8), 707–711.
Dobie, T. G., & May, J. G. (1994). Cognitive-behavioral management of motion sickness. Aviation, Space, and Environmental Medicine, 65(10 Pt 2), C1–C2.
Dobie, T. G., May, J. G., Fischer, W. D., & Elder, S. T. (1987). A comparison of two methods of training resistance to visually-induced motion sickness. Aviation, Space, and Environmental Medicine, 58(9, Sect 2), 34–41.
Dobie, T. G., May, J. G., Gutierrez, C., & Heller, S. S. (1990). The transfer of adaptation between actual and simulated rotary stimulation. Aviation, Space, and Environmental Medicine, 61(12), 1085–1091.
Dolezal, H. F. (1982). Living in a World Transformed. New York: Academic Press.
Festinger, L., Burnham, C. A., Ono, H., & Bamber, D. (1967). Efference and the conscious experience of perception. Journal of Experimental Psychology Monograph, 74(4), 1–36.
Fineberg, M. L. (1977). The effects of previous learning on the visual perception of velocity. Human Factors, 19, 157–162.
Fischer, M. H., & Kornmüller, A. E. (1930). Vertigo. In Bethe, A., von Bergmann, G., Embden, G., & Ellinger, A. (Eds.), Handbook of Normal and Pathological Physiology (pp. 442–494). Berlin, Heidelberg: Springer.
Fregly, A. R., & Kennedy, R. S. (1965). Comparative effects of prolonged rotation at 10 RPM on postural equilibrium in vestibular normal and vestibular defective human subjects. Aerospace Medicine, 36(12), 1160–1167.
French, J., Vuillemot, F., & Bush, D. (2023). Comparison of traditional OKN drum with VR-OKN drum on subjective symptoms and cortisol. In preparation.
Fried, C. (1962). Studies on the Perceptual Threshold for Motion. II. Effects of Induced Motion on Threshold Velocity (Technical Memorandum No. 18-62). Aberdeen Proving Ground, MD: Army Human Engineering Laboratories.
Gibson, E. J. (1963). Perceptual learning. Annual Review of Psychology, 14, 29–56. https://doi.org/10.1146/annurev.ps.14.020163.000333
Giles, D. A., & Lochridge, G. K. (1985). Behavioral airsickness management program for student pilots. Aviation, Space, and Environmental Medicine, 56(10), 991–994. PMID: 3904710.
Golding, J. F. (2006). Motion sickness susceptibility. Autonomic Neuroscience, 129(1–2), 67–76.
Golding, J. F. (2016). Motion sickness. In Furman, J. M., & Lempert, T. (Eds.), Handbook of Clinical Neurology (Vol. 137, pp. 371–390). Elsevier. https://doi.org/10.1016/B978-0-444-63437-5.00027-3
Golding, J. F., & Gresty, M. A. (2015). Pathophysiology and treatment of motion sickness. Current Opinion in Neurology, 28(1), 83–88. https://doi.org/10.1097/WCO.0000000000000163
Goodenough, F. L., & Tinker, M. A. (1931). The retention of mirror reading ability after two years. Journal of Educational Psychology, 22, 503–504.
Graybiel, A., & Knepton, J. (1976). Sopite syndrome: A sometimes sole manifestation of motion sickness. Aviation, Space, and Environmental Medicine, 47(8), 873–882.
Graybiel, A., & Lackner, J. R. (1983). Motion sickness acquisition and retention of adaptation effects compared in three motion environments. Aviation, Space, and Environmental Medicine, 54, 307–311.
Guedry, F. E., Jr. (1965). Habituation to complex vestibular stimulation in man: Transfer and retention of effects from twelve days of rotation at 10 RPM. Perceptual and Motor Skills, 21, 459–481.
Guilford, J. P. (1954). Psychometric Methods. New York: McGraw-Hill.
Harm, D. L., & Parker, D. E. (1993). Perceived self-orientation and self-motion in microgravity, after landing and during preflight adaptation training. Journal of Vestibular Research: Equilibrium and Orientation, 3(3), 297–305.
Harm, D. L., & Parker, D. E. (1994). Preflight adaptation training for spatial orientation and space motion sickness. Journal of Clinical Pharmacology, 34(6), 618–627.
Harlow, H. F. (1949). The formation of learning sets. Psychological Review, 56(1), 51–65. https://doi.org/10.1037/h0062474
Harlow, H. F. (1959). Learning set and error factor theory. In Koch, S. (Ed.), Psychology: A Study of a Science (Vol. 2, pp. 492–537). New York: McGraw-Hill.
Harris, C. S. (1965). Perceptual adaptation to inverted, reversed, and displaced vision. Psychological Review, 72, 419–444.
Harris, C. S. (1980). Insight or out of sight? Two examples of perceptual plasticity in the human adult. In Harris, C. S. (Ed.), Visual Coding and Adaptability (pp. 105–160). Hillsdale, NJ: Lawrence Erlbaum Associates.
Heaney, D., Jagneaux, D., & Baker, H. (2018). New device might have solved VR locomotion sickness. Retrieved from https://uploadvr.com/ototech-vibrating-headband-vr-sickness/
Heer, M., & Paloski, W. H. (2006). Space motion sickness: Incidence, etiology, and countermeasures. Autonomic Neuroscience, 129(1–2), 77–79.
Hein, A. (1972). Acquiring components of visually guided behavior. In Pick, A. D. (Ed.), Minnesota Symposia on Child Psychology (pp. 53–68). Minneapolis, MN: University of Minnesota Press.
Held, R. (1965). Plasticity in sensory-motor systems. Scientific American, 213, 84–91.
Held, R. (1968). Dissociation of visual functions by deprivation and rearrangement. Psychologische Forschung, 31, 338–348.
Homick, J. L., Reschke, M. F., & Vanderploeg, J. M. (1984). Space adaptation syndrome: Incidence and operational implications for the space transportation system program. Proceedings of AGARD Conference, Motion Sickness: Mechanisms, Prediction, Prevention and Treatment (AGARD-CP-372). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Hu, S., Stern, R. M., Vasey, M. W., & Koch, K. L. (1989). Motion sickness and gastric myoelectric activity as a function of speed of rotation of a circular vection drum. Aviation, Space, and Environmental Medicine, 60(5), 411–414.
Hu, S., Grant, W. F., Stern, R. M., & Koch, K. L. (1991a). Motion sickness severity and physiological correlates during repeated exposures to a rotating optokinetic drum. Aviation, Space, and Environmental Medicine, 62(4), 308–314. PMID: 2031631.
Hu, S., Stern, R. M., & Koch, K. L. (1991b). Effects of pre-exposures to a rotating optokinetic drum on adaptation to motion sickness. Aviation, Space, and Environmental Medicine, 62, 53–56.
Hu, S., Davis, M. S., Klose, A. H., Zabinsky, E. M., Meux, S. P., & Jacobsen, H. A. (1997). Effects of spatial frequency of a vertically striped rotating drum on vection-induced motion sickness. Aviation, Space, and Environmental Medicine, 68, 306–311.
Huppert, D., Benson, J., & Brandt, T. (2017). A historical view of motion sickness—A plague at sea and on land, also with military impact. Frontiers in Neurology, 8, 114. https://doi.org/10.3389/fneur.2017.00114
Ishii, M. (1993). Space and vertigo: In relation to space motion sickness. Japanese Journal of Aerospace and Environmental Medicine, 30(1), 41–45.
Jell, R. M., Ireland, D. J., & La Fortune, S. (1985). Human optokinetic afternystagmus. Acta Otolaryngologica, 99, 95–101.
Jones, D. R., Levy, R. A., Gardner, L., Marsh, R. W., & Patterson, J. C. (1985). Self-control of psychophysiologic response to motion stress: Using biofeedback to treat airsickness. Aviation, Space, and Environmental Medicine, 56(12), 1152–1157. PMID: 3910020.
Jones, M. B. (1970). A two-process theory of individual differences in motor learning. Psychological Review, 77(4), 353–360.
Jones, P., & Holding, D. (1975). Extremely long-term persistence of the McCollough effect. Journal of Experimental Psychology: Human Perception and Performance, 4, 323–332.
Katz, D. I., & Lackner, J. R. (1977). Adaptation to delayed auditory feedback. Perception and Psychophysics, 22(5), 476–486.
Kennedy, R. S. (1970). Visual Distortion: A Point of View (Monograph No. 15). Pensacola, FL: Naval Aerospace Medical Institute.
Kennedy, R. S. (1972). The Relationship Between Habituation to Vestibular Stimulation and Vigilance: Individual Differences and Subsidiary Problems. Doctoral dissertation, University of Rochester, NY. (Also NAMRL Monograph No. 20, Naval Aerospace Medical Research Laboratory, Pensacola, FL.)
Kennedy, R. S., & Frank, L. H. (1986). A review of motion sickness with special reference to simulator sickness. Paper presented at the 65th Annual Meeting of the Transportation Research Board, Washington, DC.
Kennedy, R. S., Jones, M. B., & Harbeson, M. M. (1980). Assessing productivity and well-being in Navy workplaces. Proceedings of the 13th Annual Meeting of the Human Factors Association of Canada (pp. 108–113). Rexdale, Ontario, Canada: Human Factors Association of Canada. Also: Naval Biodynamics Laboratory, New Orleans, LA, November 1981, pp. 8–13 (Research Report No. NBDL-82R004; NTIS No. AD A111180).
Kennedy, R. S., Berbaum, K. S., & Frank, L. H. (1984a). Visual distortion: The correlation model. Proceedings of the SAE Aerospace Congress and Exhibition (Paper No. 841595). Long Beach, CA: Society of Automotive Engineers.
Kennedy, R. S., Lilienthal, M. G., Dutton, B., Ricard, G. L., & Frank, L. H. (1984b, December). Simulator sickness: Incidence of simulator aftereffects in Navy flight trainers. Proceedings of the SAFE Symposium (pp. 299–302). Las Vegas, NV.
Kennedy, R. S., Berbaum, K. S., Williams, M. C., Brannan, J., & Welch, R. B. (1987). Transfer of perceptual-motor training and the space adaptation syndrome. Aviation, Space, and Environmental Medicine, 58(9 Suppl.), A29–A33.
Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). Simulator Sickness Questionnaire (SSQ): A new method for quantifying simulator sickness. International Journal of Aviation Psychology, 3(3), 203–220.
Kennedy, R. S., Stanney, K. M., & Rolland, J. (2001). Optokinetic Studies of the Relationship Between Vection and Cybersickness (Report No. N61339-00-C-0054). Orlando, FL: Naval Air Warfare Center Training Systems Division.
Kennedy, R. S., Drexler, J., & Kennedy, R. C. (2010). Research in visually induced motion sickness. Applied Ergonomics, 41(4), 494–503. https://doi.org/10.1016/j.apergo.2009.11.006
Kinney, J. A. S., Luria, S. M., Weitzman, D. O., & Markowitz, H. (1970a). Effects of Diving Experience on Visual Perception Under Water (NSMRL Report No. 612). Groton, CT: U.S. Naval Submarine Medical Center.
Kinney, J. A. S., McKay, C. L., Luria, S. M., & Gratto, C. L. (1970b). The Improvement of Divers' Compensation for Underwater Distortions (NSMRL Report No. 633). Groton, CT: U.S. Naval Submarine Medical Center.
Klapp, S. T., Nordell, S. A., Hoekenga, K. C., & Patton, C. B. (1974). Long-lasting aftereffect of brief prism exposure. Perception and Psychophysics, 15, 399–400.
Koch, A., Cascorbi, I., Westhofen, M., Dafotakis, M., Klapa, S., & Kuhtz-Buschbeck, J. P. (2018). The neurophysiology and treatment of motion sickness. Deutsches Ärzteblatt International, 115, 687–696.
Kottenhoff, H. (1957). Situational and personal influences on space perception with experimental spectacles. Acta Psychologica, 12, 79–87.
Kovalev, A., Klimova, O., Klimova, M., & Drozhdev, A. (2020). The effects of optokinetic nystagmus on vection and simulator sickness. Procedia Computer Science, 176, 2832–2839.
Lackner, J. R., & DiZio, P. (1998). Adaptation in a rotating artificial gravity environment. Brain Research Reviews, 28(1–2), 194–202.
Lackner, J. R., & Graybiel, A. (1984). Influence of gravitoinertial force level on apparent magnitude of Coriolis cross-coupled angular accelerations and motion sickness. Proceedings of AGARD Conference, Motion Sickness: Mechanisms, Prediction, Prevention and Treatment (AGARD-CP-372). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Lackner, J. R., & Lobovits, D. (1977). Adaptation to displaced vision: Evidence for prolonged aftereffects. Quarterly Journal of Experimental Psychology, 29, 65–69.
Lane, J. (2009). The neurochemistry of counterconditioning: Acupressure desensitization in psychotherapy. Energy Psychology: Theory, Research, and Treatment, 1(1), 31–44.
Lentz, J. M. (1984). Laboratory tests of motion sickness susceptibility. Proceedings of AGARD Conference, Motion Sickness: Mechanisms, Prediction, Prevention and Treatment (AGARD-CP-372). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Lentz, J. M., & Guedry, F. E. (1978). Motion sickness susceptibility: A comparison of laboratory tests. Aviation, Space, and Environmental Medicine, 49, 1281–1288.
Leung, A. K., & Hon, K. L. (2019). Motion sickness: An overview. Drugs in Context, 8, 2019-9-4. https://doi.org/10.7573/dic.2019-9-4
Lucertini, M., Verde, P., & Trivelloni, P. (2013). Rehabilitation from airsickness in military pilots: Long-term treatment effectiveness. Aviation, Space, and Environmental Medicine, 84(11), 1196–1200.
Luria, S. M., & Kinney, J. A. S. (1970). Underwater vision. Science, 167, 1454–1461.
Luria, S. M., Kinney, J. A., & Weissman, S. (1967). Estimates of size and distance underwater. American Journal of Psychology, 80, 282–286.
Mack, A. (1967). The role of movement in perceptual adaptation to a tilted retinal image. Perception and Psychophysics, 2, 65–68.
McCauley, M. E. (Ed.). (1984). Simulator Sickness: Proceedings of a Workshop. Washington, DC: National Academy of Sciences/National Research Council, Committee on Human Factors.
McGonigle, B. O., & Flook, J. (1978). Long-term retention of single and multistate prismatic adaptation by humans. Nature, 272, 364–366.
McLaughlin, S. C., & Webster, R. C. (1967). Changes in straight-ahead eye position during adaptation to wedge prisms. Perception and Psychophysics, 2, 37–44.
Melamed, L. E., Beckett, P. A., & Halay, M. (1979). Individual differences in the visual component of prism adaptation. Perception, 8, 699–706.
Mert, A., & Bles, W. (2007). Hyperventilation in a motion sickness desensitization program. Aviation, Space, and Environmental Medicine, 78(4), 505–509.
Money, K. E. (1970). Motion sickness. Physiological Reviews, 50(1), 1–39. https://doi.org/10.1152/physrev.1970.50.1.1
Money, K. E. (1972). Measurement of susceptibility to motion sickness. In Lansberg, M. P. (Ed.), AGARD Conference Proceedings No. 109: Predictability of Motion Sickness in the Selection of Pilots (pp. B2-1–B2-4). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Money, K. E., Watt, D. G., & Oman, C. M. (1984). Preflight and postflight motion sickness testing of the Spacelab 1 crew. Proceedings of AGARD Conference, Motion Sickness: Mechanisms, Prediction, Prevention and Treatment (AGARD-CP-372). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Muth, E. R. (2006). Motion and space sickness: Intestinal and autonomic correlates. Autonomic Neuroscience, 129(1–2), 58–66.
Nguyen, T. (1996). Space sickness. Proceedings of the 5th International Conference on Space '96 (Vol. 2). Albuquerque, NM, June 1–6.
Nooij, S. A. E., Pretto, P., Oberfeld, D., Hecht, H., & Bülthoff, H. H. (2017). Vection is the main contributor to motion sickness induced by visual yaw rotation: Implications for conflict and eye movement theories. PLoS One, 12(4), e0175305.
Oman, C. M. (1991). Sensory conflict in motion sickness: An observer theory approach. In Ellis, S. R., Kaiser, M., & Grunwald, A. (Eds.), Pictorial Communication in Virtual and Real Environments (pp. 362–376). London: Taylor and Francis.
Oman, C. M., Lichtenberg, B. K., & Money, K. E. (1984). Space motion sickness monitoring experiment: Spacelab 1. Proceedings of AGARD Conference, Motion Sickness: Mechanisms, Prediction, Prevention and Treatment (AGARD-CP-372). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Parker, D. E., Reschke, M. F., Arrott, A. P., Homick, J. L., & Lichtenberg, B. K. (1985). Otolith tilt-translation reinterpretation following prolonged weightlessness: Implications for preflight training. Aviation, Space, and Environmental Medicine, 56, 601–606.
Peterson, J., & Peterson, J. K. (1938). Does practice with inverting lenses make vision normal? Psychological Monographs, 225, 12–37.
Reason, J. T. (1970). Motion sickness: A special case of sensory rearrangement. Advancement of Science, 26, 386–393.
Reason, J. (1978). Motion sickness: Some theoretical and practical considerations. Applied Ergonomics, 9(3), 163–167. https://doi.org/10.1016/0003-6870(78)90008-X
Reason, J. T., & Graybiel, A. (1972). Factors contributing to motion sickness susceptibility: Adaptability and receptivity. Proceedings of AGARD Conference: Predictability of Motion Sickness in the Selection of Pilots (AGARD-CP-109). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Redding, G. M. (1973a). Simultaneous visual adaptation to tilt and displacement: A test of independent processes. Bulletin of the Psychonomic Society, 2, 41–42.
Redding, G. M. (1973b). Visual adaptation to tilt and displacement: Same or different processes? Perception and Psychophysics, 14, 193–200.
Redding, G. M. (1975). Simultaneous visuomotor adaptation to optical tilt and displacement. Perception and Psychophysics, 17, 97–100.
Reschke, M. F., Homick, J. L., Ryan, P., & Mosely, E. C. (1984). Prediction of the space adaptation syndrome. Proceedings of AGARD Conference, Motion Sickness: Mechanisms, Prediction, Prevention and Treatment (AGARD-CP-372). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Reschke, M. F., Bloomberg, J. J., Harm, D. L., Paloski, W. H., Layne, C., & McDonald, V. (1998). Posture, locomotion, spatial orientation, and motion sickness as a function of space flight. Brain Research Reviews, 28, 102–117.
Rogers, D., & Van Syoc, D. (2011). Clinical Practice Guideline for Motion Sickness. American Society of Aerospace Medicine Specialists. Virginia: Aerospace Medical Association, November 14.
Sang, F. Y. P., Billar, J., Gresty, M. A., & Golding, J. F. (2005). Effect of a novel motion desensitization training regime and controlled breathing on habituation to motion sickness. Perceptual and Motor Skills, 101(1), 244–256.
Savreau, D. (1979). Persistence of simple and contingent motion aftereffects. Perception and Psychophysics, 26(3), 187–194.
Slotnick, R. S. (1969). Adaptation to curvature distortion. Journal of Experimental Psychology, 81, 441–448.
Smyth, J., Jennings, P., Bennett, P., & Birrell, S. (2021). A novel method for reducing motion sickness susceptibility through training visuospatial ability – A two-part study. Applied Ergonomics, 90, 103264.
Snyder, F. W., & Snyder, C. W. (1957). Vision with spatial inversion: A follow-up study. Psychological Record, 7, 20–30.
Stanney, K. M. (Ed.). (2002). Handbook of Virtual Environments: Design, Implementation, and Applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Stanney, K. M., Mourant, R. R., & Kennedy, R. S. (1998). Human factors issues in virtual environments: A review of the literature. Presence, 7(4), 327–351.
Stroud, K. J., Harm, D. L., & Klaus, D. M. (2005). Preflight virtual reality training as a countermeasure for space motion sickness and disorientation. Aviation, Space, and Environmental Medicine, 76, 352–356.
Taub, E., & Goldberg, I. A. (1973). Prism adaptation: Control of intermanual transfer by distribution of practice. Science, 180, 755–757.
Thornton, W. E., Pool, S. L., Moore, T., & Vanderploeg, J. (1987). Clinical characterization and etiology of space motion sickness. Aviation, Space, and Environmental Medicine, 58(9, Suppl.), A1–A8.
Vanderploeg, J. M., Stewart, D. F., & Davis, J. R. (1985). Space Motion Sickness (NASA Report NASA-S-85-02963). Houston, TX: NASA.
von Holst, E. (1968). Relations between the central nervous system and the peripheral organs. In Haber, R. N. (Ed.), Contemporary Theory and Research in Visual Perception (pp. 497–503). New York: Holt, Rinehart, and Winston.
Warren, D. H., & Platt, B. B. (1974). The subjects: A neglected factor in recombination research. Perception, 3, 421–438.
Warwick-Evans, L. A., Symons, N., Fitch, T., & Burrows, L. (1998). Evaluating sensory conflict and postural instability: Theories of motion sickness. Brain Research Bulletin, 47(5), 465–469.
Weech, S., Wall, T., & Barnett-Cowan, M. (2020). Reduction of cybersickness during and immediately following noisy galvanic vestibular stimulation. Experimental Brain Research, 238, 427–437.
Welch, R. B. (1978). Perceptual Modification: Adapting to Altered Sensory Environments. New York: Academic Press.
Welch, R. B. (2000a). Adapting to virtual environments. In Stanney, K. M. (Ed.), Handbook of Virtual Environments: Design, Implementation, and Applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Welch, R. B. (2000b). Adapting to telesystems. In Hettinger, L., & Haas, M. (Eds.), Psychological Issues in the Design and Use of Virtual and Adaptive Environments. Mahwah, NJ: Lawrence Erlbaum Associates.
Welch, R. B., Choe, C. S., & Heinrich, D. R. (1974). Evidence for a three-component model of prism adaptation. Journal of Experimental Psychology, 103, 700–705.
Welch, R. B., Bridgeman, B., Williams, J. A., & Semmler, R. (1998). Dual adaptation and adaptive generalization of the human vestibulo-ocular reflex. Perception and Psychophysics, 60(8), 1415–1425.
White, B. (2007). Ginger: An overview. American Family Physician, 75(11), 1689–1691.
Wolfe, J. (1985). Fatigue and structural change: Two consequences of visual pattern adaptation. Investigative Ophthalmology and Visual Science (Supplement), 24, 215.
Wooster, M. (1923). Certain factors in the development of a new spatial coordination. Psychological Monographs, 32(4), 1–96.

7

Decision-Making under Crisis Conditions
A Training and Simulation Perspective
Jiahao Yu, Tiffany Nickens, Dahai Liu, and Dennis A. Vincenzi
DOI: 10.1201/9781003401353-7

CONTENTS Introduction.............................................................................................................209 Effects of Time Stress and Uncertainty on Decision-Making................................ 210 Other Effects on Human Decision Makers under Crisis Conditions...................... 211 Decision-Making Theories...................................................................................... 212 Decision-Making Performance Measures............................................................... 214 Crisis Decision-Making Training............................................................................ 215 General and Stress Training.................................................................................... 215 Simulation............................................................................................................... 217 Microworld.............................................................................................................. 219 Conclusion.............................................................................................................. 220 References............................................................................................................... 220

INTRODUCTION
Humans make decisions every day. These range from life-planning decisions, such as whether to take a job after college or go to graduate school, to quotidian decisions about what to eat for lunch. Decision-making is a task in which "a person must select one option from a number of alternatives" with "some amount of information available" and under the influence of "time frame" and context uncertainty (Wickens, Lee, Liu, & Becker, 2004). For decisions such as what to eat for lunch, humans can take enough time to consider all the available options, and even if a bad decision is made, the consequence is not significant. Unfortunately, this is not the case when making a decision during a crisis. Decision-making in a crisis situation involves time stress and uncertain information and can be an issue of life or death, such as navigating an airplane through severe weather or deciding when to deploy a parachute if the airplane malfunctions in such a situation. Sometimes, people even need to make critical decisions in a stressful setting while solving another task simultaneously (Gathmann et al., 2014).
Indeed, as Orasanu and Connolly (1993) noted, for crisis decision-making, "the stakes are often high and the effects on lives are likely to be significant." For public leaders facing uncertainty, as during Covid-19, crisis decision-making becomes an adaptive process with four fundamental functions: cognition, communication, coordination, and control (Comfort et al., 2020). A crisis can best be described as a "rare and unique" event (Sniezek, Wilkins, & Wadlington, 2001), bringing with it an "unexpected, life-threatening, and time-compressed" (McKinney, 1993) sequence of events. The characteristics of a crisis include the following (Sniezek et al., 2001):
• Uncertainty: Not understanding enough about the event or situation to know how to carry out an appropriate action or to know what the corresponding outcome(s) of that action would be.
• Threat to property/life: A chance that possessions and/or human life "could be lost, or soon will be" (Sniezek et al., 2001).
• Quick occurrence: The resulting effects of a crisis quickly spread to other areas (i.e., panic, supply shortages, potential for violence, etc.). Immediate actions are critical in restricting the magnitude of damage.
• Uncontrollability: Many of the crisis's outcomes can be "partially influenced" (Sniezek et al., 2001), but not completely controlled.
These characteristics make coherent decision-making under crisis nearly impossible, and the evidence base needed to inform such decisions is often lacking (Khalid et al., 2019). Nevertheless, even during this time of uncertainty, high stress levels, and time pressure, the individual or team knows that "not making a decision is not an option" (Flin & Arbuthnot, 2002). One individual, or all people involved, must take control of the situation and not only prevent it from getting worse but also avoid "fanning the flames." This set of actions is known as crisis management. As Sniezek et al. (2001) explain, crisis management can be compared to risk management, only the situation is "real, not potential." Good and accurate decision-making under time pressure and uncertainty is what makes crisis management effective. In this chapter, we summarize the research findings in this area. First, we briefly highlight some of the background information on time stress and uncertainty, particularly the effects they have on decision makers. This is followed by decision-making theories as related to crisis decision-making. In the final section, we discuss training and simulation issues for crisis decision-making.

EFFECTS OF TIME STRESS AND UNCERTAINTY ON DECISION-MAKING
Research has shown that time pressure can reduce the quality of decision-making because limited time is available for thinking through various possible actions (Edland & Svenson, 1993; Maule, Hockey, & Bdzola, 2000). Time pressure as a task characteristic has different meanings for different tasks.
Some researchers use the terms "time urgency" or "time stress" (Rastegary & Landy, 1993) and "time window" (Rothrock, 2001). Time urgency refers to an accelerated pace of activities that results from striving to finish more and more tasks in a decreasing period of time, whereas time pressure is defined as "the difference between the amount of time available and the amount of time required to solve the task" (Rastegary & Landy, 1993). In most cases, crisis decision-making requires humans to respond within an appropriate time interval. For example, pilots must decide whether to proceed or turn back when encountering severe weather conditions. Such decisions should be made within the appropriate time window, neither too early nor too late (Rothrock, 2001). As a result, the decision-maker must decide what actions to take within a finite amount of time, as well as determine when to implement the chosen actions (Brehmer, 1992). Even when the duration is relatively adequate, the chronic stress brought on by an upcoming crisis can still impact decision-making and lead to a preference for immediate rewards over long-term payoffs (Morgado et al., 2015; Mudra & Tong, 2020). As for uncertainty, many definitions exist. Related terms include vagueness, incompleteness, ambiguity, conflict, and randomness (Davis & Hall, 2003). Definitions can be classified into two categories: (1) the variability of a given situation and (2) the characteristics of information regarding the situation (Rastegary & Landy, 1993). Thus, in human decision-making, uncertainty can be characterized as the unknown probability (or likelihood) of a possible outcome (Busemeyer, 1985) or the lack of complete information on which to base a decision (Kuipers, Moskowitz, & Kassirer, 1988; e.g., where an incoming hurricane is likely to make landfall and whether certain cities should be evacuated). Brecke and Garcia (1995) classified uncertainty in a decision problem into "primary" and "secondary" uncertainties. Primary uncertainty is the action uncertainty of the decision, whereas secondary uncertainty includes situation uncertainty, goal uncertainty, and option uncertainty. The information required to end primary uncertainty pertains to secondary uncertainty. Different levels of uncertainty (emanating from the environment, the organization, or individuals) can interact to make the uncertain situation more complex (Rastegary & Landy, 1993). Although many variations of the term persist, what can be agreed upon is that uncertainty plays a significant role in increasing stress in decision-making processes and affects performance (Rastegary & Landy, 1993). Beyond variation in the term itself, daily decisions under uncertainty are made within a social context, and decisions under uncertainty in the non-social domain may differ (FeldmanHall et al., 2015). FeldmanHall et al. (2015) found that "acute stress dampens an individual's likelihood of making ambiguously uncertain decisions in social contexts but heightens how often they engage in ambiguously uncertain decisions in non-social contexts."
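Returning to the time-pressure construct defined at the start of this section, one minimal formalization of the Rastegary and Landy (1993) definition quoted above (the sign convention is our own) is:

\[
\mathrm{TP} \;=\; t_{\text{required}} \;-\; t_{\text{available}},
\qquad \mathrm{TP} > 0 \;\Rightarrow\; \text{the task is under time pressure.}
\]

Under this reading, time pressure grows as the time required to solve the task exceeds the time available, whereas time urgency describes the pace of activity rather than this difference.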

OTHER EFFECTS ON HUMAN DECISION MAKERS UNDER CRISIS CONDITIONS
As a result of the many stressors placed on the human during an emergency situation, many physiological and psychological effects can occur and limit cognitive resources (Mandler, J. M., 1979; Mandler, G., 1982).
Under normal circumstances, the average person can simultaneously process roughly five to nine concepts (Miller, 1956); however, when a high volume of stimuli is suddenly experienced, thought capacity decreases to two concepts (Waugh & Norman, 1965). Because of the overwhelming amount of stimuli and stressors encountered in a crisis situation, a short-term memory deficit will undoubtedly ensue (Mandler, 1979; Hockey, 1986). As in a domino effect, communication between teammates and/or the command base often breaks down because of short-term memory failure, thereby creating further problems (Stuster, 1996; Wickens, 2005). For obvious reasons, communication is a critical link not only between team members but also between the team and the command base, and it is essential for receiving correct and complete information on the crisis incident to achieve a better situational understanding. However, it has also been found that psychological stress in combination with a parallel executive task can keep decision-making performance from decreasing (Comfort et al., 2020; Pabst et al., 2013). Further problems include attention tunneling, in which attention is devoted solely to one dilemma at a time by way of prioritization. Moray and Rotenberg (1989) described this as "cognitive lock-up." With more than one problem likely to surface, the ability to limit the extent of damage is greatly compromised by the lack of cognitive resources. Confusion can also be induced in high-stress environments. This is due to the sudden influx of information and the human attempting to receive and process this information as quickly as possible (Horvitz & Barry, 1995). With working memory degrading as a result of the situation, giving each piece of incoming information a sufficient amount of time and thought is extremely difficult. Moreover, in a time of emergency there is a need to make a decision as quickly as possible, whether or not all alternatives have been evaluated. Hockey (1986) discovered this when participants were placed under noise and anxiety stress, alone or in different combinations, a situation not unlike those seen in crises. Unfortunately, premature decision-making can be counterproductive and lead to additional problems, especially if the situation was not understood correctly to begin with. From an evolutionary perspective, human decision-making was shaped by the costs and benefits of the ancestral past. Some aggressive mechanisms may be evoked during crisis conditions regardless of the costs and benefits. Even when people can evaluate the situation rationally, judgment and decision-making are often subconscious and biased (Johnson et al., 2012). Crisis situations can have many stressors associated with them that will limit the human's decision-making ability. To further understand the effects of decision-making under crisis conditions, special decision-making theories are needed. In the next section, we will discuss some of the theories associated with decision-making and how these theories address the effects produced by time stress and uncertainty.

DECISION-MAKING THEORIES
To improve human decision-making skills in crisis situations, researchers have strived to find the best strategies and models to describe, predict, and aid human decision-making processes.
Sometimes, different models, such as the organizational culture model and the bureaucratic politics model, need to be used collectively to understand decision-making in a crisis (Monten & Bennett, 2010). Early efforts, now known as the "classical" normative model, utilized probability and statistical theories. These included subjective expected utility (SEU) theory and multi-attribute utility (MAU) theory (Wickens et al., 2004). These theories assumed every decision-maker to be a completely rational entity. Combining this assumption with probability models (such as Bayesian models and the Markovian model), researchers argued that optimal behavior could be expressed using quantitative measures. This normative model is based on the assumptions that (1) the human decision-maker has complete and valid information available for each decision and (2) the decision-maker has unlimited time to put all the information into the normative model, compare the outcomes of each alternative, and make the right decision. The decision-making process is assumed to proceed through a linear series of steps, i.e., the DECIDE model (Jensen, 1988).
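As a concrete statement of the normative account (standard SEU notation from decision theory, not a formula given in this chapter): for an action \(a\) with possible outcomes \(o_i\), the classical model prescribes choosing

\[
\mathrm{SEU}(a) \;=\; \sum_{i} p(o_i \mid a)\, u(o_i),
\qquad a^{*} \;=\; \arg\max_{a}\ \mathrm{SEU}(a),
\]

where \(p(\cdot)\) is the decision-maker's subjective probability and \(u(\cdot)\) the utility of each outcome. The two assumptions listed above map directly onto this formula: assumption (1) supplies the probabilities and utilities, and assumption (2) supplies the time to compute the sum for every alternative; as discussed next, crisis conditions violate both.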
If an unfamiliar situation is encountered, this RPD model will not work (Orasanu, 1997); a person must have some previous experience or familiarity with a situation to develop any sort of valid hypothesis about the current situation.
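To make the contrast with the classical normative model concrete, the SEU prescription (choose the alternative with the largest sum of probability-weighted utilities) can be written as a short computation. A minimal sketch in Python follows; all probabilities and utilities are hypothetical illustrations for the WRDM example, not values from the literature:

```python
# Subjective expected utility (SEU) for the weather-related decision example.
# Every probability and utility here is a hypothetical illustration.

# Each alternative maps to (subjective probability, utility) pairs over its
# possible outcomes; the probabilities for an alternative sum to 1.
alternatives = {
    "continue":  [(0.6, 100), (0.4, -500)],  # on-time arrival vs. weather encounter
    "divert":    [(0.9, 40), (0.1, -100)],   # safe landing vs. costly disruption
    "turn_back": [(1.0, 0)],                 # certain, neutral outcome
}

def seu(outcomes):
    """Subjective expected utility: the sum of p * u over all outcomes."""
    return sum(p * u for p, u in outcomes)

# The normative model prescribes the alternative with the highest SEU.
for name, outcomes in alternatives.items():
    print(f"{name}: SEU = {seu(outcomes):+.1f}")
print("Normative choice:", max(alternatives, key=lambda a: seu(alternatives[a])))
```

The sketch also makes the model’s fragility visible: every probability and utility must be known in advance, which is precisely the information a pilot in the WRDM scenario does not have.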

Take, for example, the difference in decision-making between a novice and an expert. At the novice level, decision-making is slow, analytical, unreliable, effortful, and disjointed (Brecke & Garcia, 1995). At the expert level, decision-making is intuitive, fast, reliable, effortless, and parallel. Researchers who have identified these characteristics in decision-making tasks include Deitch (2001), Kirlik, Fisk, Walker, and Rothrock (1998), Klein (2000), Means, Salas, Crandall, and Jacobs (1993), Mosier (1997), Shanteau (1995), and Wiggins and O’Hare (1995).

In Varma’s (2019) axiomatic model of cognitive decision-making during a crisis, the strategy process is affected by several factors, including politicization, formalization of the decision-making process, financial report, the impact of the crisis, and so on. In the model, variables can be placed on a scale with proscriptive variables at one end and supportive variables at the other. This scale is cross-joined with the advocacy/accommodation continuum to yield a Cartesian product of communication options (Varma, 2019).
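Varma (2019) does not give an implementation, but the cross-joining described above is, mechanically, a Cartesian product. A minimal sketch, with hypothetical labels standing in for points on each continuum:

```python
# Cross-joining the two continua in Varma's (2019) model. The three labels
# per scale are hypothetical discretizations of what are continuous scales.
from itertools import product

variable_scale = ["proscriptive", "mixed", "supportive"]
stance_scale = ["advocacy", "balanced", "accommodation"]

# Each pair is one point in the space of communication options.
for option in product(variable_scale, stance_scale):
    print(option)  # e.g., ('proscriptive', 'advocacy') ... 9 options in all
```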

DECISION-MAKING PERFORMANCE MEASURES

Decision-making in dynamic environments such as crises lacks a single valid performance measure for assessing its effectiveness. Decision-making differs from other skills such as perceptual skills or psychomotor control skills; the latter two can be objectively measured, and the measurement outcomes directly reflect skill level. One major problem with measuring decision-making is that the outcomes of decisions made in uncertain environments are partly random. That is, even if the decision-maker applied a perfect decision-making strategy, the outcome might not be successful due to unpredictable interventions of chance, whereas, conversely, a lucky guess could produce a successful outcome. Although basing performance on the outcome greatly simplifies things, it does not identify “training needs or provide trainees with feedback” (Johnston, Cannon-Bowers, & Smith-Jentsch, 1995; Johnston, Smith-Jentsch, & Cannon-Bowers, 1997). This implies that for assessing decision-making effectiveness under uncertainty, an individual outcome is not a direct measure. Therefore, both “measures of performance” (processes) and “measures of effectiveness” (outcomes) (Smith-Jentsch, Johnston, & Payne, 1998) are needed to understand the whole picture of decision-making effectiveness.

Examples of measuring both decision-making processes and outcomes exist in the literature. For example, Cohen, Freeman, and Thompson (1997) used several different measures to assess decision-making efficiency, including the number of issues considered, amount of evidence identified, number of explanations of conflict generated, number of alternatives generated, accuracy of assessment, consensus and confidence in assessment, and frequency of contingency planning. Johnston et al. (1997) developed a framework to measure outcomes and processes at both the individual and team levels. In attempting to find evidence of internal thought processes, Woods (1993) applied a “process-tracing” or “protocol analysis” methodology. Verbal protocols, behavioral protocols, walkthroughs, and interviews are the most common techniques for process tracing. In naturalistic decision-making studies, retrospective self-reports such as these, as well as other interview techniques, are widely used (Boreham, 1989; Doherty, 1993). One drawback of these approaches is the reliance on human recall of past events, which can substantially limit the reliability and validity of the measures. Other research has used regression techniques, such as the lens model, to measure relationships between environmental cues and human decisions (Bisantz, Kirlik, Gay, Phipps, Walker, & Fisk, 2000; Hammond, 1993; Jha & Bisantz, 2001; Rothrock & Kirlik, 2003). Although the research on measuring decision-making processes and outcomes has been enlightening, much work remains to be done.

With the measurement of decision-making effectiveness, the efficacy of decision-making training can be determined. Traditional measures of training effectiveness apply a transfer-of-training paradigm (Liu & Vincenzi, 2004; Liu, Blickensderfer, Vincenzi, & Macchiarella, 2006). Transfer of training can be measured in many ways, both as an outcome and as a process. For example, as one of the most popular process measures, transfer of training can be assessed by comparing the durations of training needed to perform a task at a certain skill level (time to standard; Liu et al., 2006). Learning curve techniques have been used in some transfer-of-training studies (Damos, 1991; Liu et al., 2006; Spears, 1985; Taylor, Lintern, Hulin, Talleur, Emanuel, & Phillips, 1999) and may be useful in assessing decision-making skill development. Raw data obtained from decision-making performance measures (e.g., accuracy of assessment) can be used to develop a learning curve and determine just how effective the training program is. Learning-curve-fitting methods provide a much more detailed analysis of the data, and the three aspects of performance, i.e., beginning, asymptote, and rate of improvement, can be examined separately (Liu, Nickens, & Wang, 2006).
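As a sketch of the learning-curve approach described above (the trial data are synthetic, and the three-parameter exponential is one common functional form; the cited studies also use others):

```python
# Fitting a three-parameter learning curve to hypothetical decision-accuracy
# data. The parameters map onto the three aspects noted above: initial level
# (a), asymptote (b), and rate of improvement (c).
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(trial, a, b, c):
    # Performance starts near a and approaches asymptote b at rate c.
    return b + (a - b) * np.exp(-c * trial)

trials = np.arange(1, 21)
rng = np.random.default_rng(0)
# Synthetic "observed" accuracy: improving from ~0.5 toward ~0.9, with noise.
accuracy = learning_curve(trials, 0.5, 0.9, 0.25) + rng.normal(0, 0.02, trials.size)

(a, b, c), _ = curve_fit(learning_curve, trials, accuracy, p0=[0.5, 0.9, 0.2])
print(f"initial = {a:.2f}, asymptote = {b:.2f}, rate = {c:.2f}")
```

Because each parameter is estimated separately, training programs can be compared not only on final accuracy but also on where trainees start and how quickly they approach their asymptote.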

CRISIS DECISION-MAKING TRAINING

The next question is: exactly how do we train for a rare and abrupt situation, occurring in a crisis environment, that requires humans to decide on efficacious actions in a short span of time with ambiguous information? Because the dynamic nature of crises makes training according to fixed protocols extremely difficult (Cesta et al., 2014), little research has been conducted in this area. In this section, we discuss what needs to be involved in a crisis training program, the problems experienced with traditional training processes, and what unconventional training processes could be beneficial in ensuring a successful crisis management training program.

GENERAL AND STRESS TRAINING

One current method of training for high-stress environments involves two separate aspects: general training and stress training. General training ensures that “required knowledge, skills, and abilities” are acquired by means of classroom training or simulation under predictable conditions (Driskell & Johnston, 1998). This training content should extensively cover, from beginning to end, all mission goals, depending on the particular mission and domain. Even with crises being as unique as they are, set procedures should be learned and exercised on a multitude of possible scenarios and system malfunctions. Research has shown that as long as the individual understands the relationships between symptoms and causes (Dienes & Fahey, 1995) and the “dependencies between all system components” (Kerstholt, 1997), control of the situation can be obtained. If specific information is needed but has not been made available, individuals must be able to use their knowledge of the system to find that information (Gonzalez, Vanyukov, & Martin, 2005).

One result, beyond the sheer knowledge required to work with complex system interdependencies, is that the individual will undoubtedly face “unintended consequences” of their decisions (Gonzalez et al., 2005). These may result from hasty or forced decision-making, lack of complete information, or even the dynamic environment itself. Another consequence is goal conflict (Gonzalez et al., 2005). In many instances, the available resources (i.e., personnel, time, supplies, etc.) are simply not enough to sustain the situation. A decision must be made that prioritizes these needs and determines where resources will be focused. The trainee also needs to learn what side effects his or her decisions will produce in this dynamic system and how to make trade-offs when certain goals are threatened. One increasingly popular method of learning the intricate system relationships, and how the individual’s decisions affect the system as a whole, is through microworld (or scaled-world) simulations (e.g., Controller Teamwork Evaluation and Assessment Methodology [CTEAM] or Networked Fire Chief [NFC]). More will be said about this type of simulation later in the chapter. Therefore, and perhaps not surprisingly, it is imperative that each individual gain exposure to nearly every possible situation, whether it is a planned part of the mission or something outside of it, and learn how to respond accordingly (Cohen, Freeman, & Thompson, 1997), thereby rendering multiple tasks less novel.

Stress training, on the other hand, is used solely to prepare someone to respond cognitively and behaviorally in a high-stress environment. This means that the majority of the training is performed outside of the classroom, without “normal” or expected conditions (Driskell & Johnston, 1998). These stress-training tasks involve the uncertain cues and time pressure that are critical to ensuring transfer of training, so that when the real event happens, effective actions occur “naturally.” Much research has been conducted on the viability of exposing trainees to stress and how it later affects task performance (Ivancevich et al., 1990; Johnston & Cannon-Bowers, 1996; Meichenbaum, 1985; Novaco, Cook, & Sarason, 1983; Smith, 1980; Zakay & Wooler, 1984). For example, one such stress exposure training (SET) program (Driskell & Johnston, 1998) follows these steps:

1. Information provision: An introduction to the symptoms of stress and how stress influences performance. Allows the trainee to become familiar with sensory information, procedural information, and instrumental information associated with a stressful environment, giving them a sense of greater control over the situation.

2. Skills acquisition: Provides exposure to “attentional focus, overlearning, and decision-making skills” training.

3. Application and practice: The application of the knowledge and critical-thinking skills acquired about the effects of stress to scenarios similar to those likely to be experienced, with the stress level gradually increased over time.

Stress training provides a number of advantages applicable to high-stress situations that traditional or general training cannot, the first of which is a better understanding of stressful environments (Driskell & Johnston, 1998; Johnston & Cannon-Bowers, 1996). This allows the trainee to learn to “form accurate expectations” concerning crisis situations, thereby allowing for better “predictability” (Driskell & Johnston, 1998). Furthermore, skills are acquired to overcome anxiety and other stress effects that hinder performance under high stress. When trained on what to expect and how to respond, individuals become skilled in acknowledging and then “cognitively controlling” (Driskell & Johnston, 1998) or suppressing these stress effects so as to perform appropriately and efficiently. Lastly, this type of training builds performance confidence (Driskell & Johnston, 1998; Johnston & Cannon-Bowers, 1996). Those who learn to approach tasks in a positive or confident manner are less likely to become distracted by extraneous variables in the environment and instead focus on the task at hand.

SIMULATION

Crisis situations are nearly impossible to replicate in training, and it is not safe to expose trainees to them. Traditional training (i.e., instructor, classroom, etc.) has typically been deemed insufficient for all spheres of crisis management training. Sniezek et al. (2001) identified the following issues in developing a crisis management training program through traditional training methods: expert selection and recruitment, determining training content, effectiveness assessment, feedback, interactions with the trainer, scheduling, cost, realism, and transfer of training. As a result, the best training program for overcoming these concerns would be one based on simulation: a training method that produces a wide range of scenarios, with an “immersive interface,” complete experimental control, and a performance feedback system (Sniezek et al., 2001).

There are many advantages to using simulation over traditional training techniques. According to Sniezek et al. (2001), for effective crisis management, humans need to train “under acute stress” or at least under a combination of “arousal, time pressure, and anxiety”; conventional training methods, unlike simulation, simply cannot provide this. If a simulated training program can successfully produce these conditions by accurately replicating the natural environment with sequences likely to be experienced, trainees will become “immersed” and approach each scenario as though it were real, as opposed to merely “managing a simulated event” (Crego & Spinks, 1997). The level of realism produced is an important key to promoting transfer of training and, if deficient in any way, can greatly hinder training transfer (Zakay & Wooler, 1984).

Other advantages include the ability of the trainee to understand the effects and side effects of his or her chosen actions. As mentioned previously, it is extremely important for the trainee to learn and understand the inputs and outputs of the system in order to know what decisions to make and to begin to gain control of the situation. Simulation incorporates these complex interdependent relationships of the system into the training. Other benefits include the ability to train multiple trainees at any given time. This is especially beneficial in situations where individuals may be inactive for periods of time before the action (e.g., military personnel being transported over long distances to a war zone) and can use the system for refresher or recurrent training. Trainees are also able to interact with one another on the same task, even if they exist in different domains (e.g., air traffic control [ATC] trainees communicating with student pilots in separate simulated environments), an essential element of team decision-making training.

The automated feedback system in simulation programs also helps address several of the traditional training issues raised previously. As Kirlik et al. (1998) noted, there are four areas that would strengthen the individual’s training experience if implemented in the feedback system: timeliness, standardization, diagnostic precision, and presentation mode.

Timeliness: An automated feedback system allows trainees to receive instant information at the end of the trial, or even during the trial if requested. Often, feedback is obtained too late to be of any use to the trainee (Kirlik et al., 1998). Habit breaking can also be achieved if the system has been programmed to intervene when the trainee commits an error during the simulation (Sniezek et al., 2001).

Standardization: Although expert trainers can provide individualized performance feedback to the trainee, doing so is labor intensive (Sniezek et al., 2001) and can be “highly idiosyncratic” (Kirlik et al., 1998). The trainer must be able to identify the processes used by the trainee to achieve the outcome; unfortunately, this is not always possible or feasible. As has already been established, the process is as important as the outcome in identifying where improvements are needed. Additionally, trainers tend to have their own preferences in training, and what one trainer deems important may differ from other trainers’ viewpoints, or even from the training program itself. With numerous trainers involved in the program, each trainee could receive variations in training; this in turn could hinder future team interactions.

Diagnostic precision: Following a training session, the trainee must be informed as to where and how errors occurred, not just that x number of errors were committed. An automated feedback system would be able to diagnose the exact failure and provide an explanation of what went wrong, as well as offer suggestions on how to improve or prevent it from happening again (see the sketch following the presentation-mode discussion below). Explanations are essential in ensuring that trainees not only understand the feedback (Sniezek et al., 2001), but also do not “attribute the error to the particular events in the scenario” and instead take away a more generalized lesson (Kirlik et al., 1998).

Presentation mode: Kirlik et al. (1998) found that verbal feedback from trainers during the training session created a “secondary task” and thus interfered with performance. The presentation mode used in their study was a text-based “real-time, embedded feedback” system. In a study by Passenier and Kerstholt (1996), participants were supplied with an additional computer screen containing information about the system and the relationships between subsystems; participants who used this screen solved 20% more problems than those who did not.
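A minimal sketch of the kind of embedded, rule-based feedback discussed above; the event name, deadline, and diagnostic messages are hypothetical rather than drawn from the cited systems:

```python
# Hypothetical embedded feedback monitor. It checks each trainee action as it
# occurs (timeliness), applies the same rules to every trainee
# (standardization), and explains what went wrong and why (diagnostic
# precision), rather than just counting errors.
from dataclasses import dataclass

@dataclass
class Action:
    time_s: float   # seconds into the scenario
    kind: str       # e.g., "allocate", "confirm", "prioritize"

# One rule per scenario event: expected action, deadline, and a diagnostic
# message that generalizes beyond the particular scenario.
RULES = {
    "sector_fire": ("allocate", 30.0,
                    "Allocating resources late lets an incident spread; "
                    "prioritize allocation before confirmation steps."),
}

def check(event: str, action: Action) -> str:
    expected, deadline, diagnosis = RULES[event]
    if action.kind != expected:
        return f"Error: expected '{expected}', got '{action.kind}'. {diagnosis}"
    if action.time_s > deadline:
        return f"Late by {action.time_s - deadline:.0f} s. {diagnosis}"
    return "OK"

print(check("sector_fire", Action(time_s=42.0, kind="allocate")))
```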

Currently, simulations are being used widely in decision-making training, and more specifically for dynamic decision-making. As will be described next, one form of simulation gaining popularity in dynamic decision-making research is the microworld.

MICROWORLD

Although training in the field provides the highest level of fidelity and allows trainees to replicate real-world tasks, it is extremely difficult to manipulate and control field training scenarios, especially when incorporating time stress and uncertainty. Microworld (or scaled-world) simulations, on the other hand, offer a “compromise between experimental control and realism” (Gonzalez et al., 2005). This type of simulation has become an increasingly useful educational and dynamic decision-making research tool over the past three decades (Granlund, Johansson, Persson, Artman, & Mattson, 2001). It enables trainees to operate in a “scaled” version of the environment, giving users a top-down view of how their decisions and actions, made in real time, affect the system as a whole. Microworld simulations show the system/environment both as it changes autonomously and as each decision or action takes effect. If the user hesitates in decision-making or makes no decision at all, the simulation incorporates this period of inactivity into the current scenario (see the sketch at the end of this section). In addition, this type of simulation gives the researcher the capability to shape the training session to meet the needs of the trainee and the researcher precisely. It ensures that trainees receive a “deeper and more integrated understanding” of the system and environment in which they are immersed—especially of the “environmental inputs and behavioral outputs” (Ehret, 1998).

Some current microworld simulation programs that have been evaluated and shown to incorporate relatively high dynamics and complexity are NEWFIRE, Fire Chief, Duress II, Moro, and Water Production Plant (Gonzalez et al., 2005). Although this method of training has been deemed useful in dynamic decision-making studies, further research is needed on its true advantages and disadvantages when applied to crisis training.

Overall, although the benefits of simulation far outweigh those of conventional training methods for crisis training, a huge barrier to implementing simulation in a training program is the initial cost, which must cover the “research and development costs” associated with such a system (Sniezek et al., 2001). The simulation’s end results are only as good as the model; therefore, an extensive amount of time and effort must be committed to the development of the design.
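The following toy model sketches the defining microworld property noted above: the environment keeps evolving whether or not the trainee acts, so hesitation has a cost. It is purely illustrative and does not correspond to any of the cited platforms:

```python
# Toy microworld: fire spreads along a row of cells once per tick, with or
# without a trainee command. Inactivity is folded into the scenario, as in
# the microworld platforms described above.
cells = ["fire"] + ["ok"] * 9   # cell 0 starts burning

def tick(extinguish=None):
    """Advance the world one time step; extinguish may be None (inactivity)."""
    if extinguish is not None:
        cells[extinguish] = "out"           # trainee acts on one chosen cell
    for i in [j for j, c in enumerate(cells) if c == "fire"]:
        if i + 1 < len(cells) and cells[i + 1] == "ok":
            cells[i + 1] = "fire"           # fire spreads to the next intact cell

tick()               # trainee hesitates: the fire spreads to cell 1
tick(extinguish=0)   # trainee puts out cell 0, but cell 1 spreads meanwhile
print(cells)         # ['out', 'fire', 'fire', 'ok', 'ok', ...]
```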

CONCLUSION

In this chapter, the characteristics of a crisis and the effects they have on the human decision-maker have been discussed, as well as the problems associated with relying solely on traditional training methods to develop effective decision-making skills for a crisis. Although traditional training methods are adequate for acquiring general domain knowledge and skills, their use beyond that is relatively limited. Crisis training involves much more complicated requirements. During a crisis, individuals face time pressure, high risk, and ambiguous information in a dynamic environment. Research has shown that SET can assist in mitigating many of these effects. Therefore, crisis training programs rely heavily on simulation to meet the needs not satisfied by traditional methods.

REFERENCES

Beach, L. R., & Lipshitz, R. 1993. Why classical decision theory is an inappropriate standard for evaluating and aiding most human decision making. In Klein, G. et al. (Eds.), Decision Making in Action: Models and Methods (pp. 21–36). Norwood, NJ: Ablex Publishing Corp.
Bisantz, A. M., Kirlik, A., Gay, P., Phipps, D. A., Walker, N., & Fisk, A. D. 2000. Modeling and analysis of dynamic judgment tasks using a lens model approach. IEEE Transactions on Systems, Man, and Cybernetics, 30(6), 605–616.
Boreham, N. C. 1989. Modeling medical decision-making under uncertainty. British Journal of Educational Psychology, 59, 187–199.
Brecke, F. H., & Garcia, S. K. 1995. Training methodology for logistic decision making. USAF AMRL Technical Report AL/HR-TR-1995-0098 (Brooks AFB), iii–vii, 1–94.
Brehmer, B. 1992. Dynamic decision making: Human control of complex systems. Acta Psychologica, 81, 211–241.
Busemeyer, J. R. 1985. Decision making under uncertainty: A comparison of simple scalability, fixed-sample, and sequential-sampling models. Journal of Experimental Psychology: Learning, Memory and Cognition, 11(3), 538–564.
Cesta, A., Cortellessa, G., & De Benedictis, R. 2014. Training for crisis decision making – An approach based on plan adaptation. Knowledge-Based Systems, 58, 98–112. https://doi.org/10.1016/j.knosys.2013.11.011
Cohen, M. S., Freeman, J. T., & Thompson, B. T. 1997. Integrated critical thinking training and decision support for tactical anti-air warfare. 3rd International Command and Control Research and Technology Symposium Proceedings.
Comfort, L. K., Kapucu, N., Ko, K., Menoni, S., & Siciliano, M. 2020. Crisis decision-making on a global scale: Transition from cognition to collective action under threat of COVID-19. Public Administration Review, 80(4), 616–622. https://doi.org/10.1111/puar.13252
Crego, J., & Spinks, T. 1997. Critical incident management simulation. In Flin, R., Salas, E., Strub, M., & Martin, L. (Eds.), Decision Making Under Stress: Emerging Themes and Applications (pp. 85–94). Burlington, VT: Ashgate Publishing Company.
Damos, D. L. 1991. Examining transfer of training using curve fitting: A second look. The International Journal of Aviation Psychology, 1(1), 73–85.
Davis, J. P., & Hall, J. W. 2003. A software-supported process for assembling evidence and handling uncertainty in decision-making. Decision Support Systems, 35, 415–433.
Deitch, E. 2001. Learning to land: A qualitative examination of pre-flight and in-flight decision-making processes in expert and novice aviators. Dissertation, Virginia Polytechnic Institute and State University.
Dienes, Z., & Fahey, R. 1995. Role of specific instances in controlling a dynamic system. Journal of Experimental Psychology: Learning, Memory and Cognition, 21(4), 848–862.
Doherty, M. E. 1993. A laboratory scientist’s view of naturalistic decision making. In Klein, G. et al. (Eds.), Decision Making in Action: Models and Methods (pp. 362–389). Norwood, NJ: Ablex Publishing Corp.
Driskell, J. E., & Johnston, J. H. 1998. Stress exposure training. In Cannon-Bowers, J. A., & Salas, E. (Eds.), Making Decisions Under Stress: Implications for Individual and Team Training (pp. 191–217). Washington, DC: American Psychological Association.
Edland, A., & Svenson, O. 1993. Judgment and decision making under time pressure: Studies and findings. In Svenson, O., & Maule, A. J. (Eds.), Time Pressure and Stress in Human Judgment and Decision Making (pp. 27–40). New York: Plenum Press.
Ehret, B. D. 1998. Scaled worlds as research tools: A demonstration. Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting (p. 1157). Santa Monica, CA: Human Factors and Ergonomics Society.
FeldmanHall, O., Raio, C. M., Kubota, J. T., Seiler, M. G., & Phelps, E. A. 2015. The effects of social context and acute stress on decision making under uncertainty. Psychological Science, 26(12), 1918–1926. https://doi.org/10.1177/0956797615605807
Flin, R., & Arbuthnot, K. 2002. Incident Command: Tales from the Hot Seat. Aldershot: Ashgate.
Gathmann, B., Schulte, F. P., Maderwald, S., Pawlikowski, M., Starcke, K., Schäfer, L. C., Schöler, T., Wolf, O. T., & Brand, M. 2014. Stress and decision making: Neural correlates of the interaction between stress, executive functions, and decision making under risk. Experimental Brain Research, 232(3), 957–973. https://doi.org/10.1007/s00221-013-3808-6
Gonzalez, C., Vanyukov, P., & Martin, M. K. 2005. The use of microworlds to study dynamic decision making. Computers in Human Behavior, 21, 273–286.
Granlund, R., Johansson, B., Persson, M., Artman, H., & Mattson, P. 2001. Exploration of methodological issues in microworld research—Experiences from research in team decision making. Presented at a workshop on the use of microworlds in research, Granada, Spain. Retrieved from http://www.nada.kth.se/~artman/Articles/Misc/MIKRO_GRANADWORKSHOP.pdf
Hammond, K. R. 1993. Naturalistic decision making from a Brunswikian viewpoint: Its past, present, future. In Klein, G. et al. (Eds.), Decision Making in Action: Models and Methods (pp. 205–228). Norwood, NJ: Ablex Publishing Corp.
Hockey, G. R. J. 1986. Changes in operator efficiency as a function of environmental stress, fatigue and circadian rhythms. In Boff, K. R., Kaufman, L., & Thomas, J. P. (Eds.), Handbook of Perception and Human Performance (pp. 1–49). New York: Wiley.
Horvitz, E., & Barry, M. 1995. Display of information for time-critical decision making. Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann Publishers.
Ivancevich, J., Matteson, M., Freedman, S., & Philips, J. 1990. Worksite stress management interventions. American Psychologist, 45, 252–261.
Jha, P., & Bisantz, A. M. 2001. Modeling fault diagnosis in a dynamic process control task using a multivariate lens model. Proceedings of the Human Factors and Ergonomics Society 45th Annual Meeting, Minneapolis/St. Paul, MN.

Jensen, R. S. 1988. Creating a ‘1000 hour’ pilot in 300 hours through judgment training. Proceedings of the Workshop on Aviation Psychology. Newcastle, Australia: Institute of Aviation, University of Newcastle.
Johnson, D. D. P., McDermott, R., Cowden, J., & Tingley, D. 2012. Dead certain: Confidence and conservatism predict aggression in simulated international crisis decision-making. Human Nature, 23(1), 98–126. https://doi.org/10.1007/s12110-012-9134-z
Johnston, J., & Cannon-Bowers, J. A. 1996. Training for stress exposure. In Driskell, J. E., & Salas, E. (Eds.), Stress and Human Performance (pp. 223–256). Mahwah, NJ: Lawrence Erlbaum.
Johnston, J. H., Cannon-Bowers, J. A., & Smith-Jentsch, K. A. 1995. Event-based performance measurement system for shipboard command teams. Proceedings of the First International Symposium on Command and Control Research and Technology (pp. 274–276). Washington, DC: Institute for National Strategic Studies.
Johnston, J., Smith-Jentsch, K. A., & Cannon-Bowers, J. A. 1997. Performance measurement tools for enhancing team decision-making training. In Brannick, M. T., Salas, E., & Prince, C. (Eds.), Team Performance Assessment and Measurement: Theory, Methods, and Applications (pp. 311–327). Mahwah, NJ: Lawrence Erlbaum Associates.
Kerstholt, J. H. 1997. Dynamic decision making in non-routine situations. In Flin, R., Salas, E., Strub, M., & Martin, L. (Eds.), Decision Making Under Stress: Emerging Themes and Applications (pp. 185–192). Burlington, VT: Ashgate Publishing Company.
Khalid, A. F., Lavis, J. N., El-Jardali, F., & Vanstone, M. 2019. Stakeholders’ experiences with the evidence aid website to support ‘real-time’ use of research evidence to inform decision-making in crisis zones: A user testing study. Health Research Policy and Systems, 17(1), 106. https://doi.org/10.1186/s12961-019-0498-y
Kirlik, A., Fisk, A. D., Walker, N., & Rothrock, L. 1998. Feedback augmentation and part-task practice in training dynamic decision-making skills. In Cannon-Bowers, J. A., & Salas, E. (Eds.), Making Decisions Under Stress (pp. 91–113). Washington, DC: American Psychological Association.
Klein, G. A. 1989. Recognition-primed decisions. In Rouse, W. (Ed.), Advances in Man-Machine Systems Research (pp. 47–92). Greenwich, CT: JAI Press.
Klein, G. 1997. An overview of naturalistic decision-making applications. In Zsambok, C., & Klein, G. (Eds.), Naturalistic Decision Making (pp. 48–61). Mahwah, NJ: Lawrence Erlbaum Associates.
Klein, G. 2000. How can we train pilots to make better decisions? In O’Neil, H., & Andrews, D. (Eds.), Aircrew Training and Assessment (pp. 165–194). Mahwah, NJ: Lawrence Erlbaum Associates.
Kuipers, B., Moskowitz, A. J., & Kassirer, J. P. 1988. Critical decisions under uncertainty: Representation and structure. Cognitive Science, 12, 177–210.
Liu, D., Blickensderfer, E., Vincenzi, D., & Macchiarella, N. 2006. Transfer of training. In Vincenzi, D., & Wise, J. (Eds.), Human Factors and Simulation (pp. 49–60).
Liu, D., Nickens, T., & Wang, Y. 2006. Modeling decision-making learning process under crisis situations. Paper presented at the 10th Annual Fall Simulation Interoperability Workshop, Orlando, FL.
Liu, D., & Vincenzi, D. 2004. Measuring simulation fidelity: A conceptual study. Proceedings of the Human Performance, Situation Awareness and Automation Conference, Daytona Beach, FL.
Mandler, J. M. 1979. Categorical and schematic organization in memory. In Puff, C. R. (Ed.), Memory Organization and Structure (pp. 259–299). New York: Academic Press.
Mandler, G. 1982. Stress and thought processes. In Goldberger, L., & Breznitz, S. (Eds.), Handbook of Stress: Theoretical and Clinical Aspects (pp. 88–104). New York: Free Press.

Maule, A. J., Hockey, G. R., & Bdzola, L. 2000. Effects of time-pressure on decision-making under uncertainty: Changes in affective state and information processing strategy. Acta Psychologica, 104, 283–301.
McKinney, E. H. 1993. Flight leads and crisis decision making. Aviation, Space, and Environmental Medicine, 64, 359–362.
Means, B., Salas, E., Crandall, B., & Jacobs, T. O. 1993. Training decision makers for the real world. In Klein, G. et al. (Eds.), Decision Making in Action: Models and Methods (pp. 306–327). Norwood, NJ: Ablex Publishing Corp.
Meichenbaum, D. 1985. Teaching thinking: A cognitive-behavioral perspective. In Segal, J. W., Chipman, S. F., & Glaser, R. (Eds.), Thinking and Learning Skills, 2: Research and Open Questions. London: Lawrence Erlbaum Associates.
Miller, G. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Monten, J., & Bennett, A. 2010. Models of crisis decision making and the 1990–91 Gulf War. Security Studies, 19(3), 486–520. https://doi.org/10.1080/09636412.2010.505129
Moray, N., & Rotenberg, I. 1989. Fault management in process control: Eye movements and action. Ergonomics, 32, 1319–1342.
Morgado, P., Sousa, N., & Cerqueira, J. J. 2015. The impact of stress in decision making in the context of uncertainty. Journal of Neuroscience Research, 93(6), 839–847. https://doi.org/10.1002/jnr.23521
Mosier, K. 1997. Myths of expert decision making and automated decision aids. In Zsambok, C., & Klein, G. (Eds.), Naturalistic Decision Making (pp. 319–331). Mahwah, NJ: Lawrence Erlbaum Associates.
Mudra, R. A., & Tong, M. T. 2020. Making “good” choices: Social isolation in mice exacerbates the effects of chronic stress on decision making. Frontiers in Behavioral Neuroscience, 14, 81. https://doi.org/10.3389/fnbeh.2020.00081
Novaco, R., Cook, T., & Sarason, I. 1983. Military recruit training: An arena for stress-coping skills. In Meichenbaum, D., & Jaremko, M. (Eds.), Stress Reduction and Prevention (pp. 377–418). New York: Plenum.
Orasanu, J. 1997. Stress and naturalistic decision making: Strengthening the weak links. In Flin, R., Salas, E., Strub, M., & Martin, L. (Eds.), Decision Making Under Stress: Emerging Themes and Applications (pp. 43–66). Burlington, VT: Ashgate Publishing Company.
Orasanu, J., & Connolly, T. 1993. The reinvention of decision making. In Klein, G., Orasanu, J., Calderwood, R., & Zsambok, C. (Eds.), Decision Making in Action: Models and Methods (pp. 3–20). Norwood, NJ: Ablex.
Pabst, S., Schoofs, D., Pawlikowski, M., Brand, M., & Wolf, O. T. 2013. Paradoxical effects of stress and an executive task on decisions under risk. Behavioral Neuroscience, 127(3), 369–379. https://doi.org/10.1037/a0032334
Passenier, P. O., & Kerstholt, J. H. 1996. Design and evaluation of a decision support system for integrated bridge operation. Report TNO TM-1996 C064. Soesterberg: Human Factors Research Institute.
Rasmussen, J. 1983. Skills, rules, and knowledge; signals, signs, and other distinctions in human performance models. IEEE Transactions on Systems, Man, and Cybernetics, 13, 257–266.
Rastegary, H., & Landy, F. 1993. The interaction among time urgency, uncertainty and time pressure. In Svenson, O., & Maule, A. (Eds.), Time Pressure and Stress in Human Judgment and Decision Making. New York: Plenum Press.
Rothrock, L. 2001. Using time windows to evaluate operator performance. International Journal of Cognitive Ergonomics, 5(1), 1–21.
Rothrock, L., & Kirlik, A. 2003. Inferring rule-based strategies in dynamic judgment tasks: Toward a noncompensatory formulation of the lens model. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 33(1), 58–72.

Shanteau, J. 1995. Expert judgment and financial decision making. In Green, B. (Ed.), Risky Business: Risk Behavior and Risk Management. Stockholm: Stockholm University. Retrieved July 2005 from http://www.ksu.edu/psych/cws/pdf/financial_experts95.PDF
Smith, R. E. 1980. A cognitive/affective approach to stress management training for athletes. In Nadeau, C. H., Halliwell, W. R., Newell, K. M., & Roberts, G. C. (Eds.), Psychology of Motor Behavior and Sport—1979 (pp. 54–73). Champaign, IL: Human Kinetics.
Smith-Jentsch, K. A., Johnston, J. H., & Payne, S. C. 1998. Measuring team-related expertise in complex environments. In Cannon-Bowers, J. A., & Salas, E. (Eds.), Making Decisions Under Stress: Implications for Individual and Team Training (pp. 61–87). Washington, DC: American Psychological Association.
Sniezek, J. A., Wilkins, D. C., & Wadlington, P. L. 2001. Advanced training for crisis decision making: Simulation, critiquing, and immersive interfaces. Proceedings of the 34th Hawaii International Conference on System Sciences. Maui, HI: IEEE Computer Society.
Spears, W. 1985. Measurement of learning and transfer using curve fitting. Human Factors, 27, 251–266.
Stuster, J. 1996. Bold Endeavors: Lessons from Polar and Space Exploration. Annapolis, MD: Naval Institute Press.
Taylor, H. L., Lintern, G., Hulin, C. L., Talleur, D. A., Emanuel, T. W., Jr., & Phillips, S. I. 1999. Transfer of training effectiveness of a personal computer aviation training device. International Journal of Aviation Psychology, 9(4), 319–335.
Varma, T. 2019. Understanding decision making during a crisis: An axiomatic model of cognitive decision choices. International Journal of Business Communication, 56(2), 233–248. https://doi.org/10.1177/2329488415612477
Waugh, N., & Norman, D. 1965. Primary memory. Psychological Review, 72, 89–104.
Wickens, C. D. 2005. Attentional tunneling and task management. Proceedings of the 13th International Symposium on Aviation Psychology. Dayton, OH: Wright-Patterson AFB.
Wickens, C., Lee, J. D., Liu, Y., & Becker, S. 2004. An Introduction to Human Factors Engineering. Hoboken, NJ: Prentice Hall.
Wiggins, M., & O’Hare, D. 1995. Expertise in aeronautical weather-related decision-making: A cross-sectional analysis of general aviation pilots. Journal of Experimental Psychology: Applied, 1(4), 305–320.
Woods, D. D. 1993. Process-tracing methods for the study of cognition outside of the experimental psychology laboratory. In Klein, G. et al. (Eds.), Decision Making in Action: Models and Methods (pp. 228–252). Norwood, NJ: Ablex Publishing Corp.
Zakay, D., & Wooler, S. 1984. Time pressure, training and decision effectiveness. Ergonomics, 27, 273–284.
Zsambok, C. E. 1997. Naturalistic decision making: Where are we now? In Zsambok, C., & Klein, G. (Eds.), Naturalistic Decision Making (pp. 3–16). Mahwah, NJ: Lawrence Erlbaum Associates.

8

Healthcare Simulation and Training

Sarah A. Powers and Mark W. Scerbo

CONTENTS

Introduction 225
  Benefits of Simulation in Healthcare 226
  Drawbacks of Simulation in Healthcare 226
  Dimensions of Simulation 227
History of Medical Simulation 228
  Mannequins 228
  Virtual Reality Systems 229
    Surgical Systems 230
    Collaborative and Immersive Training Systems 231
  Standardized Patients 233
  Hybrid Systems 234
Training 235
Team Training 236
  Benefits 237
  History and Scope of Team Training in Healthcare 237
Training Transfer 238
Healthcare Simulation and the Pandemic 242
Conclusion 244
References 245

INTRODUCTION

The use of simulation in high-risk fields such as aviation, aerospace, and the military has been widespread for nearly a century (Rosen, 2008; Weinger & Gaba, 2014). Comparatively, the integration of simulation in healthcare is relatively new. Simulation has been defined as “a technique – not a technology – to replace or amplify real experiences with guided experiences that evoke or replicate substantial aspects of the real world in a fully interactive manner” (Gaba, 2004, p. 126). The need for methods to enhance training in healthcare became clear with the publication of the Institute of Medicine’s (IOM) report, To Err Is Human, in 2000 (Kohn et al., 2000). This report estimated that nearly 98,000 people die each year from medical errors. Several of the recommendations offered in this report to increase safety in the US healthcare system stressed the use of simulation for training.

In addition to the ultimate goal of enhancing patient safety, there are several other driving forces behind simulation in healthcare. Perhaps the biggest factor has been an increase in the capabilities and availability of technology for simulation (Gaba, 2004; Issenberg & Scalese, 2008; Nestel & Kelly, 2018; Satava, 2001). Another is the growing size of the healthcare workforce combined with the reduction in time available for training (Bradley, 2006; Issenberg & Scalese, 2008; Nestel & Kelly, 2018; Sinz, 2007). There has been a shift in healthcare training toward streamlined, shorter, and more efficient training (Bradley, 2006). However, a consequence of this emphasis has been that many students are less prepared for clinical practice (Cartwright et al., 2005; Perlmen et al., 2017). Simulation can supplement or even replace current training requiring patient contact hours. There is a growing trend to count time spent training in simulation as a suitable substitute for clinical time in healthcare curricula (Accreditation Council for Graduate Medical Education [ACGME], 2020; Nestel & Kelly, 2018; Sinz, 2007).

Benefits of Simulation in Healthcare

There are numerous benefits of simulation in healthcare, many of which relate to enhancing training and safety. First, simulation can easily be adapted to users at all levels for skill acquisition, assessment, and retention (Bradley, 2006; Gordon, 1974; Jones et al., 2015; Yunoki & Sakai, 2018). This adaptability enables individualized training and can accommodate various learning styles, which can also help increase student engagement compared to more traditional forms of instruction. Unlike real patients, simulations do not get tired or worried, and they are always available (Gordon, 1974). Additionally, simulators provide an ethical substitute for patients when trainees are practicing dangerous, invasive, or rare treatments (Buck, 1991; Gaba, 2004; Gordon, 1974; Ziv et al., 2003). Simulation gives novice trainees more opportunities to practice and refine their skills before performing any procedures on an actual patient. From the trainees’ perspective, simulation also provides a less stressful environment, because they are less likely to feel pressure or embarrassment than when practicing on a real patient (Forrest, 2019), and they know that any mistake will not harm an actual patient (Bradley, 2006). Ultimately, these benefits translate into improved patient safety (Buck, 1991; Bradley, 2006; Gaba, 2004; Nestel & Kelly, 2018). Despite these numerous benefits of simulation in healthcare, there are potential drawbacks that must also be considered.

Drawbacks of Simulation in Healthcare

Several barriers have slowed the widespread implementation of simulation in healthcare. Potential drawbacks include cost, zero or negative transfer, technology limitations, and fidelity. One of the most frequently cited drawbacks is the cost associated with simulation (Buck, 1991; Gordon, 1974; Rosen, 2008; Satava, 2001; Sinz, 2007). Although cost can be a concern, it is important to note that cost varies widely depending on many different aspects of a simulation (Gaba, 2004). High-fidelity simulation systems and activities will likely incur a high cost, whereas low-fidelity simulations can be quite inexpensive. For example, the high-fidelity SimMan costs upward of $68,000 (Laerdal Medical Corp, 2014), which does not include other yearly operational costs associated with maintenance, repairs, supporting information technology, and maintaining instructional support staff (Walsh & Jaye, 2012). Additionally, time spent in simulation can incur a higher cost if it takes time away from clinical services. However, when simulation training can replace clinical training, the relative cost can be lower. Thus, there are tradeoffs when considering how the implementation of simulation training in healthcare will impact overall cost (see the sketch at the end of this section).

Another concern relates to the reliability of the technology used for simulation. Specifically, when the cost of a simulator is high, there are concerns about whether it or other related technology may fail during training or practice (Buck, 1991). A final concern is whether simulators sufficiently, and more importantly accurately, represent the things they purport to simulate (Gordon, 1974). If a simulator does not adequately represent the genuine clinical activity, this bias is likely to transfer into practice and can have ramifications for patient care and outcomes.
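As a back-of-the-envelope illustration of the cost tradeoff above (apart from the SimMan purchase price quoted in the text, every figure is hypothetical):

```python
# Hypothetical break-even calculation for a mannequin-based program.
simulator_cost = 68_000   # purchase price quoted above (USD)
annual_upkeep = 15_000    # maintenance, IT, support staff (hypothetical)
clinical_hour_cost = 120  # cost of one supervised clinical hour (hypothetical)
sim_hour_cost = 40        # marginal cost of one simulation hour (hypothetical)
replaced_hours = 800      # clinical hours replaced per year (hypothetical)

net_annual = replaced_hours * (clinical_hour_cost - sim_hour_cost) - annual_upkeep
print(f"Net annual savings: ${net_annual:,}")                       # $49,000
print(f"Break-even after {simulator_cost / net_annual:.1f} years")  # ~1.4 years
```

Under these assumptions the simulator pays for itself quickly; if simulation hours add to rather than replace clinical hours, the net annual figure turns negative and there is no break-even point.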

Dimensions of Simulation

An overview of the diversity of simulation applications was provided by Gaba (2004), who categorized simulation along 11 dimensions. Later, Scerbo and Anderson (2012) organized these dimensions into three higher-level categories: goals, trainee or user characteristics, and method of implementation. Goals include the purpose and aims of the simulation activity. Broadly, these include education, training, performance assessment, and research. They also include the domain (e.g., primary care, procedural/surgery, or high hazard/emergency medicine) and the target of the simulation (e.g., motor skills, knowledge, attitudes, teamwork), as well as the age of the simulated participant(s), which can range from neonate to elderly. User characteristics include the unit of participation (individuals, teams, or even entire organizations), the experience level of the participants (from novice to expert), and the discipline of the participants (physicians, nurses, or even management). Last, the method of implementation addresses the type of technology. At the most basic level, simulation can be conducted without any technology, such as verbal role-playing, while at the most complex level, simulation may require sophisticated technology, such as virtual patients embedded in a virtual replication of a clinical site. The method also includes the site of the simulation activity, such as a simulation center or the actual work environment (in situ), as well as the nature of participation, which can range from observing the activity to being an active participant. Finally, the method includes how feedback is provided, i.e., from an instructor, debriefing personnel, or simulator-generated performance metrics. Together these dimensions may not fully capture the breadth of current simulation applications, but they do provide a classification scheme to help researchers and others conceptualize the various elements across the diversity of simulation applications.
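One way to make this classification scheme concrete is to encode it as a tagging record for a simulation activity. A minimal sketch; the field names and example values are illustrative and do not exhaustively enumerate Gaba’s dimensions:

```python
# Sketch of Scerbo and Anderson's (2012) three higher-level categories as a
# record for tagging a simulation activity. Fields and values are illustrative.
from dataclasses import dataclass

@dataclass
class SimulationActivity:
    # Goals
    purpose: str        # "education", "training", "assessment", "research"
    domain: str         # e.g., "primary care", "surgery", "emergency medicine"
    target: str         # e.g., "motor skills", "knowledge", "teamwork"
    # Trainee/user characteristics
    unit: str           # "individual", "team", "organization"
    experience: str     # "novice" through "expert"
    discipline: str     # "physician", "nurse", "management"
    # Method of implementation
    technology: str     # "role-play" through "immersive VR"
    site: str           # "simulation center" or "in situ"
    participation: str  # "observer" or "active participant"
    feedback: str       # "instructor", "debriefing", "simulator metrics"

activity = SimulationActivity(
    purpose="training", domain="emergency medicine", target="teamwork",
    unit="team", experience="novice", discipline="nurse",
    technology="mannequin", site="in situ",
    participation="active participant", feedback="debriefing",
)
print(activity.domain, "/", activity.feedback)
```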

HISTORY OF MEDICAL SIMULATION

Healthcare-related simulation can be traced back to the early 4th century (Owen, 2012). Very early simulators were typically anatomical models made from materials such as clay, stone, bronze, wood, ivory, and even wax (Jones et al., 2015; Owen, 2012). Documented use of simulation for the purpose of medical education is much more recent, first cited in the 17th and 18th centuries (Buck, 1991; Owen, 2012). More modern forms of simulation were introduced in the 20th century and have continued to evolve (Cooper & Taqueti, 2008). In this chapter, we focus on a subset of the most common types of simulation: mannequins, virtual reality systems, immersive VR collaborative training systems, standardized patients, and hybrid systems.

Mannequins

Some of the earliest simulators used for medical education are mannequins (Buck, 1991). These are life-sized physical reproductions of the human body with components that replicate critical systems (e.g., heart, lungs, digestive tract, etc.). More sophisticated mannequins also replicate human physiology and responses to pharmacological agents (Gaba et al., 2001). One of the first mannequins was developed by Gregoire the Younger in the 18th century (Buck, 1991; Owen, 2012, 2018). This early mannequin simulated a woman’s abdomen and pelvis for the purpose of training midwives to handle a variety of birthing complications. Although there are no known photos of this simulator, written accounts of its use questioned whether training would transfer to actual practice given the simulator’s low fidelity. In response to these concerns, several other slightly higher fidelity simulators were developed throughout the 19th century.

One of the first modern mannequins is Resusci®-Anne, developed by doll and toymaker A. S. Laerdal in the early 1960s for resuscitation training (Cooper & Taqueti, 2008; Rosen, 2008). Simulators for resuscitation training are important for reducing the risks of injuring healthy people and spreading contagious diseases during practice (Buck, 1991). Resusci®-Anne is a life-sized mannequin that represents a young adult female. Early versions of this mannequin were developed solely to teach mouth-to-mouth resuscitation via a simulated airway and lungs capable of inflating and deflating (Buck, 1991; Cooper & Taqueti, 2008; Rosen, 2008). Later versions incorporated a spring and piston mechanism to simulate the resistance of the breastbone and ribcage during chest compressions. The Laerdal company also created child and adolescent versions (Buck, 1991).

By the late 1960s, more complex simulators with integrated computer technology began to emerge. The earliest known simulator capable of replicating functions of human patients via computerized technology was Sim One (Abrahamson et al., 1969). Developed in the mid-1960s, Sim One was a full-body mannequin used for training endotracheal intubation in anesthesiology (Cooper & Taqueti, 2008; Gordon, 1974; Rosen, 2008). Sim One could generate a heartbeat, pulse, and blood pressure and was capable of breathing, opening and closing its mouth and eyes, exhibiting pupillary changes, and reproducing responses to anesthesia injections (Buck, 1991). Using specialized input devices known as subcutaneous sensors, Sim One was able to interpret a learner’s actions and react accordingly. Although many felt Sim One was ahead of its time, it was not adopted because of its high cost and a reluctance among medical educators to embrace this newer method of instruction (Buck, 1991; Bradley, 2006; Cooper & Taqueti, 2008).

It wasn’t until 1986 that another mannequin incorporating computer technology was introduced (Cooper & Taqueti, 2008; Rosen, 2008). Developed by Michael Gordon, the Harvey® Cardiology Patient Simulator was designed to train learners to recognize common auscultatory cardiac findings (Gordon, 1974). Harvey® was capable of simulating a variety of conditions and symptoms associated with cardiac disorders (see Figure 8.1). Some notable features included chest wall movement, murmurs, breath sounds, pupillary responses, and cyanosis (Buck, 1991; Cooper & Taqueti, 2008; Gordon, 1974). Harvey® is also considered to be the earliest example of a part-task trainer (Cooper & Taqueti, 2008), in that it represents only the portions of the body needed for specific training applications (Bradley, 2006). Compared to other simulators, Harvey® has undergone some of the most extensive testing regarding its educational efficacy (Cooper & Taqueti, 2008), demonstrating enhanced performance in clinical practice for students who trained with this system compared to those who trained with traditional methods of instruction (Ewy et al., 1987).

FIGURE 8.1  The Harvey® cardiopulmonary patient simulator.

At present, there are numerous mannequins available from many vendors that address cardiopulmonary functioning in adults, children, and neonates. Mannequins are also available for training procedures in emergency medicine, labor and delivery, nursing, respiratory therapy, and trauma, as well as basic anatomy.

Virtual Reality Systems

Another form of simulation involves the use of virtual reality (VR), which has been described as computer-generated recreations of environments, objects, and people represented by avatars (Bradley, 2006). Early versions of VR systems for healthcare began to emerge in the 1990s. Joseph Rosen and Scott Delp created the first virtual representation of a lower limb, used to practice tendon transplants (Delp et al., 1990; Satava, 2001). Shortly thereafter, Dr. Richard Satava and Jaron Lanier created a virtual representation of organs in the upper abdomen (Satava, 1993). An important development in VR representations occurred in 1994, when researchers at the National Library of Medicine developed an interactive and searchable database of a male and female body created from CT, MRI, and phototomography scans of cadavers, known as the Visible Human (Ackerman, 1998). This project is thought to be the basis for modern VR systems, including much of the seminal work in surgical simulation (Satava, 2001). From the Visible Human Project data, Scott Delp created the Limb Trauma Simulator (Delp & Zajac, 1992; Satava, 2001).

Surgical Systems

The evolution of VR simulators in surgery was facilitated further by the widespread adoption of the laparoscopic, or minimally invasive, method in the late 1980s. Unlike traditional open procedures, laparoscopic procedures are performed from outside the body: a miniature video camera and surgical instruments are inserted into the body through several small incisions. Laparoscopic procedures are particularly amenable to VR because the surgeon views the patient’s body cavity on a video display. Thus, developers could readily create computer-generated representations of anatomy, tissue, and laparoscopic instruments. One of the first VR systems for laparoscopic surgery was MIST VR (Sutton et al., 1997), which integrated a laparoscopic interface with a graphical 3D representation of laparoscopic instruments interacting with geometric shapes. Trainees performed basic psychomotor tasks representing the fundamental eye-hand coordinated movements needed for many laparoscopic manipulations (e.g., target acquisition, target transfer, traversal, target diathermy, etc.). Today, VR systems for laparoscopy often include similar psychomotor skill-building activities as well as whole-task training activities. For example, the LapSim® system from Surgical Science includes modules for appendectomy, hysterectomy, and laparoscopic cholecystectomy (gall bladder removal), among others.

VR surgical training systems offer many advantages over other simulation techniques. First, they allow trainees to practice procedures repeatedly without cutting, altering, and destroying physical components, so they do not require replaceable parts. They can also increase user engagement through immersion and presence (Heinrichs et al., 2013; Heinrichs et al., 2010). Second, they can provide real-time, performance-based feedback tailored to a particular individual (Gaba, 2004; Satava, 2001). Third, the simulation software can be easily updated to mirror changes in healthcare policies and procedures (Dev, 2016). Last, many VR surgical simulators incorporate haptic force feedback systems that give users the sensation of probing or pulling on virtual tissue with simulated instruments. Laparoscopic and endoscopic procedures are good candidates for haptic force feedback because there is no direct contact with the patient’s organs and tissue; instead, forces are felt indirectly through the laparoscopic instruments. This form of feedback is important in surgical simulation training, particularly among novices, when learning to apply the proper amount of force to manipulate but not damage tissue.
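Haptic rendering in such trainers is often described with a simple penalty-based contact model: when the virtual instrument tip penetrates tissue, a restoring force proportional to penetration depth, plus damping, is pushed back to the device. A minimal sketch of that idea; the gains are hypothetical, and commercial simulators use far more elaborate deformable-tissue models:

```python
# Penalty-based haptic force sketch: F = -(k * depth + c * velocity).
# Gains are hypothetical; the point is the shape of the rendering rule,
# not any particular simulator's implementation.

K_TISSUE = 400.0  # stiffness gain, N/m (hypothetical)
C_DAMP = 2.0      # damping gain, N*s/m (hypothetical)

def tissue_force(depth_m: float, velocity_ms: float) -> float:
    """Force in newtons fed back through the instrument handle.

    depth_m: penetration depth into the virtual tissue (<= 0 means no contact).
    velocity_ms: tool velocity along the penetration axis (positive = inward).
    """
    if depth_m <= 0.0:
        return 0.0                 # no contact, no force
    spring = K_TISSUE * depth_m    # resists deeper penetration
    damper = C_DAMP * velocity_ms  # steadies the rendered contact
    return -(spring + damper)

# E.g., 2 mm penetration while still pushing inward at 1 cm/s:
print(f"{tissue_force(0.002, 0.01):.2f} N")  # -0.82 N
```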

Collaborative and Immersive Training Systems

Outside of surgery, VR has also been used for other forms of healthcare simulation. Many current systems use immersive VR headsets and support wireless movement within patient rooms, hospital environments, or onsite first-responder settings. Another form of simulation uses VR for collaborative training. These systems facilitate training by enabling individuals to connect from remote sites and interact in a virtual world that mimics a clinical environment (Gaba, 2004; Heinrichs et al., 2013; Heinrichs et al., 2010; Issenberg & Scalese, 2008; Liaw et al., 2021; Rosen, 2008). One early system used Second Life. Originally developed in 2003, Second Life allows users to create an avatar and interact with others via the Internet (Beard et al., 2009; Rosen, 2008). Medical school instructors became interested in Second Life as a training tool for students to collaboratively practice history-taking and other clinical skills (Beard et al., 2009; Singh et al., 2013). Beard and colleagues (2009) used a medical application in Second Life and obtained some evidence to suggest that knowledge and skills gained with this format transferred to clinical practice. Since Second Life, more advanced three-dimensional virtual worlds (3DVWs) specific to healthcare have been used for simulation-based interprofessional collaborative training (Liaw et al., 2021). For example, simulation of an interprofessional clinical encounter in a 3DVW known as Create Real-time Experience and Teamwork in a Virtual Environment (CREATIVE) was shown to improve team performance and interprofessional attitudes, as well as foster a mutual understanding of patient-centered care and team members' interprofessional roles (Liaw et al., 2019, 2020). Another promising virtual world being investigated for use in medical education, training, surgical simulation, and more is Facebook's Metaverse (Mozumder et al., 2022). The Metaverse goes beyond traditional VR technologies by incorporating augmented reality (AR) and a hybrid of VR and AR known as mixed reality (MR). Although it is still in its very early stages of use, the Metaverse is a promising approach to immersive and collaborative training.

Researchers have also studied virtual human technology to teach diagnostic reasoning and communication skills to students before they interact with patients (Kleinsmith et al., 2015; Lane et al., 2013). For example, Kron et al. (2017) developed the MPathic-VR system using virtual humans to address intercultural and interprofessional communication (see Figure 8.2). In one scenario, the patient, a young woman, has been diagnosed with leukemia. The learner has to break the bad news to the patient's mother, a woman from El Salvador whose cultural values differ from her daughter's. In a follow-up scenario, the learner must engage in conflict resolution with the patient's nurse, who is upset because the learner failed to include her in the family meeting with the patient's mother. Students who used MPathic-VR performed the communication scenarios, received feedback on their performance, and then repeated the scenarios to improve their scores. They were then given a transfer test in which they applied their newly acquired skills in an objective structured clinical exam. The investigators showed that the students who trained with MPathic-VR scored higher than students trained with a conventional computer-based learning method. These findings suggest that training with virtual humans may provide a viable means to acquire communication and reasoning skills that transfer beyond the learning environment.

FIGURE 8.2  User listening to the patient and her mother in the MPathic-VR system.

On a grander scale, Scerbo and his colleagues (2007) at Old Dominion University and Eastern Virginia Medical School developed a fully immersive virtual operating room (VOR) to examine surgical team decision-making. The VOR environment and equipment were modeled on a standard OR configured for laparoscopic procedures and rendered in a 10 ft by 10 ft Cave Automatic Virtual Environment (CAVE). They embedded a part-body mannequin and laparoscope to allow trainees to perform a simulated laparoscopic cholecystectomy. The VOR included a virtual attending surgeon, anesthetist, and circulating nurse. The virtual teammates were designed to interact with the trainee through speech scripts and movements based on the knowledge, personalities, and activities of genuine surgical team members. Figure 8.3 shows a trainee with a human scrub technician who hands the surgeon the operating instruments. In one scenario, in the middle of the procedure, the anesthetist alerts the trainee that the patient's oxygen saturation level has dropped, and the trainee must troubleshoot the problem and decide whether to continue or abort the procedure. Thus, the VOR incorporates several forms of simulators and provides a more comprehensive OR environment to train procedural, communication, and social skills among surgical team members.

Of course, one of the challenges with embedding humans in immersive virtual simulation is the division between physical and virtual elements. It is not possible to transition instruments from virtual to physical forms. However, Daher and colleagues (2020) developed a clever way to combine a human physical form with a virtual patient. They used a rear-projection AR system to represent the patient on a physical human shell. The patient lies on a table, but underneath is a set of shelves that house projectors, speakers, haptics, and heaters. Their physical-virtual patient can present multisensory symptoms such as changes in skin appearance, pulse, and temperature.

FIGURE 8.3  Surgical resident and scrub technician interacting with the virtual attending surgeon in the virtual operating room with an embedded laparoscopic cholecystectomy simulator.

Trainees can touch the physical-virtual patient, which can then initiate changes in facial expressions, speech, and underlying physiology (e.g., changes in eye movements and pupil size). The physical shell can also be changed to represent an adult or a child, male or female. The developers tested the system and found that most users were impressed with the fidelity and satisfied that critical cues needed for diagnoses were reasonably well rendered. Although the system is still constrained by numerous technical challenges, it represents an important step forward in blurring the line between physical and virtual simulation.

Standardized Patients

The most life-like form of simulation is the standardized patient (SP), sometimes referred to as a simulated patient. These are individuals trained to portray a patient in a standardized manner (Nestel et al., 2018). SPs also perform two other important roles: assessing physician-patient interactions according to standardized criteria and providing immediate feedback to trainees about the quality of their interactions. The use of SPs was introduced in the US in 1964 by Howard Barrows for educating students about neurological examinations (Barrows, 1993; Rosen, 2008). SPs are now widely used for medical student assessments during their Objective Structured Clinical Examinations (OSCE; Nestel et al., 2018) and in the National Board of Medical Examiners United States Medical Licensing Examination Step 2 Clinical Skills exam for assessing clinical competency (Boulet et al., 2009). SPs bring a humanistic quality to simulation that is missing in many other forms of simulation (Nestel et al., 2018). However, there are also some important considerations and limitations that must be acknowledged. First, SPs are human and are susceptible to limitations of attention and working memory (Newlin-Canzone et al., 2013). Although SPs receive formal training to portray patients and provide feedback in a standardized manner, there can still be issues with the quality of the role-playing or the feedback provided (Nestel et al., 2018).
Second, because they are typically normal, healthy individuals, they cannot simulate many underlying pathologies or conditions when examined physically. However, these limitations can be overcome to some extent with the use of moulage or appliances (see below).

There are also ethical considerations regarding SPs themselves. Unlike devices, SPs are humans and come to the job with their own histories and experiences. For example, some simulations require SPs to portray roles that emulate trauma (e.g., a rape victim or an alcoholic), which can elicit strong emotional reactions that persist for several days (Nestel et al., 2018; Woodward, 1998). In other instances, an SP role may be played by a staff member of a medical institution, and colleagues may come to associate that staff member with the role outside of the simulation, eliciting negative stereotypes and potentially harming his or her reputation (Nestel et al., 2018). Fortunately, there are methods that can help mitigate these concerns. The SP administrator should discuss the role requirements with the SP ahead of time to minimize the chance that the role will bring up any past trauma. The facilitator should also encourage SPs to "de-role" before leaving the simulation site. More broadly, there has been a growing movement in healthcare simulation to adopt policies for creating a psychologically safe environment for learning (a "safe container") for trainees and staff (Rudolph et al., 2014).

Hybrid Systems

Hybrid systems incorporate at least two types of simulation, typically a physical and a virtual component (Satava, 2001). One of the first commercial examples is the VIST system (Mentice, Inc.), developed by Dawson and his colleagues (2000) for training interventional cardiology procedures. This system combined a physical representation of the body with simulated fluoroscopy and an interactive anatomical display. Trainees can choose among different catheters and guide wires, place them into the simulator, and monitor a virtual representation of the images used in interventional procedures. Some of the earliest hybrid systems augmented standardized patients with part-task trainers, such as a partial mannequin, to enable simultaneous practice of technical and nontechnical skills (Bradley, 2006). For example, a female standardized patient could place a pelvic trainer between her legs to train labor and delivery procedures. The virtual operating room described above combined physical simulators with a virtual environment populated with virtual humans. Figure 8.4 shows another example of a hybrid system that uses a digital display, a physical model/appliance, and a human simulated patient for training ultrasonography. The learner places the sonography probe on any of the key locations on the abdominal appliance. As the learner moves the probe, fetal images derived from that location are updated on the display to indicate correct or incorrect positioning and the underlying anatomy. The simulated patient can ask questions of the learner (e.g., "Can you tell if my baby is OK?"). Thus, hybrid systems have the potential to expand training beyond specific procedures to incorporate a more holistic provider–patient experience.
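
At its core, the probe-to-image mapping in such a system is a position-indexed lookup: the tracked probe location selects which stored image and feedback message to display. The short Python sketch below illustrates this pattern only in the abstract; the zone names, image files, and function are hypothetical placeholders and are not taken from any specific commercial trainer.

# Illustrative sketch of a position-indexed image lookup for a hybrid
# sonography trainer. Zone names and image files are hypothetical.
FETAL_VIEWS = {
    "zone_a": ("fetal_head.png", "Correct placement for the biparietal view."),
    "zone_b": ("fetal_heart.png", "Correct placement for the cardiac view."),
}

def update_display(probe_zone: str) -> tuple[str, str]:
    """Return the image and feedback text for the tracked probe position."""
    if probe_zone in FETAL_VIEWS:
        return FETAL_VIEWS[probe_zone]
    # Any untracked location is treated as incorrect positioning.
    return ("off_target.png", "Reposition the probe on the abdominal appliance.")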

FIGURE 8.4  A hybrid sonography system using a physical appliance, sonography display, and live model.

TRAINING

Methods for training with simulation have been well documented in the literature and in this volume (Farmer et al., 2003; Roscoe & Williges, 1980; Swezey & Andrews, 2001; Vincenzi et al., 2009). In many high-risk occupations (e.g., aviation, military operations, and nuclear power plant operations), computer-based simulators have long been a fundamental component of training. However, the adoption of simulation-based training in healthcare is a much more recent phenomenon. Historically, medical education and the other allied health professions followed an apprenticeship model in which procedures are learned by the "see one, do one, teach one" approach. In fact, Dawson (2002) noted that this approach to medical education had not changed since ancient Egyptian times. As noted above, publication of the To Err Is Human report (Kohn et al., 2000) focused attention on a health system that was rife with error. McGaghie et al. (2020) have identified several other important factors that have shifted the landscape for health professions training. First, the traditional apprenticeship method produces variable and often insufficient educational opportunities. In surgery, for example, expectations for competency across many different procedures were rarely achieved by the end of residency (Bell et al., 2009). Second, methods for evaluating learner performance are incomplete and unreliable, and the feedback provided does not always offer actionable and verifiable information for improvement. Third, a body of evidence shows that the traditional method of experience-based clinical education is not producing graduates who are adequately prepared for subsequent training in clinical practice.

In light of these concerns about the efficacy of the traditional approach to health professions training, McGaghie and his colleagues (2020) have advocated for an alternative that emphasizes simulation-based mastery learning. Their approach has its theoretical foundations in the behavioral, constructivist, and social cognitive learning theory traditions. Fundamentally, mastery learning is a competency-based approach to skill acquisition.
Although there are variations, McGaghie (2020) has described seven critical features:

1) Begin by establishing baseline measures of a learner's knowledge, skills, and abilities
2) Set and communicate measurable educational objectives in sequenced instructional units that progress in increasing levels of complexity
3) Prepare educational activities that meet the educational objectives
4) Establish minimal criteria for advancement and passing all educational units
5) Provide formative assessments, cognitive engagement, coaching, and feedback aimed at helping learners meet the educational objectives
6) Use evidence-based objective measures of knowledge, skills, and abilities for advancement through the instructional units
7) Engage in individualized, continual practice and assessment until mastery criteria are met
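
Viewed as pseudocode, these features reduce to an assess–train–reassess loop that ends only when the learner meets the passing criterion, never after a fixed amount of practice time. The Python sketch below is a minimal illustration of that loop; the unit names, thresholds, and assessment and coaching functions are hypothetical placeholders rather than part of McGaghie's specification.

# Minimal sketch of the mastery learning loop; all names and
# thresholds here are illustrative placeholders.
from typing import Callable

def mastery_learning(units: list[str],
                     passing_score: dict[str, float],
                     assess: Callable[[str], float],
                     coach: Callable[[str, float], None]) -> dict[str, float]:
    baseline = {unit: assess(unit) for unit in units}  # feature 1: baseline measures
    results = {}
    for unit in units:                                 # feature 2: sequenced units
        score = baseline[unit]
        while score < passing_score[unit]:             # features 4 and 7: practice until
            coach(unit, score)                         # the minimal criterion is met;
            score = assess(unit)                       # features 3, 5, and 6: targeted
        results[unit] = score                          # activities, coaching/feedback, and
    return results                                     # objective reassessment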

The simulation-based mastery learning approach is beginning to be adopted in different areas of health professions training with promising results. Ahn and colleagues (2016) adopted this approach for training video laryngoscopy (i.e., intubating patients with a breathing tube) and found that once mastery criteria were reached, skills were retained over a 6-month interval. Schroedl et al. (2020) found that the final level of skill attained by residents trained with the mastery method for managing the mechanical ventilation of patients exceeded that of those trained under the standard approach by over 50%. Reed et al. (2016) trained medical students on six basic procedures using the simulation-based mastery learning approach and found that 98% of the students retained knowledge of those procedures over a 1- to 9-month interval at a level that met or exceeded the minimal passing standards. Collectively, these studies show that the simulation-based mastery learning approach results in better, more consistent, and longer-retained levels of performance than the traditional apprenticeship approach.

TEAM TRAINING

The provision of healthcare is inherently a team-based activity. A variety of providers must coordinate care for the same patient through the course of a single illness or even a straightforward appointment (Baker et al., 2005; Owen, 2016, 2018; Weller & Civil, 2018). However, the portion of medical or nursing curricula dedicated to team-based care is a fraction of what is spent on technical skills, if it is available at all (Baker et al., 2005; Moorthy et al., 2005; Weller & Civil, 2018; Yunoki & Sakai, 2018). The low priority given to team training is ironic when one considers that failures of technical skill are not necessarily the primary cause of errors (Chopra et al., 1992). Instead, a large number of preventable adverse events are attributed to failures of team-based care, including teamwork and communication (Leape et al., 1991). Therefore, there is a need for more interdisciplinary and multidisciplinary team training (Owen, 2016), and simulation has been instrumental in facilitating these efforts.

Benefits

Simulation-based team training offers several benefits in healthcare. An obvious benefit is that it can enhance teamwork and communication skills (Salas et al., 2006; Weller & Civil, 2018). Specifically, training with individuals from other disciplines can help team members understand one another's perspectives, contributing to a shared understanding among all team members (Gaba et al., 2001). Further, training as a team rather than as individuals is associated with a reduction in workload and an improvement in performance on team tasks (Prichard et al., 2011). Additionally, studies that have explored the effectiveness of simulation for team training have found improvements in many areas, including simulated case performance, teamwork in real clinical environments, attitudes toward safety, perceptions of clinical decision-making, and patient outcomes, among others (Weaver et al., 2014; Weller & Civil, 2018). These enhancements contribute further to a reduction in both patient morbidity and mortality, ultimately increasing patient safety (Salas et al., 2006).

History and Scope of Team Training in Healthcare

The origin of simulation-based team training in healthcare can be traced back to the Comprehensive Anesthesia Simulation Environment (CASE) developed by David Gaba and his colleagues in the 1980s (Gaba & DeAnda, 1988). These investigators took the principles of crew resource management (CRM), used in aviation to improve pilot teamwork and reduce the likelihood of critical events (Fritz et al., 2008; Salas et al., 2005), and applied them to anesthesiology, creating the anesthesia crisis resource management (ACRM) program (Howard et al., 1992). The goal of ACRM was to improve team-based clinical care by training anesthetists and other team members in effective communication, positive group dynamics, and personnel and resource utilization. Robert Helmreich, who worked with Gaba's team, also helped develop a similar program for the operating room, Team Oriented Medical Simulation (Helmreich & Schaefer, 1994). Soon simulation-based team training began to emerge in other areas of healthcare, including intensive care, pediatrics, emergency medicine, cardiology, labor and delivery, neonatology, and radiology (Fritz et al., 2008; Gaba, 2010; Rosen, 2008). Several teaching institutions have now adopted the ACRM curriculum and require both trainees and experienced providers to undergo yearly training (Gaba et al., 2001).

Healthcare team training expanded rapidly with the introduction of TeamSTEPPS™ in the mid-2000s. This program was developed through a collaboration between the US Department of Defense and the Agency for Healthcare Research and Quality (AHRQ), an agency within the US Department of Health and Human Services. TeamSTEPPS™ is a set of evidence-based tools used to improve healthcare providers' teamwork skills, emphasizing leadership, situation monitoring, mutual support, and communication.
The program focuses on empowering healthcare professionals, patient families, and patients themselves to speak up whenever they have a significant concern to ensure the best possible quality of care. Research on the effectiveness of simulation for teaching TeamSTEPPS™ has primarily focused on interdisciplinary teams; however, the program has also been applied independently to teams in obstetrics, pediatrics, intensive care, anesthesiology, nursing, surgery, and emergency medicine (AHRQ, 2015). Most research has shown positive benefits of TeamSTEPPS™ training on team skills, and a recent review indicates that the program has also led to improvements in patient safety, a reduction in medical errors, and increased patient satisfaction (Parker et al., 2019).

Simulation-based team training has now become essential in several specialty areas of healthcare. One of the first areas outside of anesthesiology and surgery to begin using team training was emergency medicine. In this area, training is often focused on developing and enhancing general teamwork skills applicable to the variety of unique procedures and tasks these teams must perform (Weile et al., 2021). For example, it has been used to train the effective team communication skills necessary for procedures such as resuscitation (Issenberg & Scalese, 2008) and for responding to trauma and cardiac arrest events (Weile et al., 2021). In addition, this method has been used to give emergency medicine teams the opportunity to practice team skills that are critical for managing rare or unlikely situations, such as a mass casualty incident (Bracq et al., 2019; Fritz et al., 2008; Heinrichs et al., 2010).

Simulation is also used in pediatric emergency medicine to train team skills. As in emergency medicine, this type of training is critical for trauma care and cardiopulmonary resuscitation (Grant et al., 2016). Simulation-based training has been used to train pediatric teams in patient- and family-centered care (e.g., breaking bad news) and patient safety. Pediatric emergency medicine teams also undergo a type of training known as just-in-time (JIT) training, which provides the opportunity to rehearse a procedure via simulation shortly before performing it on the patient.

Another specialty area that utilizes simulation for team training is labor and delivery. Teams are trained to manage complications affecting the mother and the baby, such as shoulder dystocia (Crofts et al., 2016; Draycott et al., 2006; Maslovitz et al., 2007), postpartum hemorrhage (Birch et al., 2007; Draycott et al., 2006; Maslovitz et al., 2007; Riley et al., 2011), eclampsia (Draycott et al., 2006; Ellis et al., 2008), and breech delivery (Draycott et al., 2006; Maslovitz et al., 2007).

TRAINING TRANSFER

The incorporation of simulation as a major training method for healthcare providers has increased dramatically since the early 2000s, and numerous studies have been published touting the benefits of this approach. Ultimately, however, it is important to establish the effectiveness of simulation-based training. In this regard, many researchers have turned to the Kirkpatrick (1994) model. According to Kirkpatrick, training can be evaluated at four levels that transition from the training event itself to its impact on the operational environment. At Level 1, a learner's attitudes and opinions regarding the training event are measured. At Level 2, interest lies in measuring the knowledge and skills a learner acquires from the training event.
Level 3 targets transfer of training, measuring how knowledge and skills acquired from training affect performance back on the job. Last, at Level 4, interest lies in measuring specific work-related outcomes affected by the training. Another way to view the different levels is through the lens of translational science in a biomedical context (Dougherty & Conway, 2008; McGaghie et al., 2011a), in which treatments and solutions are evaluated first in the laboratory, then with patients, and finally within society as a whole.

At present, much of the evidence supporting the effectiveness of simulation-based training is limited to Kirkpatrick Levels 1 and 2: trainee attitudes and knowledge or skills measured in the simulated environment (Cook et al., 2011; McGaghie et al., 2011b; Paige et al., 2020). Initially, many studies of simulators focused on student interest, enthusiasm, and confidence to build instructional support for this alternative method of training (Cooper & Taqueti, 2008). More recently, however, Yunoki and Sakai (2018) concluded that simulation training in healthcare has helped increase learner confidence, but that evidence of improved patient outcomes is still largely lacking. There continues to be a great need for research that goes beyond the lower Kirkpatrick levels and demonstrates the benefits of simulation-based training in clinical settings, on patient outcomes, and at the higher levels of organizational and public policy (McGaghie et al., 2011a; Palaganas et al., 2016). However, some encouraging examples do exist.

One of the seminal studies to demonstrate the benefits of simulation training with genuine patients was conducted by Seymour and his colleagues (2002). These investigators sought to determine whether laparoscopic surgical skills acquired on the MIST VR system would transfer to genuine laparoscopic procedures performed in the OR. They compared the performance of residents assigned to the standard "apprenticeship" training condition and those who had practiced on MIST VR for three to eight 1-hour training sessions. Following training, all residents performed a procedure under the supervision of a surgeon and had videos of their performance recorded and assessed by surgeons who were blind to the conditions. The investigators found that residents who trained on MIST VR completed their surgeries in 29% less time and committed fewer errors than their counterparts in the standard training condition. Ultimately, this study showed that skills acquired by training on a laparoscopic VR simulator had positive benefits when transferred to the operating room.

In a later study, Scerbo and colleagues (2006) compared two forms of simulation for training phlebotomy (i.e., drawing blood): a VR simulator and the more traditional approach using simulated limbs. They trained 20 medical students under one of the two methods and measured their performance with a 28-item checklist. The investigators found that performance improvements were limited to those who trained with the simulated limbs and not the VR system. A detailed comparison of the functional and physical characteristics of each simulation system revealed important differences, which ultimately led the researchers to conclude that training with both systems might provide complementary benefits.

One of the more compelling examples of successful transfer concerns simulation-based training for central venous catheter (CVC) insertion.
A CVC is typically placed into a large vein (e.g., the internal jugular or subclavian vein) when larger volumes of fluid need to be infused than can be accommodated by smaller needles (e.g., for hemodialysis). If not done properly, the procedure can cause damage to the central veins, pulmonary or cardiac complications, and central line-associated bloodstream infections (which have an estimated mortality rate of 12% to 25%; Patel et al., 2019). Barsuk and his colleagues (2009) developed a training program for CVC placement following the mastery training model described above. In an initial study, a sample of residents was given a pretest, training, and a posttest, and performance was assessed with a 27-item checklist. The pretest results showed that mean performance scores fell under 50%. The residents were then given intensive training tailored to their specific needs and learning objectives, coached, and required to engage in repetitive simulation-based practice until they met the minimum passing standard (80%). Posttest results showed that mean performance scores exceeded the minimum criterion. These investigators conducted a follow-up study in which they compared the performance of residents trained with the standard apprenticeship model to another group trained with the mastery approach (Barsuk et al., 2009). Both groups of residents were then monitored when they attempted CVC placements with genuine patients. The results showed that those who had received simulation-based mastery training needed significantly fewer attempts for successful placement and were less likely to fail in their attempts or require their catheters to be readjusted after placement. More important, in another study, Barsuk and his colleagues (2009) showed that patients who received CVCs from simulation-based mastery-trained residents were significantly less likely to develop central line-associated bloodstream infections. Moreover, a recent meta-analysis indicates that simulation-based training for CVC placement results in higher levels of procedural success (Madenci et al., 2014).

Other evidence for transfer has been demonstrated with team training. In pediatric emergency medicine, Falcone et al. (2008) showed that simulated pediatric trauma team training with an emphasis on teamwork and communication resulted in improvements in trauma care related to initial assessments, airway management, cervical spine care, and pelvic fracture recognition and management. In another study, Andreatta et al. (2011) found a significant positive correlation between simulation-based mock code team training and pediatric cardiopulmonary arrest survival rates. Survival rates increased 33% after the initial training and nearly 50% one year after routine training had been implemented. Additionally, a recent review of the effectiveness of simulation-based neonatal and pediatric resuscitation team training found evidence that it improves team and technical performance for at least 6 months post-training (Lindhard et al., 2021).

Benefits of simulation-based team training have also been observed in the area of labor and delivery, with promising evidence for improved patient outcomes. Phipps et al. (2012) prospectively evaluated patient outcomes after labor and delivery teams received simulation-based team training grounded in principles of CRM and reported that the labor and delivery unit's adverse outcome index (AOI) decreased significantly after the training intervention. Another prospective study by Riley et al. (2011) found a 37% decrease in perinatal mortality after hospital staff received team training on simulated obstetrical emergency scenarios.
Some researchers have found that simulation-based team training resulted in a decrease in neonatal injuries at birth (Draycott et al., 2008; Crofts et al., 2016) or a reduction in the number of infants born with 5-minute Apgar scores of 6 or less and hypoxic-ischemic encephalopathy (HIE) following the training. In a similar study, Siassakos et al. (2009) reviewed patient outcomes before and after staff completed simulation-based team training and found improved efficiency and a significant increase in the number of maneuvers teams used to alleviate cord compression. Fransen and colleagues (2017) also found that team training reduced neonatal shoulder injury and increased invasive treatment for severe postpartum hemorrhage. Further, in a recent review, Yucel et al. (2020) reported that simulation-based team training in obstetrics reduced neonatal injuries, cesarean sections, and transfusions, and increased the use of maneuvers for managing shoulder dystocia and cord compression. However, others have reported that the benefits are not necessarily observed much beyond the immediate period after training (van de Ven et al., 2017).

Collectively, the work of Barsuk, his colleagues, and others shows that skills acquired from a simulation-based curriculum following the mastery training model can transfer from the training environment (Kirkpatrick Levels 1 and 2) to improved patient care (Kirkpatrick Level 3) and to a reduction in adverse outcomes such as infections (Kirkpatrick Level 4).

Although the results of studies such as these are encouraging, measures of transfer used in other disciplines are not often seen in healthcare. For example, in the aviation community, there is a long history of measuring the amount of transfer from training to actual practice (Roscoe, 1980). Transfer is measured by calculating the difference in time between training under normal conditions and training with a new technique (i.e., simulation). Transfer is said to be positive if simulation training is more efficient than training under standard conditions; negative transfer indicates that simulation training is less efficient than the standard. Povenmire and Roscoe (1973) suggested that transfer be measured with the transfer effectiveness ratio (TER), a ratio of the time saved in training to the time spent in simulation. TER values greater than 1 indicate that simulation training is efficient, while values less than 1 show that simulation training introduces inefficiency.
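
Written out (a minimal rendering of the Povenmire and Roscoe formulation; the symbols here are one common notation, not the only one), the ratio compares the real-task training time saved against the simulator time invested:

$$\mathrm{TER} = \frac{Y_{0} - Y_{x}}{X}$$

where Y0 is the time (or number of trials) a control group needs to reach criterion in the real task, Yx is the time the simulator-trained group needs to reach the same criterion, and X is the time spent in the simulator. By this definition, a TER of 2 would mean that each hour in the simulator replaced two hours of training in the real task.
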
There have not been many studies reporting TERs for simulation in healthcare. Aggarwal and his colleagues (2007) reported the first TER for a laparoscopic VR simulator. They compared novice surgeons who practiced with the VR simulator to a control group that did not practice. Both groups were then assessed on their ability to perform a laparoscopic cholecystectomy on a set of five pig cadavers. The researchers estimated the TER to be 2.28, showing that their VR simulation training was very efficient. More recently, Lohre and colleagues (2020) determined the TER for training with a VR system addressing an orthopedic procedure (reverse shoulder arthroplasty) for a rotator cuff tear. Residents who used the VR system were compared to those who viewed a video, and post-training performance was measured on a cadaver. The investigators reported that VR training resulted in a TER that approached 1 (TER = 0.79), meaning that 1 hour of simulation training was equivalent to roughly 48 minutes of real-world training. It should be noted that TER values less than 1 do not necessarily mean that training with the simulator is inappropriate. In healthcare, TER values less than 1 may be entirely acceptable given safety concerns when transitioning to patient care.

Although there continues to be a need for more research demonstrating the effectiveness of simulation training, particularly at the higher Kirkpatrick levels, the lack of such evidence must be considered in the context of some real challenges investigators face when gathering data in genuine patient care settings. First, it is difficult to control for the wide variety of individual differences among patients and all of their potential comorbidities; treatments can vary considerably for patients who present with the same complaint or condition. Second, there are federal laws that govern patient information. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) was established to protect against the disclosure of sensitive patient health information without a patient's consent, so data gathered from genuine patients must comply with human subjects/ethical review and HIPAA rules. Third, providers treat patients in many settings where training and education are not a primary emphasis. Fourth, even in facilities where training is an important goal, it takes a significant administrative effort to observe and evaluate trainees under rigorous experimental conditions (e.g., holding constant the opportunities to perform a procedure, the evaluators, patient conditions, case load, shift time, fatigue, etc.). Last, in a recent review, Paige et al. (2020) noted that many educators who use simulation are themselves focused more on outcomes measured in the training environment than in clinical practice.

HEALTHCARE SIMULATION AND THE PANDEMIC

On March 11, 2020, the World Health Organization declared Covid-19 a pandemic. The virus spread rapidly from country to country with rising infection rates and loss of life. The need to respond to overwhelming numbers of infected patients and keep frontline providers safe became paramount in all hospitals and healthcare facilities. Perhaps no other event in history generated a greater demand for healthcare simulation methods, centers, and personnel. The rapid rise in cases and hospitalizations placed a tremendous strain on the healthcare system, requiring the deployment of new personnel into acute care settings, the development of new treatment spaces, and creative methods of dealing with equipment shortages.

One of the immediate needs was to train staff to handle large numbers of Covid-19 patients while minimizing the risk of exposure and spread within hospitals. Simulation was utilized initially to familiarize personnel in critical care units (intensive care units, emergency rooms) with basic Covid-19 care principles and then to prepare non-critical care staff from other specialties being deployed to Covid-19 care units. Mannequin simulators were used to train for respiratory failure, circulatory failure, bedside procedures, and Covid-19 patient care. In many instances, training was carried out on a massive scale. Delamarre et al. (2022) describe their use of in situ simulation and peer teaching to train staff how to don and doff personal protective equipment and manage airways in simulated infected patients.
They report training over 1,600 healthcare workers in 99 sessions over 11 days! Zucco and colleagues (in press) also used just-in-time, in situ simulation training aimed at workflow changes among anesthesia, nursing, and surgery staff in their healthcare network over 3 weeks. They evaluated the impact of their training by measuring compliance with the new Covid-19 workflows for cases of confirmed or suspected Covid-19. Their results showed 95% compliance with the new Covid-19 workflow protocols, accompanied by lower-than-expected positive test rates among staff. Other institutions relied on VR simulation and immersive training on smartphones or VR headsets to teach Covid-related knowledge and skills, such as proper PPE use, infection control measures, how to use ventilators, and approaches to end-of-life conversations (Liaw et al., 2021; Young & Aquilina, 2021). The ability to provide this training remotely, in addition to its scalability, enabled a variety of individuals to participate, including practicing professionals, retirees, medical students, and volunteers, all while protected from potential exposure to Covid-19. In a randomized controlled trial, individuals who underwent this training demonstrated significant improvements in their Covid-related knowledge as well as a reduction in Covid-related anxiety and stress compared with a control group that received traditional training methods. Collectively, the results from these studies highlight the feasibility of using a variety of forms of simulation to quickly train large numbers of employees to care for Covid-19 patients with minimal risk of exposure.

Other researchers report using in situ simulation to identify latent safety threats (LSTs). Sharara-Chami et al. (2020) conducted 15 simulations over 2 weeks and uncovered LSTs tied to inadequate preparedness for infection control, uncertainty about procedural guidelines and protocols, and poor communication. Balmaks and colleagues (2021) also used in situ simulation, relying on healthcare failure mode and effects analysis, to develop an action plan for mitigating occupational hazards and the spread of the virus. They uncovered several organizational, individual, and environmental issues related to a lack of clear guidelines and policies, noncompliance with policies and procedures, the flow of patient traffic, and maintenance, cleaning, and information availability for staff and patients. The analysis enabled them to control many of the failure modes identified and to develop a nationally approved set of recommendations for donning and doffing.

In the spring of 2020, the pandemic nearly exhausted medical equipment, supplies, and even oxygen. Among the more serious problems faced by many hospital systems was a shortage of ventilators. Burnett et al. (2021) turned to high-fidelity simulation to evaluate potential solutions by testing designs for safely splitting ventilators, repurposing non-invasive ventilators for invasive ventilation, and expanding the ability to test new ventilator designs. Simulation facilitated all of this testing of devices and protocols before there was any need to use them on actual patients.

Another area where simulation proved invaluable was studying the potential risk of exposure to the virus among staff who needed to perform aerosol-generating procedures (e.g., endotracheal intubation) in Covid-19 patients.
Using simulation, Shavit and colleagues (2020) developed a method to evaluate the suitability and acceptability of an alternative biological isolation gown to be used by emergency department staff. They used a non-visible fluorescent marker as an indicator of contamination and had participants perform airway management procedures following Covid-19 protocols. They then examined the garments after doffing under ultraviolet light to visualize potential contamination and found no markers on any of the participants, suggesting that the gown was a suitable alternative. Oman et al. (2023) used simulation to evaluate devices designed to mitigate the spread of aerosol- and droplet-sized particles. These researchers discovered that an acrylic box and a plastic drape designed as barriers did not mitigate particle spread, and that the drape increased the time needed to perform the procedure. In another unique demonstration, Lampotang and his colleagues (2022) used simulation to evaluate a method for conserving oxygen in low-resource areas. They simulated an adult Covid-19 patient with hypoxemia and examined whether caregivers could interrupt the flow of oxygen by manually crimping the oxygen tubing with pliers during exhalation. They discovered that pinching the tubing reduced oxygen use by over 50% without a significant drop in simulated oxygen saturation. These studies show the benefit of using simulation to examine novel ideas prior to implementation with patients.

CONCLUSION

In 2004, Gaba offered some predictions for simulation in healthcare. He described a future where simulation training is not only a requirement but the driving force behind changes to healthcare curricula, where patients demand a level of safety comparable to aviation, where simulation-based standards for training are required by regulatory agencies, and where the effectiveness of medical devices is evaluated in trials using simulation. Where are we today? Although we have not yet realized Gaba's future, much progress has been made. In the last 20 years, simulation has gone from being a novelty to an accepted, even expected, and necessary method of training and education in healthcare. One of the first specialty areas to adopt and promote simulation was anesthesiology, and the importance of simulation was underscored when the American Board of Anesthesiology began requiring providers to demonstrate communication and technical skills in simulation scenarios as part of its Maintenance of Certification in Anesthesiology program (American Board of Anesthesiology, 2021). Professional societies have also emerged that bring together educators, practitioners, and researchers concerned with simulation (e.g., the Society in Europe for Simulation Applied to Medicine and the Society for Simulation in Healthcare in the US, which in 2006 established Simulation in Healthcare, the first peer-reviewed journal dedicated solely to simulation). The International Nursing Association for Clinical Simulation and Learning published standards for simulation performance in 2010 (INACSL, 2021). Centers dedicated solely to simulation-based training and education are now commonplace. The Society for Simulation in Healthcare offers accreditation of simulation center programs, with over 100 centers in 10 countries receiving full accreditation since 2010 (SSH, 2021).
Moreover, the technology supporting training and education in healthcare simulation has become more complex and diverse to address a wider range of specialty areas, clinical procedures, and activities, as well as differences among patients.

However, there are still many challenges that lie ahead for simulation to reach the vision that Gaba described. More research is needed to show the benefits of simulation training at the highest Kirkpatrick levels. Although the simulation-based mastery training described by McGaghie, Barsuk, and Wayne (2020) shows much promise for improving learner performance and safety for patients, it is not the norm. There is also a need to establish standards for training transfer like those used in aviation and other high-risk disciplines that rely on simulation training. Last, there are costs to consider. Acquiring simulation equipment, creating a center and maintaining a staff, and pulling providers from clinical duties to engage in training require significant ongoing investment and administrative support.

In spite of these challenges, simulation has clearly had a transformational effect on the training and education of healthcare providers. For the first time in history, it has afforded the opportunity for objective performance assessment with standardized metrics in a safe and controlled environment. Simulation has been shown to increase proficiency and reduce the performance variance and errors of providers. Moreover, lessons learned from the pandemic showed that simulation could train and prepare legions of healthcare providers to treat Covid-19 patients while minimizing their own risk of exposure. Finally, data have begun to show that patients treated by simulation-trained providers may be at a lower risk for harm. The recommendation made by the US Institute of Medicine more than 20 years ago to use simulation for training (Kohn et al., 2000) has indeed begun to make the US healthcare system safer.

REFERENCES

Abrahamson, S., Denson, J. S., & Wolf, R. M. (1969). Effectiveness of a simulator in training anesthesiology residents. Journal of Medical Education, 44(6), 515–519.
Accreditation Council for Graduate Medical Education (ACGME). (2020). ACGME Program Requirements for Graduate Medical Education in Anesthesiology. Retrieved from https://www.acgme.org/Portals/0/PFAssets/ProgramRequirements/040_Anesthesiology_2020.pdf?ver=2020-06-18-132902-423
Ackerman, M. J. (1998). The visible human project. Proceedings of the IEEE, 86(3), 504–511. http://doi.org/10.1109/5.662875
Agency for Healthcare Research and Quality (AHRQ). (2015). TeamSTEPPS®: Research/Evidence Base. Retrieved from https://www.ahrq.gov/teamstepps/evidence-base/simulation.html
Aggarwal, R., Ward, J., Balasundaram, I., Sains, P., Athanasiou, T., & Darzi, A. (2007). Proving the effectiveness of virtual reality simulation for training in laparoscopic surgery. Annals of Surgery, 246, 771–779. http://doi.org/10.1097/SLA.0b013e3180f61b09
Ahn, J., Yashar, M. D., Novack, J., Davidson, J., Lapin, B., Ocampo, J., & Wang, E. (2016). Mastery learning of video laryngoscopy using the GlideScope in the emergency department. Simulation in Healthcare, 11(5), 309–315. http://doi.org/10.1097/SIH.0000000000000164
American Board of Anesthesiology. (2021). Retrieved April 28, 2021, from https://theaba.org/staged%20exams.html
Andreatta, P., Saxton, E., Thompson, M., & Annich, G. (2011). Simulation-based mock codes significantly correlate with improved pediatric patient cardiopulmonary arrest survival rates. Pediatric Critical Care Medicine, 12(1), 33–38. http://doi.org/10.1097/PCC.0b013e3181e89270
Baker, D. P., Gustafson, S., Beaubien, J. M., Salas, E., & Barach, P. (2005). Medical team training programs in health care. In K. Henriksen, J. B. Battles, E. S. Marks, & D. I. Lewin (Eds.), Advances in Patient Safety: From Research to Implementation, 4 (pp. 253–267). Rockville, MD: AHRQ Publication.
Balmaks, R., Grāmatniece, A., Aija, V., et al. (2021). A simulation-based failure mode analysis of SARS-CoV-2 infection control and prevention in emergency departments. Simulation in Healthcare, 16, 386–391.
Barrows, H. S. (1993). An overview of the uses of standardized patients for teaching and evaluating clinical skills. Academic Medicine, 68(6), 443–453. http://doi.org/10.1097/00001888-199306000-00002
Barsuk, J. H., Cohen, E. R., Feinglass, J., McGaghie, W. C., & Wayne, D. B. (2009). Use of simulation-based education to reduce catheter-related bloodstream infections. Archives of Internal Medicine, 169(15), 1420–1423. http://doi.org/10.1001/archinternmed.2009.215
Barsuk, J. H., McGaghie, W. C., Cohen, E. R., Balachandran, J. S., & Wayne, D. B. (2009). Use of simulation-based mastery learning to improve the quality of central venous catheter placement in a medical intensive care unit. Journal of Hospital Medicine, 4(7), 397–403. http://doi.org/10.1002/jhm.468
Barsuk, J. H., McGaghie, W. C., Cohen, E. R., O'Leary, K. J., & Wayne, D. B. (2009). Simulation-based mastery learning reduces complications during central venous catheter insertion in a medical intensive care unit. Critical Care Medicine, 37(10), 2697–2701. http://doi.org/10.1097/CCM.0b013e3181a57bc1
Beard, L., Wilson, K., Morra, D., & Keelan, J. (2009). A survey of health-related activities on Second Life. Journal of Medical Internet Research, 11(2), e17, 1–19. http://doi.org/10.2196/jmir.1192
Bell, R. H., Biester, T. W., Tabuenca, A., Rhodes, R. S., Cofer, J. B., Britt, L. D., & Lewis, F. R. (2009). Operative experience of residents in US general surgery programs: A gap between expectation and experience. Annals of Surgery, 249(5), 719–724. http://doi.org/10.1097/SLA.0b013e3181a38e59
Birch, L., Jones, N., Doyle, P. M., Green, P., McLaughlin, A., Champney, C., … Taylor, K. (2007). Obstetric skills drills: Evaluation of teaching methods. Nurse Education Today, 27, 915–922. https://doi.org/10.1016/j.nedt.2007.01.006
Boulet, J. R., Smee, S. M., Dillon, G. F., & Gimpel, J. R. (2009). The use of standardized patient assessments for certification and licensure decisions. Simulation in Healthcare, 4(1), 35–42. http://doi.org/10.1097/SIH.0b013e318182fc6c
Bracq, M. S., Michinov, E., & Jannin, P. (2019). Virtual reality simulation in nontechnical skills training for healthcare professionals: A systematic review. Simulation in Healthcare, 14(3), 188–194. http://doi.org/10.1097/SIH.0000000000000347
Bradley, P. (2006). The history of simulation in medical education and possible future directions. Medical Education, 40(3), 254–262. http://doi.org/10.1111/j.1365-2929.2006.02394.x
Buck, G. H. (1991). Development of simulators in medical education. Gesnerus, 48(1), 7–28. https://doi.org/10.1163/22977953-04801002
Burnett, G., Shah, A., Fried, E. A., et al. (2021). Using simulation to develop solutions for ventilator shortages from the epicenter. Simulation in Healthcare, 16, 78–79.
Cartwright, M. S., Reynolds, P. S., Rodriguez, Z. M., Breyer, W. A., & Cruz, J. M. (2005). Lumbar puncture experience among medical school graduates: The need for formal procedural skills training. Medical Education, 39(4), 437–437. http://doi.org/10.1111/j.1365-2929.2005.02118.x
Chopra, V., Bovill, J. G., Spierdijk, J., & Koornneef, F. (1992). Reported significant observations during anaesthesia: A prospective analysis over an 18-month period. British Journal of Anaesthesia, 68(1), 13–17. https://doi.org/10.1093/bja/68.1.13
Cook, D. A., Hatala, R., Brydges, R., et al. (2011). Technology-enhanced simulation for health professions education: A systematic review and meta-analysis. JAMA, 306(9), 978–988. http://doi.org/10.1001/jama.2011.1234
Cooper, J. B., & Taqueti, V. (2008). A brief history of the development of mannequin simulators for clinical education and training. Postgraduate Medical Journal, 84(997), 563–570. http://doi.org/10.1136/qshc.2004.009886
Crofts, J. F., Lenguerrand, E., Bentham, G. L., et al. (2016). Prevention of brachial plexus injury—12 years of shoulder dystocia training: An interrupted time-series study. British Journal of Obstetrics & Gynecology, 123(1), 111–118. http://doi.org/10.1111/1471-0528.13302
Daher, S., Hochreiter, J., Schubert, R., Gonzalez, L., Cendan, J., Anderson, M., Diaz, D. A., & Welch, G. F. (2020). The physical-virtual patient simulator: A physical human form with virtual appearance and behavior. Simulation in Healthcare, 15(2), 115–121. http://doi.org/10.1097/SIH.0000000000000409
Dawson, S. L. (2002). A critical approach to medical simulation. Bulletin of the American College of Surgeons, 87(11), 12–18.
Dawson, S. L., Cotoin, S., Meglan, D., Shaffer, D., & Ferrell, M. (2000). Designing a computer-based simulator for interventional cardiology training (with editorial comment). Catheterization and Cardiovascular Interventions, 51(4), 522–528. http://doi.org/10.1002/1522-726x(200012)51:43.0.co;2-7
Delamarre, L., Couarraze, S., Vardon-Bounes, F., et al. (2022). Mass training in situ during COVID-19 pandemic: Enhancing efficiency and minimizing sick leaves. Simulation in Healthcare, 17, 42–48.
Delp, S. L., Loan, J. P., Hoy, M. G., Zajac, F. E., Topp, E. L., & Rosen, J. M. (1990). An interactive graphics-based model of the lower extremity to study orthopaedic surgical procedures. IEEE Transactions on Biomedical Engineering, 37(8), 757–767. http://doi.org/10.1109/10.102791
Delp, S. L., & Zajac, F. E. (1992). Force- and moment-generating capacity of lower-extremity muscles before and after tendon lengthening. Clinical Orthopaedics and Related Research, 284, 247–259.
Dev, P. (2016). Simulation: A view into the future of education. In C. A. Weaver, M. J. Ball, G. R. Kim, & J. M. Kiel (Eds.), Healthcare Information Management Systems (pp. 317–329). http://doi.org/10.1007/978-3-319-20765-0
Dougherty, D., & Conway, P. H. (2008). The "3T's" road map to transform US health care: The "how" of high-quality care. JAMA, 299(19), 2319–2321. http://doi.org/10.1001/jama.299.19.2319
Draycott, T. J., Crofts, J. F., Ash, J. P., et al. (2008). Improving neonatal outcome through practical shoulder dystocia training. Obstetrics & Gynecology, 112(1), 14–20. http://doi.org/10.1097/AOG.0b013e31817bbc61
Draycott, T., Sibanda, T., Owen, L., Akande, V., Winter, C., Reading, S., & Whitelaw, A. (2006). Does training in obstetric emergencies improve neonatal outcome? BJOG: An International Journal of Obstetrics & Gynaecology, 113(2), 177–182. https://doi.org/10.1111/j.1471-0528.2006.00800.x
Ellis, D., Crofts, J. F., Hunt, L. P., Read, M., Fox, R., & James, M. (2008). Hospital, simulation center, and teamwork training for eclampsia management: A randomized controlled trial. Obstetrics & Gynecology, 111(3), 723–731. http://doi.org/10.1097/AOG.0b013e3181637a82
Ewy, G. A., Felner, J. M., Juul, D., Mayer, J. W., Sajid, A. W., & Waugh, R. A. (1987). Test of a cardiology patient simulator with students in fourth-year electives. Journal of Medical Education, 62(9), 738–743.
Falcone, R. A., Jr., Daugherty, M., Schweer, L., Patterson, M., Brown, R. L., & Garcia, V. F. (2008). Multidisciplinary pediatric trauma team training using high-fidelity trauma simulation. Journal of Pediatric Surgery, 43(6), 1065–1071. https://doi.org/10.1016/j.jpedsurg.2008.02.033
Farmer, E., van Rooij, J., Riemersma, J., Jorna, P., & Moraal, J. (2003). Handbook of Simulation-Based Training. Burlington, VT: Ashgate.
Forrest, K. (2019). What is simulation education? In K. Forrest & J. McKimm (Eds.), Healthcare Simulation at a Glance (pp. 4–5). Hoboken, NJ: John Wiley & Sons Ltd.
Fransen, A. F., van de Ven, J., Schuit, E., van Tetering, A., Mol, B. W., & Oei, S. G. (2017). Simulation-based team training for multi-professional obstetric care teams to improve patient outcome: A multicentre, cluster randomized controlled trial. British Journal of Obstetrics & Gynecology, 124(4), 641–650. http://doi.org/10.1111/1471-0528.14369
Fritz, P. Z., Gray, T., & Flanagan, B. (2008). Review of mannequin-based high-fidelity simulation in emergency medicine. Emergency Medicine Australasia, 20(1), 1–9. https://doi.org/10.1111/j.1742-6723.2007.01022.x
Gaba, D. M. (2004). The future vision of simulation in healthcare. Quality and Safety in Health Care, 13(Suppl 1), i2–i10. http://doi.org/10.1097/01.SIH.0000258411.38212.32
Gaba, D. M. (2010). Crisis resource management and teamwork training in anaesthesia. British Journal of Anaesthesia, 105, 3–6. https://doi.org/10.1093/bja/aeq124
Gaba, D. M., & DeAnda, A. (1988). A comprehensive anesthesia simulation environment: Re-creating the operating room for research and training. Anesthesiology, 69(3), 387–394.
Gaba, D. M., Howard, S. K., Fish, K. J., Smith, B. E., & Sowb, Y. A. (2001). Simulation-based training in anesthesia crisis resource management (ACRM): A decade of experience. Simulation & Gaming, 32(2), 175–193. https://doi.org/10.1177/104687810103200206
Gordon, M. S. (1974). Cardiology patient simulator: Development of an animated manikin to teach cardiovascular disease. The American Journal of Cardiology, 34(3), 350–355. https://doi.org/10.1016/0002-9149(74)90038-1
Grant, V. J., Wolff, M., & Adler, M. (2016). The past, present, and future of simulation-based education for pediatric emergency medicine. Clinical Pediatric Emergency Medicine, 17(3), 159–168. https://doi.org/10.1016/j.cpem.2016.05.005
Heinrichs, L., Fellander-Tsai, L., & Davies, D. (2013). Clinical virtual worlds: The wider implications for professionals. In K. Bredl & W. Bösche (Eds.), Serious Games and Virtual Worlds in Education, Professional Development, and Healthcare (pp. 817–836). Hershey, PA: IGI Global.
Heinrichs, W. L., Youngblood, P., Harter, P., Kusumoto, L., & Dev, P. (2010). Training healthcare personnel for mass-casualty incidents in a virtual emergency department: VED II. Prehospital and Disaster Medicine, 25(5), 424–432. http://doi.org/10.1017/S1049023X00008505
Helmreich, R. L., & Schaefer, H. G. (1994). Team performance in the operating room. In M. S. Bogner (Ed.), Human Error in Medicine (pp. 225–253). Boca Raton: CRC Press.
Howard, S. K., Gaba, D. M., Fish, K. J., Yang, G., & Sarnquist, F. H. (1992). Anesthesia crisis resource management training: Teaching anesthesiologists to handle critical incidents. Aviation, Space, and Environmental Medicine, 63(9), 763–770.
INACSL Standards of Best Practice: Simulation. Retrieved April 28, 2021, from https://www.inacsl.org/inacsl-standards-of-best-practice-simulation/history-of-the-inacsl-standards-of-best-practice-simulation/
Issenberg, S. B., & Scalese, R. J. (2008). Simulation in health care education. Perspectives in Biology and Medicine, 51(1), 31–46. https://doi.org/10.1353/pbm.2008.0004
Jones, F., Passos-Neto, C. E., & Braghiroli, O. F. M. (2015). Simulation in medical education: Brief history and methodology. Principles and Practice of Clinical Research, 1(2), 46–54. https://doi.org/10.21801/ppcrj.2015.12.8
Kirkpatrick, D. (1994). Evaluating Training Programs: The Four Levels. San Francisco, CA: Berrett-Koehler Publishers.
Kleinsmith, A., Rivera Gutierrez, D., Finney, G., Cendan, J., & Lok, B. (2015). Understanding empathy training with virtual patients. Computers in Human Behavior, 52, 151–158. https://doi.org/10.1016/j.chb.2015.05.033
Kohn, L., Corrigan, J., & Donaldson, M. (2000). To Err is Human: Building a Safer Health System. Washington, DC: National Academy Press.
Kron, F. W., Fetters, M. D., Scerbo, M. W., White, C. B., Lypson, M. L., Padilla, M. A., … Becker, D. M. (2017). Using a computer simulation for teaching communication skills: A blinded multisite mixed methods randomized controlled trial. Patient Education and Counseling, 100(4), 748–759. https://doi.org/10.1016/j.pec.2016.10.024
Laerdal Medical Corp. (2014). Pricelist (p. 20). Retrieved from https://www.ogs.state.ny.us/purchase/spg/pdfdocs/3823219745PL_Laerdal.pdf
Lampotang, S., DeStephens, A., Zarour, I., et al. (2022). Manual conservation of supplemental oxygen in low-resource settings during the COVID-19 pandemic. Simulation in Healthcare, 17, 95–97.
Lane, H. C., Hays, M. J., Core, M. G., & Auerbach, D. (2013). Learning intercultural communication skills with virtual humans: Feedback and fidelity. Journal of Educational Psychology, 105(4), 1026–1035. https://doi.org/10.1037/a0031506
Leape, L. L., Brennan, T. A., Laird, N., Lawthers, A. G., Localio, A. R., Barnes, B. A., … Hiatt, H. (1991). The nature of adverse events in hospitalized patients: Results of the Harvard medical practice study II. New England Journal of Medicine, 324(6), 377–384. https://doi.org/10.1056/NEJM199102073240605
Liaw, S. Y., Choo, T., Wu, L. T., Lim, W. S., Choo, H., Lim, S. M., … Lau, T. C. (2021). “Wow, woo, win”: Healthcare students’ and facilitators’ experiences of interprofessional simulation in three-dimensional virtual world: A qualitative evaluation study. Nurse Education Today, 105, 1–6. https://doi.org/10.1016/j.nedt.2021.105018
Liaw, S. Y., Wu, L. T., Soh, S. L. H., Ringsted, C., Lau, T. C., & Lim, W. S. (2020). Virtual reality simulation in interprofessional round training for health care students: A qualitative evaluation study. Clinical Simulation in Nursing, 45, 42–46. https://doi.org/10.1016/j.ecns.2020.03.013
Liaw, S. Y., Wu, L. T., Wong, L. F., Soh, S. L. H., Chow, Y. L., Ringsted, C., … Lim, W. S. (2019). “Getting everyone on the same page”: Interprofessional team training to develop shared mental models on interprofessional rounds. Journal of General Internal Medicine, 34(12), 2912–2917. https://doi.org/10.1007/s11606-019-05320-z
Lindhard, M. S., Thim, S., Laursen, H. S., Schram, A. W., Paltved, C., & Henriksen, T. B. (2021). Simulation-based neonatal resuscitation team training: A systematic review. Pediatrics, 147(4). https://doi.org/10.1542/peds.2020-042010
Lohre, R., Bois, A. J., Pollock, J. W., et al. (2020). Effectiveness of immersive virtual reality on orthopedic surgical skills and knowledge acquisition among senior surgical residents: A randomized clinical trial. JAMA Network Open, 1–12. https://doi.org/10.1001/jamanetworkopen.2020.31217
Madenci, A. L., Solis, C. V., & de Moya, M. A. (2014). Central venous access by trainees: A systematic review and meta-analysis of the use of simulation to improve success rate on patients. Simulation in Healthcare, 9(1), 7–14. https://doi.org/10.1097/SIH.0b013e3182a3df26


Maslovitz, S., Barkai, G., Lessing, J. B., Ziv, A., & Many, A. (2007). Recurrent obstetric management mistakes identified by simulation. Obstetrics & Gynecology, 109(6), 1295–1300. https://doi.org/10.1097/01.AOG.0000265208.16659.c9
McGaghie, W. C. (2020). Mastery learning: Origins, features, and evidence from health professions. In W. C. McGaghie, J. H. Barsuk, & D. B. Wayne (Eds.), Comprehensive Healthcare Simulation: Mastery Learning in Health Professions Education (pp. 27–46). Cham, Switzerland: Springer.
McGaghie, W. C., Barsuk, J. H., & Wayne, D. B. (2020). Comprehensive Healthcare Simulation: Mastery Learning in Health Professions Education. Cham, Switzerland: Springer.
McGaghie, W. C., Draycott, T. J., Dunn, W. F., Lopez, C. M., & Stefanidis, D. (2011a). Evaluating the impact of simulation on translational patient outcomes. Simulation in Healthcare, 6(Suppl), S42–S47. https://doi.org/10.1097/SIH.0b013e318222fde9
McGaghie, W. C., Issenberg, S. B., Cohen, E. R., Barsuk, J. H., & Wayne, D. B. (2011b). Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence. Academic Medicine, 86(6), 706–711. https://doi.org/10.1097/ACM.0b013e318217e119
Moorthy, K., Munz, Y., Adams, S., Pandey, V., & Darzi, A. (2005). A human factors analysis of technical and team skills among surgical trainees during procedural simulations in a simulated operating theatre. Annals of Surgery, 242(5), 631–639. https://doi.org/10.1097/01.sla.0000186298.79308.a8
Mozumder, M. A. I., Sheeraz, M. M., Athar, A., Aich, S., & Kim, H. C. (2022). Overview: Technology roadmap of the future trend of metaverse based on IoT, blockchain, AI technique, and medical domain metaverse activity. In 2022 24th International Conference on Advanced Communication Technology (ICACT) (pp. 256–261). IEEE. https://doi.org/10.23919/ICACT53585.2022.9728808
Nestel, D., & Kelly, M. (2018). An introduction to healthcare simulation. In D. Nestel, M. Kelly, B. Jolly, & M. Watson (Eds.), Healthcare Simulation Education: Evidence, Theory and Practice (pp. 1–6). West Sussex: John Wiley & Sons, Ltd.
Nestel, D., Sanko, J., & McNaughton, N. (2018). Simulated participant methodologies: Maintaining humanism in practice. In D. Nestel, M. Kelly, B. Jolly, & M. Watson (Eds.), Healthcare Simulation Education: Evidence, Theory and Practice (pp. 45–53). West Sussex: John Wiley & Sons, Ltd.
Newlin-Canzone, E. T., Scerbo, M. W., Gliva-McConvey, G., & Wallace, A. M. (2013). The cognitive demands of standardized patients: Understanding limitations in attention and working memory with the decoding of nonverbal behavior during improvisations. Simulation in Healthcare, 8(4), 207–214. https://doi.org/10.1097/SIH.0b013e31828b419e
Oman, S. P., Sanghavi, D. K., Helgeson, S. A., et al. (2023). Simulation method for testing aerosol mitigation strategies, an observational study. Simulation in Healthcare, 18(1), 8–15.
Owen, H. (2012). Early use of simulation in medical education. Simulation in Healthcare, 7(2), 102–116. https://doi.org/10.1097/SIH.0b013e3182415a91
Owen, H. (2016). Simulation and teaching in resuscitation and trauma management. In H. Owen (Ed.), Simulation in Healthcare Education: An Extensive History (pp. 417–430). Cham, Switzerland: Springer.
Owen, H. (2018). Historical practices in healthcare simulation: What we still have to learn. In D. Nestel, M. Kelly, B. Jolly, & M. Watson (Eds.), Healthcare Simulation Education: Evidence, Theory and Practice (pp. 16–22). West Sussex: John Wiley & Sons, Ltd.
Paige, J. B., Graham, L. L., & Sittner, B. (2020). Formal training efforts to develop simulation educators: An integrative review. Simulation in Healthcare, 15(4), 271–281. https://doi.org/10.1097/SIH.0000000000000424


Palaganas, J. C., Brunette, V., & Winslow, B. (2016). Prelicensure simulation-enhanced interprofessional education: A critical review of the research literature. Simulation in Healthcare, 11(6), 404–418. https://doi.org/10.1097/SIH.0000000000000175
Parker, A. L., Forsythe, L. L., & Kohlmorgen, I. K. (2019). TeamSTEPPS®: An evidence-based approach to reduce clinical errors threatening safety in outpatient settings: An integrative review. Journal of Healthcare Risk Management, 38(4), 19–31. https://doi.org/10.1002/jhrm.21352
Patel, A. R., Patel, A. R., Singh, S., & Khawaja, I. (2019). Central line catheters and associated complications: A review. Cureus, 11(5), e4717. https://doi.org/10.7759/cureus.4717
Perlman, R. E., Pawelcazak, M., Yacht, A. C., et al. (2017). Program director perceptions of proficiency of the core entrustable professional activities. Journal of Graduate Medical Education, 9(5), 588–592. https://doi.org/10.4300/JGME-D-16-00864.1
Phipps, M. G., Lindquist, D. G., McConaughey, E., O'Brien, J. A., Raker, C. A., & Paglia, M. J. (2012). Outcomes from a labor and delivery team training program with simulation component. American Journal of Obstetrics and Gynecology, 206(1), 3–9. https://doi.org/10.1016/j.ajog.2011.06.046
Povenmire, H. K., & Roscoe, S. N. (1973). Incremental transfer effectiveness of a ground-based general aviation trainer. Human Factors, 15, 534–542. https://doi.org/10.1177/001872087301500605
Prichard, J. S., Bizo, L. A., & Stratford, R. J. (2011). Evaluating the effects of team-skills training on subjective workload. Learning and Instruction, 21(3), 429–440. https://doi.org/10.1016/j.learninstruc.2010.06.003
Reed, T., Pirotte, M., McHugh, M., et al. (2016). Simulation-based mastery learning improves medical student performance and retention of core clinical skills. Simulation in Healthcare, 11(3), 173–180. https://doi.org/10.1097/SIH.0000000000000154
Riley, W., Davis, S., Miller, K., Hansen, H., Sainfort, F., & Sweet, R. (2011). Didactic and simulation nontechnical skills team training to improve perinatal patient outcomes in a community hospital. The Joint Commission Journal on Quality and Patient Safety, 37(8), 357–364. https://doi.org/10.1016/S1553-7250(11)37046-8
Roscoe, S. N. (1980). Aviation Psychology. Ames, IA: Iowa University Press.
Roscoe, S. N., & Williges, B. H. (1980). Measurement of transfer of training. In S. N. Roscoe (Ed.), Aviation Psychology (pp. 182–193). Ames, IA: Iowa University Press.
Rosen, K. R. (2008). The history of medical simulation. Journal of Critical Care, 23(2), 157–166. https://doi.org/10.1016/j.jcrc.2007.12.004
Rudolph, J. W., Raemer, D. B., & Simon, R. (2014). Establishing a safe container for learning in simulation: The role of the presimulation briefing. Simulation in Healthcare, 9(6), 339–349. https://doi.org/10.1097/SIH.0000000000000047
Salas, E., Sims, D. E., & Burke, C. S. (2005). Is there a “big five” in teamwork? Small Group Research, 36(5), 555–599. https://doi.org/10.1177/1046496405277134
Salas, E., Wilson, K. A., Burke, C. S., Wightman, D. C., & Howse, W. R. (2006). A checklist for crew resource management training. Ergonomics in Design, 14(2), 6–15. https://doi.org/10.1177/106480460601400204
Satava, R. M. (1993). Virtual reality surgical simulator. Surgical Endoscopy, 7(3), 203–205.
Satava, R. M. (2001). Accomplishments and challenges of surgical simulation. Surgical Endoscopy, 15(3), 232–241. https://doi.org/10.1007/s004640000369
Scerbo, M. W., & Anderson, B. L. (2012). Medical simulation. In P. Carayon (Ed.), Handbook of Human Factors and Ergonomics in Health Care and Patient Safety (2nd Ed., pp. 557–571). Boca Raton, FL: CRC Press.
Scerbo, M. W., Belfore, L. A., Garcia, H. M., et al. (2007). A virtual operating room for context-relevant training. Proceedings of the Human Factors & Ergonomics Society 51st Annual Meeting, 507–511. Santa Monica, CA: Human Factors & Ergonomics Society.


Scerbo, M. W., Bliss, J. P., Schmidt, E. A., & Thompson, S. N. (2006). The efficacy of a medical virtual reality simulator for training phlebotomy. Human Factors, 48(1), 72–84. https://doi.org/10.1518/001872006776412171
Schroedl, C. J., Frogameni, A., Barsuk, J. H., Cohen, E. R., Sivarajan, L., & Wayne, D. B. (2020). Impact of simulation-based mastery learning on resident skill managing mechanical ventilators. ATS Scholar, 2, 34–48. https://doi.org/10.34197/ats-scholar.2020-0023OC
Seymour, N. E., Gallagher, A. G., Roman, S. A., O’Brien, M. K., Bansal, V. K., Andersen, D. K., & Satava, R. M. (2002). Virtual reality training improves operating room performance. Annals of Surgery, 236(4), 458–464. https://doi.org/10.1097/00000658-200210000-00008
Sharara-Chami, R., Sabouneh, R., Zeineddine, R., et al. (2020). In situ simulation: An essential tool for safe preparedness for the COVID-19 pandemic. Simulation in Healthcare, 15, 303–309.
Shavit, D., Feldman, O., Hussein, K., et al. (2020). Assessment of alternative personal protective equipment by emergency department personnel during the SARS-CoV-2 pandemic: A simulation-based pilot study. Simulation in Healthcare, 15, 445–446.
Siassakos, D., Hasafa, Z., Sibanda, T., Fox, R., Donald, F., Winter, C., & Draycott, T. (2009). Retrospective cohort study of diagnosis–delivery interval with umbilical cord prolapse: The effect of team training. BJOG: An International Journal of Obstetrics & Gynaecology, 116(8), 1089–1096. https://doi.org/10.1111/j.1471-0528.2009.02179.x
Singh, H., Kalani, M., Acosta-Torres, S., El Ahmadieh, T. Y., Loya, J., & Ganju, A. (2013). History of simulation in medicine: From Resusci Annie to the Ann Myers medical center. Neurosurgery, 73(suppl_1), S9–S14. https://doi.org/10.1093/neurosurgery/73.suppl_1.S9
Sinz, E. H. (2007). Anesthesiology national CME program and ASA activities in simulation. Anesthesiology Clinics, 25(2), 209–223. https://doi.org/10.1016/j.anclin.2007.03.012
Society for Simulation in Healthcare. Retrieved April 28, 2021, from https://www.ssih.org/Credentialing/Accreditation
Sutton, C., McCloy, R., Middlebrook, A., Chater, P., Wilson, M., & Stone, R. (1997). MIST VR: A laparoscopic surgery procedures trainer and evaluator. Studies in Health Technology and Informatics, 39, 598–607.
Swezey, R. W., & Andrews, D. H. (Eds.). (2001). Readings in Training and Simulation: A 30-Year Perspective. Santa Monica, CA: Human Factors and Ergonomics Society.
van de Ven, J., Fransen, A. F., Schuit, E., van Runnard Heimel, P. J., Mol, B. W., & Oei, S. G. (2017). Does the effect of one-day simulation team training in obstetric emergencies decline within one year? A post-hoc analysis of a multicentre cluster randomised controlled trial. European Journal of Obstetrics & Gynecology and Reproductive Biology, 216, 79–84. https://doi.org/10.1016/j.ejogrb.2017.07.020
Vincenzi, D. A., Wise, J. A., Mouloua, M., & Hancock, P. A. (Eds.). (2009). Human Factors in Simulation and Training. Boca Raton, FL: CRC Press.
Walsh, K., & Jaye, P. (2012). The relationship between fidelity and cost in simulation. Medical Education, 12(46), 1226–1228. https://doi.org/10.1111/j.1365-2923.2012.04352.x
Weaver, S. J., Dy, S. M., & Rosen, M. A. (2014). Team-training in healthcare: A narrative synthesis of the literature. BMJ Quality & Safety, 23(5), 359–372. https://doi.org/10.1136/bmjqs-2013-001848
Weile, J., Nebsbjerg, M. A., Ovesen, S. H., Paltved, C., & Ingeman, M. L. (2021). Simulation-based team training in time-critical clinical presentations in emergency medicine and critical care: A review of the literature. Advances in Simulation, 6(1), 1–12. https://doi.org/10.1186/s41077-021-00154-4


Weinger, M. B., & Gaba, D. M. (2014). Human factors engineering in patient safety. Anesthesiology, 120(4), 801–806. https://doi.org/10.1097/ALN.0000000000000144
Weller, J., & Civil, I. (2018). Teamwork and healthcare simulation. In D. Nestel, M. Kelly, B. Jolly, & M. Watson (Eds.), Healthcare Simulation Education: Evidence, Theory and Practice (pp. 127–134). West Sussex: John Wiley & Sons, Ltd.
Woodward, C. (1998). Standardized patients: A fixed role therapy experience in normal individuals. Journal of Constructivist Psychology, 11(2), 133–148. https://doi.org/10.1080/10720539808404645
Young, A., & Aquilina, A. (2021). Use of virtual reality to support rapid upskilling of healthcare professionals during COVID-19 pandemic. In XR Case Studies (pp. 137–145). Cham: Springer. https://doi.org/10.1007/978-3-030-72781-9_17
Yucel, C., Hawley, G., Terzioglu, F., & Bogossian, F. (2020). The effectiveness of simulation-based team training in obstetrics emergencies for improving technical skills: A systematic review. Simulation in Healthcare, 15(2), 98–105. https://doi.org/10.1097/SIH.0000000000000416
Yunoki, K., & Sakai, T. (2018). The role of simulation training in anesthesiology resident education. Journal of Anesthesia, 32(3), 425–433. https://doi.org/10.1007/s00540-018-2483-y
Ziv, A., Wolpe, P. R., Small, S. D., & Glick, S. (2003). Simulation-based medical education: An ethical imperative. Academic Medicine, 78(8), 783–788. https://doi.org/10.1097/01.SIH.0000242724.08501.63
Zucco, L., Chen, M. J., Levy, N., et al. (2023). Just-in-time in-situ simulation training as a preparedness measure for the perioperative care of COVID-19 patients. Simulation in Healthcare, 18(2), 90–99.

9 Best Practices in Surgical Simulation

Dominique Doster, Christopher Thomas, and Dimitrios Stefanidis

DOI: 10.1201/9781003401353-9

CONTENTS
Introduction
Current Technologies Used in Surgical Simulation for Skills Training
    Technical Skills Simulation
        Low-Cost Simulators
        High-Cost Simulators
    Nontechnical Skills Simulation
        The Objective Structured Clinical Examination (OSCE)
        Team-Based Training
Application of Simulation across the Continuum of Surgical Training
    Undergraduate Medical Education
    Graduate Medical Education
    Postgraduate Training of Practicing Surgeons
Human Factors and Surgical Simulation
    Equipment Design and Ergonomics
    Performance Optimization
        Mental Skills Optimization and Coaching
        Developing Expertise
    Surgical Team Dynamics
    Utilizing Simulation to Minimize Subjectivity in Assessment
Future of Simulation and Human Factors in Surgery
Conclusion
References

INTRODUCTION

The apprenticeship model of surgical training was originally pioneered in 1892 by William Halsted, and traditionally involved the subjective observation of trainee performance under the supervision of a skilled surgical teacher, with graded responsibility and enhanced independence until proficiency was reached (Wright Jr. & Schachar, 2020). The earliest evidence of surgical simulation in the modern model of surgical residency was Dr. Halsted's use of dog labs as a means to teach procedural and team-based skills.


While early attempts at simulation-based surgical training relied heavily on animal models and cadavers for practicing technical skills, the field of surgical simulation did not truly take off until the emerging technologies of the early 1970s sparked the imagination of surgical educators. The ability to objectively track, measure, and assess performance using virtual reality technology inspired surgical educators to embrace simulation and continues to be the motivation for its integration into surgical training programs today.

The goal of simulation is to afford training opportunities to surgical trainees to hone their technical and nontechnical skills outside the operating room and clinical environment, and to provide an improved method of objective skill assessment compared with traditional methods that rely purely on human raters. In the operating room, tension and stress can run high, and mistakes can lead to devastating consequences for patients. Such a high-stress, high-risk environment is hardly the place for new learners to acquire new skills. To address this important limitation of the traditional training paradigm, simulators allow practice of the same technical and nontechnical skills in a more controlled and low-stress environment. Simulators provide a variety of benefits, including the ability to optimize performance through the trainee's deliberate practice and to remove assessment bias through the incorporation of objective performance metrics, that ultimately decrease risk to patients (Stefanidis et al., 2019). These benefits have driven the Residency Review Committee (RRC) for Surgery at the Accreditation Council for Graduate Medical Education (ACGME) to recommend the integration of simulation as a means to teach technical and nontechnical skills to surgical residents. The American Board of Surgery now requires certification in two key simulation-based assessments of technical skill to be eligible to register for the certifying examination: Fundamentals of Laparoscopic Surgery (FLS) and Fundamentals of Endoscopic Surgery (FES). The American College of Surgeons has also created a network of accredited simulation centers to provide training opportunities to surgical trainees and practicing surgeons and to push the field of surgical simulation forward through the exchange of ideas and best practices.

While workplace-based assessments remain the gold standard with regard to the assessment of learners (Norcini & Burch, 2007; Holmboe et al., 2010), simulation-based assessments can significantly contribute to the training of surgeons, as they permit the focused and recurrent evaluation of technical and nontechnical skills in a safe learning environment (Ziv et al., 2003). However, care must be taken to ensure that validity evidence supports simulation-based assessments as a reliable evaluation of skill (Cook & Hatala, 2016). This can be difficult to establish, as technical and nontechnical perioperative encounters are challenging to replicate.

Furthermore, incorporating simulation into surgical skills curricula requires a well-thought-out approach that promotes deliberate practice and optimizes skill acquisition by the learner. The use of immediate feedback and/or debriefing helps learners solidify skills and techniques practiced in the simulation environment. Deliberate and repetitive practice is foundational to mastery learning (McGaghie et al., 2010).
Timing of simulation sessions would ideally occur before the skill is performed in the clinical environment to maximize the yield of training.


The challenge arises when residency programs are faced with time constraints due to duty hour restrictions and training session scheduling conflicts with clinical practice (Stefanidis et al., 2015). Nevertheless, optimizing the integration of simulation into surgical training curricula is becoming more of an expectation and less of an option.

The purpose of this chapter is to elaborate on a number of aspects pertaining to best practices in surgical simulation. We seek to provide an appraisal of simulation modalities and their relevant uses in surgical training. Furthermore, we review the use of simulation as a means to optimize the influence of human factors in the field of surgery.

CURRENT TECHNOLOGIES USED IN SURGICAL SIMULATION FOR SKILLS TRAINING

Technical Skills Simulation

Low-Cost Simulators

Providing easy access for learners to hone basic technical skills, low-cost simulators encompass a wide range of platforms, starting with simple knot-tying boards and suturing pads. Nearly every physician can recall learning how to suture and tie knots during their surgery rotation on such low-cost simulators (Gomez et al., 2014; see Figures 9.1a and 9.1b). While often referred to as low-fidelity simulators, these low-cost platforms for learning basic procedural skills can actually quite closely replicate simple suturing and knot-tying techniques. Laparoscopic box trainers also fall under the category of relatively low-cost simulators and provide a platform for practicing the basic laparoscopic surgical techniques of passing items between graspers, intracorporeal suturing, knot tying, and cutting. These box trainers are widely available at most training institutions and are used in the Fundamentals of Laparoscopic Surgery technical assessment (Zendejas et al., 2016).

FIGURE 9.1  Examples of low-cost suturing pad (a) and knot-tying board (b).
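A practical advantage of these trainers for assessment is that performance on them can be scored with simple, transparent metrics. The sketch below shows a time-and-penalty scoring pattern of the kind used in timed laparoscopic tasks; the cutoff time, penalty, and 0–100 normalization here are illustrative assumptions, not the proprietary FLS scoring parameters.

```python
def timed_task_score(completion_time_s: float,
                     error_penalty_s: float,
                     cutoff_time_s: float = 300.0) -> float:
    """Score a timed simulator task on a 0-100 scale.

    Faster, cleaner performances score higher; exceeding the cutoff
    (including accrued penalties) scores 0. All constants are
    illustrative assumptions, not actual FLS parameters.
    """
    adjusted_time = completion_time_s + error_penalty_s
    return 100.0 * max(0.0, cutoff_time_s - adjusted_time) / cutoff_time_s

# A peg-transfer attempt finished in 95 s with one drop penalized at
# 15 s scores (300 - 110) / 300 * 100, roughly 63.
print(round(timed_task_score(95.0, 15.0), 1))
```

Because such a score is computed rather than judged, two identical performances receive identical scores, which is the property the chapter returns to below in the discussion of assessment subjectivity.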


While the previously mentioned simulators are platforms to practice generalizable surgical skills, other low-cost, procedure-specific simulators exist. These include rudimentary models made of affordable materials or 3D-printed forms that seek to recreate true-to-life anatomy for procedures like chest tube placement using butchered pork ribs, or cricothyroidotomy using ventilator tubing (see Figures 9.2a, 9.2b, 9.3, 9.4a, and 9.4b). Despite their low cost and ease of assembly, the haptic feedback can be surprisingly accurate.

FIGURE 9.2  Example of a low-cost chest tube simulator using cadaveric bovine ribs.

FIGURE 9.3  Central line procedural simulator for practicing internal jugular and subclavian central line placement.


FIGURE 9.4  Low-cost bowel anastomosis model using simulated bowel. Can be used to practice hand-sewn and stapled bowel anastomoses.

The obvious strength of these simulators is their increased accessibility to trainees due to their cost-effectiveness. Students can purchase knot-tying boards and suture pads off the internet, and laparoscopic box trainers can be assembled at home. Importantly, the repetitive nature of deliberate practice, which requires multiple practice sessions and repetition of specific skills to reach the desired level of performance, makes these simulators highly valuable even for established simulation centers because of their low cost of use. Hence, such simulators are among the most widely used by education institutes of the American College of Surgeons (Korndorffer et al., 2006). Nevertheless, the simplistic nature of these models makes them less useful as trainees progress into learning more advanced skills and procedures (Korndorffer et al., 2013). This has driven the adoption of both low- and high-cost simulators in the ACS/APDS skills curriculum for surgical trainees (Scott & Dunnington, 2008).

High-Cost Simulators

Virtual reality (VR) simulators are most often used in the context of laparoscopic, endoscopic, and robotic simulation of tasks and procedures (see Figures 9.5a and 9.5b). Such simulators provide a number of benefits for procedural training: they allow repetitive practice of procedures at no additional cost, provide multiple objective metrics for trainee performance assessment, can provide virtual feedback and coaching on trainee performance, and can include multiple procedures on the same platform. Nevertheless, they are associated with an increased upfront cost that is often over $100,000, with yearly maintenance contracts costing more than $20,000 (Parham et al., 2019). Such costs can make these VR simulators unattainable for many small to medium-sized training programs.

Prior to the technological advances that have taken place since the 1970s, cadavers and animal models were the primary platform for training outside the operating room, both for technical and nontechnical surgical skills.


FIGURE 9.5  High-cost Simbionix laparoscopic simulator (a) and Davinci robotic simulator (b). Both platforms incorporate virtual reality technical skills tasks and specific operations.

Cadavers have traditionally been used in either a fresh frozen or embalmed format. Though cadavers provide the benefit of true-to-life anatomic layers and tissue planes, their usable lifespan is relatively short. And while the cost of obtaining a cadaver ($1,500–$3,000, depending on the source) may pale in comparison to that of VR simulators, the ventilation system and infrastructure required to support cadaveric simulation limit its availability. Animal models were adopted in surgical simulation in an effort to provide physiology similar to that of an operative patient, with a beating heart and active bleeding. Advanced Trauma Operative Management (ATOM) is a course dedicated to teaching the skills needed to manage traumatic injuries and was developed around a porcine model. However, the use of animal models for educational simulations has met notable setbacks over the last several years due to ethical concerns, prompting many international trainees to travel outside of their home country to seek out these educational opportunities (Gala et al., 2012).

While these various higher-fidelity simulators often provide the benefit of greater complexity, improved haptic feedback, and truer-to-life tissue handling, their greatest drawbacks are cost and accessibility. The use of high-cost models for training students who do not plan to pursue surgical specialization is unnecessary. However, as surgical trainees progress and are entrusted with more technically complex skills and procedures in the operating theater, the benefits of the added complexity of high-fidelity models balance their cost (Johnston et al., 2016).
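The cost figures above lend themselves to a quick amortization comparison. The sketch below works through one, assuming a five-year horizon and utilization levels that are purely illustrative; the chapter's figures supply only the purchase, maintenance, and cadaver prices.

```python
def cost_per_training_hour(upfront_usd: float, annual_usd: float,
                           years: int, hours_per_year: float) -> float:
    """Amortized cost per trainee practice hour over the horizon."""
    return (upfront_usd + annual_usd * years) / (hours_per_year * years)

# VR platform: $100,000 upfront plus $20,000/year maintenance (figures
# cited above), assuming 500 trainee hours per year over 5 years:
vr = cost_per_training_hour(100_000, 20_000, 5, 500)  # $80/hour

# Cadaver: ~$2,500 per specimen (midpoint of the $1,500-$3,000 range),
# assuming one specimen supports roughly 20 trainee hours (an assumption):
cadaver = 2_500 / 20  # $125/hour

print(f"VR: ${vr:.0f}/hr, cadaver: ${cadaver:.0f}/hr")
```

Under these assumptions the VR platform is cheaper per hour, but the comparison flips at low utilization: at 100 trainee hours per year the same platform costs $400 per hour, which is one way to see why such purchases are hard to justify for small programs.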

Nontechnical Skills Simulation

The Objective Structured Clinical Examination (OSCE)

OSCEs have been an integral part of clinical training and learner assessment for over three decades (Sloan et al., 1993).


These examinations utilize standardized patients or mannequins in simulated clinical arenas to present scenarios similar to situations surgical trainees experience on the job. They are designed to assess clinical decision-making. Learners often report feeling stress similar to actual clinical encounters, as the simulation environment closely replicates actual clinical areas and the patients are often human actors (Pena et al., 2015). These encounters are recorded and reviewed during debrief sessions and provide a valuable learning opportunity for students and residents to watch actions they may or may not realize they perform (Pucher et al., 2014). These OSCE scenarios often present common perioperative complications and provide an assessment of students' and residents' clinical management and bedside manner (Sudan et al., 2014).

Team-Based Training

Nontechnical simulation has been an integral asset to the development and training of surgeon-led provider teams, both inside the operating room and in other clinical arenas. Given the complexity and dynamic nature of the operating room, surgeons are charged with the responsibility to minimize human error and maximize efficiency during cases. Surgical team training strategies can transform dysfunctional OR teams into highly reliable and effective ones (Sudan et al., 2014; Robertson et al., 2017). Simulation-based trauma team training includes multidisciplinary teams of surgeons, emergency medicine physicians, and nurses who work together on scenarios commonly seen in Emergency Department trauma bays. These programs are associated with improved team dynamics, task performance quality and speed, knowledge, and provider satisfaction (McLaughlin et al., 2019). The ACS/APDS has found these team-based scenarios vital to the curriculum for surgical trainees and has adopted 10 of them as part of Phase 3 of its curriculum. However, given the cost and complexity of organizing multidisciplinary team training sessions, this phase of the curriculum has seen the slowest adoption rate (Korndorffer et al., 2013).

APPLICATION OF SIMULATION ACROSS THE CONTINUUM OF SURGICAL TRAINING

In the following sections, we address the application of surgical simulation in the training of medical students, surgical residents, and practicing surgeons.

Undergraduate Medical Education

Basic Life Support (BLS) and Advanced Cardiac Life Support (ACLS) are two simulation-based certifications required of all advanced hospital personnel. Medical students are expected to pass them and maintain renewed certification when they begin their clinical rotations. The courses include various combinations of in-person and online learning in which certified instructors provide a universal framework to guide students through the basics of life support, CPR, rescue breathing, and, for ACLS, the interpretation and treatment of life-threatening heart rhythms.


Implementation of BLS and ACLS guidelines has shown improved outcomes for cardiac arrest patients in and out of the hospital (Honarmand et al., 2018; Kleinman et al., 2017). The goal of these programs is to eliminate discrepancy in the quality of care of critically ill patients by providing healthcare workers with a universal framework that can be utilized in high-stakes, high-stress situations.

While medical students are expected to possess the medical knowledge required to safely navigate the wards with attending supervision, their role in the clinical healthcare setting pertaining to procedural skills is less defined. Though the goal is for them to be ready to enter specialty training at the time of graduation, the reality is that medical students are often excluded from performing on live patients common procedures, such as IV placement, laceration repair, and splinting of fractures, that students routinely performed in the past (Tan & Skye, 2009). This is often due to medical-legal, patient safety, and patient consent concerns. Additionally, there is great variability in medical student teaching and autonomy between different medical schools and even within an individual medical school (Naeem et al., 2018). These hurdles create an excellent opportunity for simulation to fill the ever-growing hands-on educational gap.

The use of technical and nontechnical simulators to fill this gap has become widespread across US medical schools (Stefanidis et al., 2019). Given the ubiquity of simulators, students can practice skills on both low- and high-cost platforms, resulting in increased proficiency (Stefanidis et al., 2019; Borggreve et al., 2017; Okuda et al., 2009; Olasky et al., 2019; Yeh et al., 2017). In response to patient safety initiatives and a need for standardization of medical student skill acquisition, the American College of Surgeons (ACS) and Association for Surgical Education (ASE) created a simulation-based skills curriculum with the aim of improving medical student procedural skills. Evidence supports the success of this curriculum, and it has been praised for its cost-effectiveness and accessibility (Olasky et al., 2019). Furthermore, there has been an increase in the popularity of surgical residency "boot camps" over the last 10 years. These simulation-based electives for students and pre-residents allow trainees to hone their psychomotor, cognitive, and technical skills to best prepare them for the start of surgical internship (Yeh et al., 2017; Hudson, 2018a).

Graduate Medical Education

Residency is the time when previously undifferentiated doctors with a broad knowledge base in medicine are honed into specialty providers. Those in the surgical specialties must be trained from novices to full-fledged surgeons in 4–7 years, depending on the specific subspecialty. Simulation allows these trainees to learn and practice operative tasks without posing risk to a live patient. A variety of simulation-based curricula and certifications have been established to ensure competency in tasks relevant to each subspecialty.

The ACS and APDS developed the Surgery Resident Skills Curriculum with the goal of providing a structured, longitudinal, three-phase program that targets both technical and nontechnical skill development.


The curriculum includes various modules ranging from beginner bedside procedures to more advanced operating room techniques (Bartlett et al., 2017). Phase 1, titled "Core Skills," is designed to teach residents in the early stages of their training the 16 essential skills required to operate. Phase 2, "Advanced Procedures," is designed to teach mid- and senior-level residents 15 common surgical procedures using cadavers, animal models, and virtual reality. Lastly, Phase 3, "Team-Based Skills," focuses on the development of the nontechnical skills required to be an effective surgeon-leader through 10 simulated scenarios (Hudson, 2018b). The scenarios presented are common dilemmas encountered in the surgical management of patients pre-, intra-, and post-operatively (Bartlett et al., 2017).

The Fundamentals of Laparoscopic Surgery, Fundamentals of Endoscopic Surgery, and Fundamentals of Robotic Surgery are three simulation-based certification programs used to train residents in the technology and basic skills required to perform common laparoscopic, endoscopic, and robotic operations. Each course uses specific simulator models as the main teaching platform, allowing for hands-on experience in a risk-free environment. The technical skills assessed on each simulator mimic foundational maneuvers required to be competent on the actual platforms of laparoscopy, endoscopy, and robotic surgery (see Figures 9.6a and 9.6b). FLS and FES are now required for certification by the American Board of Surgery, and many practicing surgeons are advocating for the completion of FRS prior to certification as well (Fundamentals of Laparoscopic Surgery, 2021a, 2021b, 2021c).

Advanced Trauma Life Support (ATLS), similar to BLS and ACLS, is a program designed to provide a standardized framework for the clinical evaluation and initial procedural management of the undifferentiated trauma patient.

FIGURE 9.6  Laparoscopic box trainer (a) used for Fundamentals of Laparoscopic Surgery (FLS) skills practice and assessment. (b) GI mentor used for Fundamentals of Endoscopic Surgery skills practice and assessment.


It involves a combination of online and in-person instruction aimed at familiarizing trauma providers with the algorithms that prioritize life-threatening injuries and their management, followed by identification of non-life-threatening injuries. The in-person class relies heavily on simulation, and mannequins and standardized patients are used for both technical and scenario-based simulations. ATLS has been shown to improve outcomes in trauma and has become ubiquitous in trauma education (Mohammad & Abu-Zidan, 2014). Advanced Trauma Operative Management is yet another simulation-based trauma certification program, but it focuses on the operative management of traumatic injuries to the chest and abdomen. Developed in Hartford, Connecticut in 1998, ATOM uses an animal model to train and assess a learner's ability to identify and repair life-threatening traumatic injuries. The benefit of this simulation model is that it can reproduce physiologic processes that are not feasible in cadaveric models, such as bleeding, bile leakage, urine production, and breathing (Advanced Trauma Operative Management, 2021). Studies have demonstrated improvements in residents' trauma knowledge and technical skills upon completion of ATOM (Ali et al., 2008).

While such skills curricula have been implemented alongside the main surgery resident curriculum, some programs have dedicated an entire month of residency training to simulated procedural skills training. The University of Miami Department of Surgery has created a "Technical Skills Rotation" for general surgery residents, whereby residents spend a month doing simulated skills in VR, laparoscopic trainers, and scenario simulation. The residents in the study group reported this rotation to be a positive experience overall (Gonzalez et al., 2010). Furthermore, some surgical programs in the US have integrated procedural simulation training as a prerequisite to clinical training. At Indiana University, the novel "Laparoscopic Cholecystectomy" rotation is one such experience. It utilizes the Simbionix LAP Mentor simulator to train residents in the fundamentals of laparoscopic cholecystectomy over a series of VR modules. Upon completion of the simulator modules, the residents then travel to various sites performing only laparoscopic cholecystectomies with supervising faculty for the remainder of the rotation. They record each procedure and review their performance with a faculty member. At the end of the rotation, there is a post-test on critical steps of the procedure, relevant anatomy, and perioperative management of patients with cholecystitis. Faculty evaluation data of residents performing laparoscopic cholecystectomy demonstrated an improvement in technical proficiency among residents who completed the rotation compared to those who did not have this rotation in their curriculum (Huffman et al., 2021). A similar model has been implemented in surgical training programs for endoscopy, not only at Indiana University (Mizota et al., 2020) but at other institutions across the country. The University of Michigan Otolaryngology residency has a similar rotation in which residents perform simulated tasks on mannequins and VR models to develop skills in airway management. These simulation tasks are coupled with a dedicated anesthesia rotation, with the goal of maximizing endoscopic skill. Faculty and residents noticed an improvement in residents' second-year preparedness and procedural skills (Kovatch et al., 2019).


Postgraduate Training of Practicing Surgeons

The average length of practice for a surgeon after the completion of training is 32 years (Jonasson & Kwakawa, 1996). Given the rate of development of new medical devices and technologies, it is impossible to expect every practicing surgeon to have had formal residency training on devices that did not yet exist. For this purpose, many medical device companies produce practice devices that can be used to orient practicing surgeons to new technology in a simulation environment. While companies use these simulation sessions as marketing ploys to encourage the incorporation of their new technology into surgical practice, they also provide a safe environment for post-graduate surgeons to develop new skills. However, given the wide range of surgeon age and the variability in practice patterns despite evidence for best practice, one could argue that simulation is currently underutilized. The ACS–AEI consortium has attempted to bridge this gap by affording training opportunities to practicing surgeons at participating sites.

HUMAN FACTORS AND SURGICAL SIMULATION

The application of human factors to surgery is multifaceted. While no field has as great a breadth of human factors research as aviation, surgery draws the most notable parallel. Similar to pilots, surgeons often function on "autopilot," with most cases following a predictable pattern. Even when things do not go as planned, surgeons are trained to adapt and adjust the "flight" plan, so to speak, as needed. Furthermore, in surgery the stakes are equally high, and the cost of error is more than monetary: it can mean death or a complication-related decrease in a patient's quality of life. Many aspects of human factors directly relate to both the practice of surgery and the training of surgeons. These include surgical equipment design and ergonomics, performance optimization, surgical team dynamics, and the subjectivity of trainee performance assessment. This intersection of biomedical engineering and performance psychology makes the field of human factors in surgery very relevant today.

Equipment Design and Ergonomics

Simulation provides the optimal setting for integrating human factors concepts pertaining to device design and operative environment layout. One example of device design and evidence-based revision involved the BD Odon Device, used to assist vaginal delivery in simulated operative births (O'Brien et al., 2017). After multiple simulated assessments and user feedback sessions, biomedical engineers were able to redesign the BD Odon Device in a way that increased the percentage of practitioners able to successfully perform an operative vaginal delivery. Similarly, new surgical instruments and equipment can be tested in the simulated environment and perfected based on surgeon feedback prior to their implementation in the high-stakes clinical environment.


Instrument design and its interplay with anthropometry and movement science not only impacts surgeons' lifestyle and physical well-being but also drives surgical technology innovation and acceptance. Work-related musculoskeletal disorders are prevalent among surgeons and often drive practice modification (Catanzarite et al., 2018). Surgeons and surgical trainees are at increased ergonomic risk as they operate for hours on end, often standing in suboptimal positions due to difficult exposure, strenuous retraction, and/or the use of loupes, headlamps, or microscopes (Athanasiadis et al., 2021). Developing methods that help identify surgeons at ergonomic risk, and programs that aid the mitigation of this risk, is highly desirable (Park et al., 2017). To that end, many surgeons are opting to integrate the robotic surgery platform into their practice due to its enhanced ergonomic profile compared with laparoscopic and open surgery (Wee et al., 2020). However, more human factors work and device redesign need to be done, as trunk, wrist, and finger strain remains notable across all surgical platforms (Catanzarite et al., 2018).
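Methods for identifying surgeons at ergonomic risk typically reduce a stream of posture data to a simple exposure summary. A minimal sketch follows; the 25-degree neck-flexion threshold and the 50% exposure flag are assumptions chosen for illustration, loosely in the range that observational ergonomic tools use for sustained neck flexion, and are not parameters from any cited study.

```python
from typing import Sequence

def high_risk_posture_fraction(neck_flexion_deg: Sequence[float],
                               threshold_deg: float = 25.0) -> float:
    """Fraction of sampled case time spent above a flexion threshold.

    The threshold is an illustrative assumption, not a validated cutoff.
    """
    if not neck_flexion_deg:
        return 0.0
    over = sum(1 for angle in neck_flexion_deg if angle > threshold_deg)
    return over / len(neck_flexion_deg)

# Toy stream of one wearable-sensor reading per second during a case:
samples = [12, 18, 27, 31, 29, 33, 26, 19, 24, 30]
fraction = high_risk_posture_fraction(samples)
if fraction > 0.5:  # flag surgeons who spend most of the case at risk
    print(f"Ergonomic review suggested: {fraction:.0%} of case above threshold")
```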

Performance Optimization

Optimizing performance involves defining a set proficiency goal and utilizing strategies to achieve consistent performance at that level (Wulf & Lewthwaite, 2016). The value of deliberate practice is undeniable (Macnamara et al., 2014). The pathway to developing expert surgical performance is a demanding process. It involves completing consecutive complex tasks with precision and accuracy, all the while maintaining clear and concise communication with members of the surgical team in high-pressure situations (Spanager et al., 2015).

Mental Skills Optimization and Coaching

Feedback comes in two different forms: intrinsic, which arises from self-reflection, and augmented, which comes from external sources such as a coach. It may come as no surprise that beginner trainees benefit more from augmented feedback, while experts rely more on intrinsic feedback to improve performance. It falls to the coach to provide this augmented feedback in an individualized, applicable, and encouraging way (Stefanidis et al., 2019). When feedback and coaching are optimized, surgeon performance has been shown to improve (Greenberg et al., 2015, 2016). Additionally, beginners often suffer from the deterioration of skills when placed in high-pressure situations with inadequate coping mechanisms. The ability to maintain skillful performance under stress has been termed cognitive or mental skills, and such skills can be taught by a trained coach. These skills are used in professional athletics, the military, and aviation to help participants perform their best in all situations (Deshauer et al., 2019). These mental skills curricula have already found their way into surgical training and have shown positive outcomes in helping trainees mitigate the impact of stress on surgical performance (Anton et al., 2017, 2019).

The body of evidence showing improvement in operating room performance with simulation dates back to 2002 and has been growing rapidly since (Seymour et al., 2002; Sroka et al., 2010). In such a structured coaching environment, quantitative assessment and repetition allow trainees to recognize their common mistakes and create personalized training regimens and mental maps to aid success.
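In practice, the proficiency goal described at the start of this section is operationalized as an explicit stopping rule for practice sessions. A minimal sketch of one common mastery-learning pattern follows; the score threshold and the requirement of two consecutive passing trials are illustrative assumptions, not values from the cited curricula.

```python
def has_reached_proficiency(trial_scores: list[float],
                            goal: float = 90.0,
                            consecutive: int = 2) -> bool:
    """Mastery-learning stopping rule: advance the trainee only after the
    proficiency goal is met on N consecutive trials, which guards against
    a single lucky attempt. Goal and N are illustrative assumptions."""
    if len(trial_scores) < consecutive:
        return False
    return all(score >= goal for score in trial_scores[-consecutive:])

# A trainee's scores across successive practice trials:
scores = [61.0, 74.5, 88.0, 92.5, 93.0]
print(has_reached_proficiency(scores))  # True: last two trials >= 90
```

A rule like this is part of what makes practice "deliberate" in the mastery-learning sense: training ends when the criterion is met, not when a fixed number of sessions has elapsed.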


Developing Expertise

Educational psychologists often reference the Dreyfus model of skill acquisition as the conceptual framework for developing expertise (Dreyfus & Dreyfus, 1986). This framework details the progression through five stages of performance, beginning with novice and progressing to advanced beginner, competent, proficient, and finally expert performer. One prominent difference between proficient and expert performance lies in the ability to problem-solve based on prior experience. An expert surgeon recognizes potential errors that may arise due to the subtle complexities of the case at hand and utilizes a deeper understanding of surgery as a whole, developed through prior experience, to arrive at a novel solution that avoids the error altogether (Bereiter & Scardamalia, 1993). While the value of simulation in teaching and assessing clinical skills through the first four stages has been clearly established, human factors pertaining to error recognition, distraction avoidance, and stress mitigation are an integral part of the transition from proficiency to expertise, and recent work has started to look at how to use simulation to explore these areas.

Simulation-based error management training (EMT) has been promoted across multiple surgical disciplines (Franklin et al., 2021; Sternbach et al., 2017). These mastery learning-based EMT curricula utilize various low- and high-fidelity simulators to demonstrate pre-completed procedures containing a variable number of procedural errors. Trainee identification of errors is the crux of EMT learning objectives and implies an ability not only to perform the procedure correctly but also to identify when things are not progressing in the appropriate manner. This skill also relies heavily on the ability to minimize distractions and focus on the task at hand.

Stress mitigation intraoperatively is another human factor that distinguishes expert surgeons. While stress can enhance performance by improving concentration, alertness, and economy of motion, excessive levels of stress have been shown to impair judgment, decision-making, and communication intraoperatively (Wetzel et al., 2005). Furthermore, ineffective stress-coping strategies in surgical trainees correlate with poor performance on virtual-reality laparoscopic simulators (Hassan et al., 2006). Inspired by the success of stress-management programs in aviation, the military, and professional sports, surgical training programs have started to utilize simulation as a means to provide stress-management skills training and to assess trainees in high-stress trauma simulations (Goldberg et al., 2018). By training residents to adopt stress-management strategies early in their technical careers, the hope is that the path to achieving surgical expertise is expedited and patient outcomes are improved.

Surgical Team Dynamics

The human factors that impact surgical teams include not only communication style and situational awareness but also leadership dynamics and task coordination. Surgeons do not operate in isolation. Therefore, human factors pertaining to device ergonomics and performance optimization mean nothing if not understood within the context of the surgical team (Flin et al., 2015).


Most surgical cases involve an anesthesiologist, who is responsible for the patient's airway and sedation; a first assistant, who can be a resident physician or physician assistant; a surgical technologist, who passes surgical instruments directly into the hands of the surgeons; and a circulating nurse, who is responsible for making sure all of the necessary supplies are in the operating room. Healthy interactions, modeled by the surgeon, are vital not only to the health and well-being of the patient but also to the members of operative and trauma teams. Multi-institutional work has been done to deliver combined scenario- and procedure-based training to trauma teams using the Advanced Modular Manikin (AMM), a novel simulation platform (Stefanidis et al., 2021). Simulators such as these create the optimal environment for team training and allow for the study of human factors in team dynamics.

Utilizing Simulation to Minimize Subjectivity in Assessment

Simulation provides a unique platform in its ability to quantitate performance and provide objective feedback and coaching during repetitive practice (Okuda et al., 2009). Assessing surgical trainees in a quantitative manner is a fairly recent paradigm shift in surgical training. The traditional Halsted model of training has senior faculty assessing trainees only through direct observation inside and outside the OR. This model is fraught with bias and subjectivity and does not allow for objective comparisons between performances or performers. This subjectivity extends beyond the scope of an individual surgical faculty member and is further illuminated by the diminished inter-rater reliability between faculty assessing the same performance or performer (Gawad et al., 2019; Andersen et al., 2021). These differing assessments have been shown to be universal and are due to a variety of factors, such as the underlying expertise and predetermined expectations of the observer. In some cases, this inter-rater variability can be reduced through rater training; however, an objective assessment system would bypass this need altogether (Pradarelli et al., 2020).

Simulation technology enables the quantification of new, objective, and potentially more robust performance metrics such as grip strength, excess motion, and gaze direction (Stefanidis et al., 2019). Further, newer non-traditional performance metrics that assess learner tissue handling during simulation may provide additional benefits to trainees as they acquire surgical skills (Witthaus et al., 2020; Huffman et al., 2020). These quantitative assessments inform and supplement feedback and have become crucial in the training of surgeons (Vaidya et al., 2020). As these quantifiable metrics increase in number and quality, they will provide a more discrete target to work toward, forcing the art of surgical education to evolve into the science of surgical education.
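Metrics like "excess motion" can be computed directly from tracked instrument-tip positions, which is precisely what makes them objective: no rater is involved. Below is a minimal sketch; the economy-of-motion normalization (straight-line distance over actual path length) is one common convention chosen for illustration, not the specific metric of any cited system.

```python
import math

Positions = list[tuple[float, float, float]]

def path_length(tip_positions: Positions) -> float:
    """Total distance travelled by the instrument tip (input units)."""
    return sum(math.dist(a, b)
               for a, b in zip(tip_positions, tip_positions[1:]))

def economy_of_motion(tip_positions: Positions) -> float:
    """Straight-line start-to-end distance divided by actual path length.

    A value of 1.0 means a perfectly direct movement; values near 0
    indicate excess motion. This normalization is an illustrative choice.
    """
    total = path_length(tip_positions)
    if total == 0.0:
        return 1.0
    return math.dist(tip_positions[0], tip_positions[-1]) / total

# Tracked tip samples (cm) from one simulated suturing movement:
track = [(0, 0, 0), (1, 2, 0), (3, 1, 1), (4, 4, 1), (5, 5, 2)]
print(round(path_length(track), 2), round(economy_of_motion(track), 2))
```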

FUTURE OF SIMULATION AND HUMAN FACTORS IN SURGERY

In order to discuss the future of simulation, one must understand its current utility and present trends.


Simulation has clearly become the gold standard in fields where safety is a top priority. This explains the early integration of simulation into military and aviation training and its resultant exemplar status (Aebersold, 2016). Using these fields as examples, one may safely assume the future of simulation in surgery contains higher-fidelity simulators that integrate higher-level human factors into performance assessment.

Despite its relative infancy, simulation in surgery offers many undeniable advantages. As machine learning (AI) algorithms progress, the opportunities for more objective, automated performance assessments are endless. Furthermore, the quantification of skill performance and the ability to practice complex tasks without risk to human lives have been revolutionized by simulation. As such, the next logical step was to create a variety of tests and certifications requiring proof of proficiency in many important tasks prior to gaining the privilege to perform them on actual patients. This trend is likely to continue and may even expand to the realm of accreditation and recertification. In the medical student world, some predict that surgery "aptitude tests" may gain popularity and help students struggling with their decision to pursue surgery as a career.

With regard to the future of human factors research pertaining to surgery, platforms involving wearable sensors that assess surgeon ergonomic risk are currently being developed with the goal of mitigating that risk. As simulator technology and AI progress, tools that identify and mitigate surgical errors, minimize OR distractions, and integrate virtual coaching platforms into simulation training could be developed to address human factors challenges in surgery and enhance technical and team-based training for surgeons.

Lastly, it is expected that simulation will aid the field of outcomes research. Outcomes research has already studied a variety of personal variables and their impact on outcomes in surgery; examples include surgeon age and years out of training, among others. It is not infeasible that, as quantitative proficiency data become available, surgical outcomes will be studied as a function of this variable (Aebersold, 2016; Stefanidis et al., 2019). Using clinical performance and patient outcomes for practicing surgeons to inform simulation training and recertification, and vice versa, is vital to quality improvement in the field of surgery moving forward.
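One way such automated assessment is commonly prototyped is by training a classifier on simulator-derived metrics to separate novice from expert performances. A minimal sketch using scikit-learn follows; the feature set and every number in it are invented for illustration, and no cited system is implied.

```python
from sklearn.linear_model import LogisticRegression

# Each row is one simulator trial: [path_length_cm, time_s, error_count].
# Labels: 0 = novice, 1 = expert. All values are invented for illustration.
X = [
    [240.0, 310.0, 4], [210.0, 280.0, 3], [225.0, 295.0, 5],  # novices
    [120.0, 140.0, 0], [135.0, 155.0, 1], [110.0, 130.0, 0],  # experts
]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Estimated probability that a new trial reflects expert-level performance:
new_trial = [[150.0, 170.0, 1]]
print(model.predict_proba(new_trial)[0][1])
```

In any real deployment, the hard work lies elsewhere: validating that such scores track clinically meaningful skill, which is exactly the validity-evidence concern raised earlier in the chapter (Cook & Hatala, 2016).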

CONCLUSION

Simulation has clearly brought a variety of advantages to surgical training. Among these are the ability to practice complex surgical skills without risking harm to living people, the ability to quantitate performance and thereby optimize it and reduce bias, the ability to hone skills in a low-stress environment, and the ability to gain competency via standardized curricula. Simulation in surgery is still in its growing stages and has room to advance and become an even more integral part of surgical education. While limitations exist in the fidelity of simulators, due to the inherent complexity of reliably simulating living systems, these limitations are steadily being overcome. With this in mind, we may soon enter an era where the greatest limitation of a simulator is the imagination to create it.


REFERENCES

Advanced Trauma Operative Management. 2021; Available from: https://www.facs.org/quality-programs/trauma/education/atom.
Aebersold, M., The history of simulation and its impact on the future. AACN Adv Crit Care, 2016. 27(1): p. 56–61.
Ali, J., et al., The advanced trauma operative management course in a Canadian residency program. Can J Surg, 2008. 51(3): p. 185–189.
Andersen, S.A.W., et al., Use of generalizability theory for exploring reliability of and sources of variance in assessment of technical skills: A systematic review and meta-analysis. Acad Med, 2021. 96(11): p. 1609–1619.
Anton, N.E., et al., Application of mental skills training in surgery: A review of its effectiveness and proposed next steps. J Laparoendosc Adv Surg Tech A, 2017. 27(5): p. 459–469.
Anton, N.E., et al., Mental skills training limits the decay in operative technical skill under stressful conditions: Results of a multisite, randomized controlled study. Surgery, 2019. 165(6): p. 1059–1064.
Athanasiadis, D.I., et al., An analysis of the ergonomic risk of surgical trainees and experienced surgeons during laparoscopic procedures. Surgery, 2021. 169(3): p. 496–501.
Bartlett, J., et al., ACS/APDS surgery resident skills curriculum. 2017 [cited 2021 4/14/21]; Available from: https://www.facs.org/education/program/resident-skills.
Bereiter, C. and M. Scardamalia, Surpassing Ourselves: An Inquiry into the Nature and Implications of Expertise. 1993. Chicago: Open Court.
Borggreve, A.S., et al., Simulation-based trauma education for medical students: A review of literature. Med Teach, 2017. 39(6): p. 631–638.
Catanzarite, T., et al., Ergonomics in surgery: A review. Female Pelvic Med Reconstr Surg, 2018. 24(1): p. 1–12.
Cook, D.A. and R. Hatala, Validation of educational assessments: A primer for simulation and beyond. Advances in Simulation, 2016. 1(1): p. 31.
Deshauer, S., et al., Mental skills in surgery: Lessons learned from virtuosos, Olympians, and Navy SEALs. Ann Surg, 2019. 274(1): p. 195–198.
Dreyfus, H.L. and S. Dreyfus, Mind Over Machine: The Power of Human Intuition and Expertise in the Era of the Computer. 1986. New York: Free Press. p. 250.
Flin, R., G.G. Youngson, and S. Yule, Enhancing Surgical Performance: A Primer in Nontechnical Skills. 2015. Boca Raton, FL: CRC Press.
Franklin, B.R., et al., Piloting the FIRE: A novel error management training simulation curriculum for fasciotomy instruction. J Surg Educ, 2021. 78(2): p. 655–664.
Fundamentals of Laparoscopic Surgery. 2021a [cited 2021 4/14/21]; Available from: https://www.flsprogram.org/.
Fundamentals of Robotic Surgery. 2021c [cited 2021 4/14/21]; Available from: https://frsurgery.org/.
Gala, S.G., et al., Use of animals by NATO countries in military medical training exercises: An international survey. Mil Med, 2012. 177(8): p. 907–910.
Gawad, N., et al., The inter-rater reliability of technical skills assessment and retention of rater training. J Surg Educ, 2019. 76(4): p. 1088–1093.
Goldberg, M.B., et al., Optimizing performance through stress training - An educational strategy for surgical residents. Am J Surg, 2018. 216(3): p. 618–623.
Gomez, P.P., et al., External validation and evaluation of an intermediate proficiency-based knot-tying and suturing curriculum. J Surg Educ, 2014. 71(6): p. 839–845.
Gonzalez, R.I., et al., Technical skills rotation for general surgery residents. J Surg Res, 2010. 161(2): p. 179–182.

Best Practices in Surgical Simulation

271

Greenberg, C.C., et al., Surgical coaching for individual performance improvement. Ann Surg, 2015. 261(1): p. 32–4. Greenberg, C.C., J. Dombrowski, and J.B. Dimick, Video-based surgical coaching: An emerging approach to performance improvement. JAMA Surg, 2016. 151(3): p. 282–283. Hassan, I., et al., Negative stress-coping strategies among novices in surgery correlate with poor virtual laparoscopic performance. Br J Surg, 2006. 93(12): p. 1554–1559. Holmboe, E.S., et al., The role of assessment in competency-based medical education. Med Teach, 2010. 32(8): p. 676–82. Honarmand, K., et al., Adherence to advanced cardiovascular life support (ACLS) guidelines during in-hospital cardiac arrest is associated with improved outcomes. Resuscitation, 2018. 129: p. 76–81. Hudson, K. Prepare Your Graduating Students for Their New Responsibilities in Surgical Care. ACS/APDS/ASE Resident Prep Curriculum 2018a [cited 2021 September 3]; Available from: https://www​.facs​.org​/education​/program​/resident​-prep. Hudson, K. ACS/APDS Surgery Resident Skills Curriculum. 2018b [cited 2021 September 3]; Available from: https://www​.facs​.org​/education​/program​/resident​-skills. Huffman, E., et al., Optimizing assessment of surgical knot tying skill. J Surg Educ, 2020. 77(6): p. 1577–1582. Huffman, E.M., et al., A competency-based laparoscopic cholecystectomy curriculum significantly improves general surgery residents’ operative performance and decreases skill variability: Cohort study. Ann Surg, 2021. 276(6): e1083–e1088. Johnston, M.J., et al., An overview of research priorities in surgical simulation: What the literature shows has been achieved during the 21st century and what remains. Am J Surg, 2016. 211(1): p. 214–225. Jonasson, O. and F. Kwakawa, Retirement age and the work force in general surgery. Ann Surg, 1996. 224(4): p. 574–582. Kleinman, M.E., et al., 2017  American heart association focused update on adult basic life support and cardiopulmonary resuscitation quality: An update to the American heart association guidelines for cardiopulmonary resuscitation and emergency cardiovascular care. Circulation, 2018. 137(1): p. e7–e13. Korndorffer, J.R. Jr., D. Stefanidis, and D.J. Scott, Laparoscopic skills laboratories: Current assessment and a call for resident training standards. Am J Surg, 2006. 191(1): p. 17–22. Korndorffer, J.R. Jr., et al., The American college of surgeons/association of program directors in surgery national skills curriculum: Adoption rate, challenges and strategies for effective implementation into surgical residency programs. Surgery, 2013. 154(1): p. 13–20. Kovatch, K.J., et al., Integrated otolaryngology-anesthesiology clinical skills and simulation rotation: A novel 1-month intern curriculum. Ann Otol Rhinol Laryngol, 2019. 128(8): p. 715–720. Macnamara, B.N., D.Z. Hambrick, and F.L. Oswald, Deliberate practice and performance in music, games, sports, education, and professions: A meta-analysis. Psychol Sci, 2014. 25(8): p. 1608–1618. McGaghie, W.C., et al., A critical review of simulation-based medical education research: 2003–2009. Med Educ, 2010. 44(1): p. 50–63. McLaughlin, C., et al., Multidisciplinary simulation-based team training for trauma resuscitation: A scoping review. J Surg Educ, 2019. 76(6): p. 1669–1680. Mizota, T., et al., Development of a fundamentals of endoscopic surgery proficiency-based skills curriculum for general surgery residents. Surg Endosc, 2020. 34(2): p. 771–778. Mohammad, A., F. Branicki, and F.M. 
Abu-Zidan, Educational and clinical impact of Advanced Trauma Life Support (ATLS) courses: A systematic review. World J Surg, 2014. 38(2): p. 322–329.

272

Human Factors in Simulation and Training

Naeem, N., et al., Exploring variability of teaching & supervision at clinical clerkship teaching sites. Pak J Med Sci, 2018. 34(2): p. 368–373. Norcini, J. and V. Burch, Workplace-based assessment as an educational tool: AMEE Guide No. 31. Med Teach, 2007. 29(9): p. 855–871. O’Brien, S.M., et al., Design and development of the BD Odon Device(TM): A human factors evaluation process. Bjog, 2017. 124(Suppl 4): p. 35–43. Okuda, Y., et al., The utility of simulation in medical education: What is the evidence? Mt Sinai J Med, 2009. 76(4): p. 330–343. Olasky, J., et al., ACS/ASE medical student simulation-based skills curriculum study: Implementation phase. J Surg Educ, 2019. 76(4): p. 962–969. Parham, G., et al., Creating a low-cost virtual reality surgical simulation to increase surgical oncology capacity and capability. Ecancermedicalscience, 2019. 13: p. 910. Park, A.E., et al., Intraoperative “micro breaks” with targeted stretching enhance surgeon physical function and mental focus: A multicenter cohort study. Ann Surg, 2017. 265(2): p. 340–346. Pena, G., et al., Nontechnical skills training for the operating room: A prospective study using simulation and didactic workshop. Surgery, 2015. 158(1): p. 300–309. Pradarelli, J.C., et al., Assessment of the Non-Technical Skills for Surgeons (NOTSS) framework in the USA. Br J Surg, 2020. 107(9): p. 1137–1144. Pucher, P.H., et al., Ward simulation to improve surgical ward round performance: A randomized controlled trial of a simulation-based curriculum. Ann Surg, 2014. 260(2): p. 236–243. Robertson, J.M., et al., Operating room team training with simulation: A systematic review. J Laparoendosc Adv Surg Tech A, 2017. 27(5): p. 475–480. Scott, D.J. and G.L. Dunnington, The new ACS/APDS skills curriculum: Moving the learning curve out of the operating room. J Gastrointest Surg, 2008. 12(2): p. 213–221. Seymour, N.E., et al., Virtual reality training improves operating room performance: Results of a randomized, double-blinded study. Ann Surg, 2002. 236(4): p. 458–463; discussion 463–464. Sloan, D.A., et al., Use of an Objective Structured Clinical Examination (OSCE) to measure improvement in clinical competence during the surgical internship. Surgery, 1993. 114(2): p. 343–350; discussion 350–351. Spanager, L., et al., Comprehensive feedback on trainee surgeons’ non-technical skills. Int J Med Educ, 2015. 6: p. 4–11. Sroka, G., et al., Fundamentals of laparoscopic surgery simulator training to proficiency improves laparoscopic performance in the operating room-a randomized controlled trial. Am J Surg, 2010. 199(1): p. 115–120. Stefanidis, D., et al., Simulation in surgery: What’s needed next? Ann Surg, 2015. 261(5): p. 846–853. Stefanidis, D., J.R. Korndorffer, and R. Sweet, Comprehensive Healthcare Simulation: Surgery and Surgical Subspecialties. 2019. Cham, Switzerland: Springer. Stefanidis, D., et al., Advanced modular manikin and surgical team experience during a trauma simulation: Results of a single-blinded randomized trial. J Am Coll Surg, 2021. 233(2): p. 249–260.e2. Sternbach, J.M., et al., Measuring error identification and recovery skills in surgical residents. Ann Thorac Surg, 2017. 103(2): p. 663–669. Sudan, R., et al., American college of surgeons resident objective structured clinical examination: A national program to assess clinical readiness of entering postgraduate year 1 surgery residents. Ann Surg, 2014. 260(1): p. 65–71. Tang, T.S. and E.P. Skye, When patients decline medical student participation: The preceptors’ perspective. 
Adv Health Sci Educ Theory Pract, 2009. 14(5): p. 645–653.

Best Practices in Surgical Simulation

273

Fundamentals of Endoscopic Surgery. 2021b [cited 2021 4/14/21]; Available from: https:// www​.fesprogram​.org/. Vaidya, A., et al., Current status of technical skills assessment tools in surgery: A systematic review. J Surg Res, 2020. 246: p. 342–378. Wee, I.J.Y., L.J. Kuo, and J.C. Ngu, A systematic review of the true benefit of robotic surgery: Ergonomics. Int J Med Robot, 2020. 16(4): p. e2113. Wetzel, C.M., et al., The effects of stress on surgical performance. Am J Surg, 2005. 191(1): p. 5–10. Witthaus, M.W., et al., Incorporation and validation of clinically relevant performance metrics of simulation (CRPMS) into a novel full-immersion simulation platform for nervesparing robot-assisted radical prostatectomy (NS-RARP) utilizing three-dimensional printing and hydrogel casting technology. BJU Int, 2020. 125(2): p. 322–332. Wright, Jr, J.R. and N.S. Schachar, Necessity is the mother of invention: William Stewart Halsted’s addiction and its influence on the development of residency training in North America. Can J Surg, 2020. 63(1): p. E13–e19. Wulf, G. and R. Lewthwaite, Optimizing performance through intrinsic motivation and attention for learning: The optimal theory of motor learning. Psychon Bull Rev, 2016. 23(5): p. 1382–1414. Yeh, D.H., K. Fung, and S. Malekzadeh, Boot camps: Preparing for residency. Otolaryngol Clin North Am, 2017. 50(5): p. 1003–1013. Zendejas, B., R.K. Ruparel, and D.A. Cook, Validity evidence for the fundamentals of Laparoscopic Surgery (FLS) program as an assessment tool: A systematic review. Surg Endosc, 2016. 30(2): p. 512–520. Ziv, A., et al., Simulation-based medical education: An ethical imperative. Acad Med, 2003. 78(8): p. 783–788.

10 Healthcare Simulation Methods: A Multifaceted Approach

Amy L. Hanson and Aaron W. Calhoun

CONTENTS

Introduction
  What is Healthcare Simulation?
Concepts in Healthcare Simulation
  Simulation Fidelity
  Psychological Safety
Locations and Modes of Healthcare Simulation
Applications of Healthcare Simulation
  Simulation for Practice and Learning
    Simulations with Reflective Debriefing
    Rapid Cycle Deliberate Practice
    Mastery Learning
    Just-in-Time Training
  Choice of Training Type
  Simulation for Evaluation and Testing
  Systems Testing
Conclusion and Future Directions
References

INTRODUCTION

Since ancient times, the art and practice of medicine have been grounded in the concept of primum non nocere: "first, do no harm" (Aggarwal et al., 2010). This phrase enjoins practitioners to weigh the possible harm of any intervention against its potential benefits. Despite the general acceptance of this moral adage, in 2000 the Institute of Medicine reported that up to 98,000 patient deaths per year in US hospitals were due to medical error (Kohn et al., 2000). These data called into question the "see one, do one" approach to medical training that had long formed the core of educational practice in medicine, forcing a reexamination of the degree to which inexperienced healthcare trainees practiced their craft directly on patients (Aggarwal et al., 2010). Here, simulation-based training entered as a logical solution, enabling trainees to gain needed skills while reducing the exposure of patients to risk and preventable harm.

In fact, the idea of practicing medical procedures on inanimate objects before caring for human beings dates back thousands of years (Owen, 2012). During the Song dynasty in China in 1027, the imperial physician Wang Wei-Yi created bronze statues whose surfaces were covered with small holes and that were used to teach students of acupuncture about surface anatomy (Owen, 2012). The statues were then covered in wax and filled with liquid so that a drip off the end of a removed needle could help confirm appropriate anatomical placement of the acupuncture needle (Owen, 2012). In the late seventeenth century, fueled by concern over infant and maternal childbirth mortality rates and a need for increased training among midwives, Drs. Grégoires, a father and son team, produced a manikin of the pregnant female abdomen, then referred to as a "phantom" (Buck, 1991). Made of basket-weave and covered with oil-skin and cloths, the simulator was used to train midwives on birthing techniques after lecture education alone failed to translate to improved childbirth outcomes (Buck, 1991; Gardner & Raemer, 2008). Concern over the risk of harm and prioritization of patient safety sparked a resurgence of this practice in modern times.

As this transition accelerated, changes also took place in our understanding of adult learning theory (Kolb, 1984). These theoretical developments argued that lectures (i.e., "telling people what they should know") were not a robust educational method for this population. Instead, the adult learner needs immersive and experiential learning that correlates with the relevant real-world problems they have been charged to address. In 1984, Kolb, an American educational theorist, detailed how adult learning is an "experiential, conflict-filled process out of which the development of insight, understanding and skills come." Kolb described a recurring cycle of learning that begins with the learner first having a new, concrete "real-world" experience that they wish to learn more about. This naturally leads to a second phase in which the experience is reviewed and reflected upon. Kolb suggests that this period of reflection is critical to learning, as it is here where learner "breakthroughs" can occur and mindsets can change. In the third phase, the learner generates new concepts and conclusions based on this reflection, solidifying what they have learned. Finally, in the fourth phase, the learner applies these conclusions via active experimentation. This, in turn, generates new experiences and observations, and the cycle is repeated in an ongoing fashion.

Simulation was quickly recognized by the healthcare community as a training modality that naturally fit into this approach and provided needed opportunities for experiential learning (Aggarwal et al., 2010). By providing a simulated experience followed by a facilitated debriefing, learners are exposed to new situations (Phase 1 of the Kolb Cycle) and actively encouraged to reflect on them (Phase 2 of the Kolb Cycle; Kolb, 1984). Thus, learners can more carefully examine their own actions in a guided way and think critically about what was done well and how their actions may have differed from what needed to be done. Simulation also provides an environment in which active experimentation (Phase 4 of the Kolb Cycle) can occur at no risk to patients.
By directly facilitating these aspects of the adult learning process, simulation has the potential to enhance learning far beyond what can be accomplished via traditional didactic approaches. The uptake of healthcare simulation has been extensive, and there are now more than 825 healthcare institutions formally providing simulation-based training worldwide (ssih.org).

Technological advancements also formed a crucial part of this developmental process (Aggarwal et al., 2010). The earliest full-body manikin simulators were used in the 1960s for anesthesia (SimOne) and cardiology (Harvey) and for cardiopulmonary resuscitation and mouth-to-mouth training (Resusci-Anne; Cooper & Taqueti, 2008). In the 1980s, personal computer technology became more affordable and software more accessible to industries like aviation, space, military, and nuclear power. This paved the way for the extensive development of simulation techniques that followed. In the 1990s, Dr. David Gaba, an important early pioneer in the field, and colleagues created the first comprehensive anesthesia simulation modules, known as CASE 1.2 (Comprehensive Anesthesia Simulation Environment). With this platform, what is now commonly referred to as high-fidelity healthcare simulation entered the field of medicine. As the field of anesthesia, and later other medical subspecialties, adopted simulation, "crew resource management" training concepts that had long been used in aviation were adapted to the healthcare environment. Renamed "crisis resource management" to better fit the healthcare setting, this educational approach was used in concert with high-fidelity simulation to improve teamwork during medical emergencies by providing practitioners with a series of conceptual steps for responding to a simulated crisis as a team. Since that time, healthcare simulation has further grown and evolved, and now embraces a wide array of technologies and techniques that address almost every aspect of healthcare education.

What is Healthcare Simulation?

While many definitions of healthcare simulation have been developed and used over the past decades, the Society for Simulation in Healthcare (SSH) defines healthcare simulation as "a technique that creates a situation or environment to allow persons to experience a representation of a real health care event for the purpose of practice, learning, evaluation, testing, or to gain understanding of systems or human actions" (Lioce et al., 2020). This generally agreed-upon definition is deliberately broad, reflecting the now widespread use of simulation-based methods in healthcare. The remainder of this chapter provides an overview of fidelity and psychological safety, two concepts critical to healthcare simulation, followed by a review of the physical settings in which healthcare simulation takes place and of how simulation is applied in healthcare for both teaching and assessment.

CONCEPTS IN HEALTHCARE SIMULATION

Simulation Fidelity

Fidelity refers to "the degree to which a simulation replicates the real event and/or workplace" and is a universal aspect of all simulated experiences (Lioce et al., 2020).


As a construct, fidelity can be further divided into physical, cognitive, and emotional elements of realism (Lioce et al., 2020). Physical realism refers to the physical properties of the manikin or equipment itself. For example, a manikin will have better physical realism if its weight is similar to that of a similarly sized human being (Dieckmann et al., 2007). Likewise, other physical aspects of the manikin – such as the force generated by moving its chest wall during compressions, the appearance of the materials composing its airway, or the haptics (i.e., tactile perceptions) felt when placing a breathing tube into the trachea – help to determine the physical realism for the learner. Despite their human shape and ever-improving technologies, existing manikins still possess unrealistic qualities, such as breath sounds that remain distinguishable from actual breath sounds and the obviously synthetic materials that comprise their "skin." Physical realism also applies to the environment in which the simulation is conducted – whether in an actual medical workplace or a simulation lab – and how well the physical space replicates the actual work environment. If a simulation takes place in an emergency room but in an infrequently used back room, as opposed to the resuscitation bay where actual patients are resuscitated, physical realism suffers to some extent.

Cognitive realism concerns concepts, care decisions, and their relationships within the simulation (Dieckmann et al., 2007). For example, if severe hemorrhage occurs in a living patient, then a high heart rate and low blood pressure will follow over a fairly well-defined time course. If this anticipated time course is not roughly followed during a simulation, learners will perceive a gap in realism. It is important to note that the actual delivery mechanism of this information is irrelevant to cognitive realism. This means that the mode by which the team learns of the hemorrhage – whether through a labor-intensive process of concocting and applying simulated blood to the manikin, through viewing an image of a bleeding patient, or through simply being told by the facilitator – has little bearing on the cognitive realism of the simulation. As long as the manikin's responses in terms of vital signs, physical findings, or behavior follow what would physiologically occur in response to the provided interventions, cognitive realism will be preserved.

Emotional realism refers to the emotions, beliefs, and self-awareness that participants directly experience during the simulation (Dieckmann et al., 2007). It represents the degree to which the simulation evokes the feelings or emotions that learners would expect to experience in a real situation (Rudolph et al., 2014). This aspect of fidelity is largely independent of the knowledge content of the case but can have a significant impact on the learning that occurs. For this reason, it is typically addressed upfront during debriefing in what is often termed a "reactions" phase, which explores the emotional experiences and reactions of participants (Eppich et al., 2015).

During the initial uptake of simulation by the healthcare field, the fidelity of a simulation was felt to correspond directly to the overall "realism" of a case and was seen as one of the most critical elements for effective learning. Under this assumption, healthcare simulation used the close replication of reality as a gold standard (Dieckmann et al., 2007).
This view has been repeatedly challenged, however, over the intervening decades (PW, 1973; Hays & Singer, 1989; Jentsch et al., 2011). Studies have consistently failed to show a benefit to higher fidelity in terms of training outcomes (Issenberg et al., 2005). Furthermore, data suggest that relatively low-fidelity simulations can also lead to effective learning (Beaubien & Baker, 2004; Salas & Burke, 2002). It has been noted that participants experience a simulated scenario both as a complex real-time situation in which they interact with specific equipment, human actors, and aspects of the environment, and as an educational event intended to approximate an actual clinical encounter. If participants appreciate how the simulated scenarios apply to clinical practice, they are likely to accept lower physical, cognitive, and emotional fidelity while still deriving educational benefit from the experience.

Furthermore, it appears that the success of a simulation also depends on a host of other factors that extend past the fidelity or realism of the simulator or simulation event. The "social practice" of simulation, for example, plays an integral role in the process (Dieckmann, 2020). Social practice is defined as a "contextual event in time and space, conducted for one or more purposes, in which people interact in a goal-oriented fashion with each other, with technical artifacts (the simulator) and with the environment (and relevant devices)." In terms of healthcare simulation, this term does not refer to team interactions within the simulation, but rather to the larger-scale interactions between learners and the entire simulation process. This includes the explicit learning objectives for the simulation as well as the learners' understanding of them, and it affects how they choose to interpret what has transpired during the session (Dieckmann et al., 2007). As an example, imagine that prior to a simulation one learner is overheard saying to another, "if we just get the patient intubated and don't screw it up, maybe they'll be satisfied." In this context, the learner aims to please the simulation facilitator and underappreciates the true goal of the exercise: their own professional learning and development. Simulation is truly a complex social endeavor (Dieckmann, 2020).

Psychological Safety

Necessary to the appropriate social functioning of healthcare simulation is the establishment of a safe learning environment prior to engagement in the simulation exercise. Historically, most simulations address this in a "pre-briefing," which is an "orientation session held prior to the start of a simulation activity in which instructions and preparatory information are given to the participants" (Lioce et al., 2020). During the pre-briefing, the stage is set for the learning experience by clarifying the goals and expectations for the session and attending to logistical details (Rudolph et al., 2014). The pre-brief may also include an explanation of the strengths and weaknesses of simulation and what participants can do to get the most out of the simulated clinical experiences. This commonly involves invoking a "fiction contract," whereby participants are asked to behave as if the situation were real (Gardner & Raemer, 2008; Rudolph et al., 2014). In doing so, the merits of simulation may be more fully realized. While these are some well-attested ways of addressing these issues in the pre-briefing, other practical approaches do exist.

In the final portion of the pre-briefing (or, in some approaches, just prior to the debriefing), the facilitator commits to respecting the learners and their psychological safety and confidentiality (Rudolph et al., 2014). Here the participants are encouraged to share their thoughts and questions about the simulation and debriefing and are reassured that they will not be chastised or humiliated in the process. This sets the team up for risk-taking in the name of learning, which is vital given that their professional skills are in many ways on display during the event. The facilitator will often also concede that the simulation can only mimic reality to a certain point, acknowledging the limitations of the simulation modality being used, which can help normalize any fidelity-related issues that may arise. Clearly voicing a commitment to the participants can also help curb counterproductive defense mechanisms, such as blaming poor performance on perceived lapses in simulation realism. When participants perceive that the ground rules are fair, they are often more willing both to engage with the learning objectives and to critically reflect on their own performance (Calhoun et al., 2020a, 2020b).

The dual issues of fidelity and psychological safety can intersect in complex ways when simulations contain emotionally difficult subject matter, in particular patient death (Calhoun et al., 2015; Truog & Meyer, 2013). Leighton proposes a helpful heuristic that distinguishes among situations in which manikin death is both planned by facilitators and explicitly revealed to learners prior to the case, situations in which the death is planned but learners are unaware of this, and situations in which manikin death is not initially planned but emerges instead as a consequence of learner action or inaction (Leighton, 2009). Each situation requires a different approach to assure psychological safety. Manikin death due to learner action or inaction remains somewhat controversial. Much of the current literature on this subject focuses on the interaction between the stress of the event and eventual knowledge and skill retention, with mixed results (Heller et al., 2016; Fraser et al., 2014; Bryson & Levine, 2008; Demaria et al., 2010; Lizotte et al., 2015; Phrampus & Cole, 2005). Some of this literature, noting that learning seems in some cases to improve when a close relationship exists between learner actions and outcome, further suggests that manikin death due to learner actions may enhance both fidelity and the learner's sense of agency within the simulation (Goldberg et al., 2017; Calhoun & Gaba, 2017). Learner experience also appears to play a role in the appropriateness of the technique. A number of cognitive models have recently been proposed that attempt to more comprehensively represent both learning and emotional stress under these conditions (Tripathy et al., 2016; McBride et al., 2017).

LOCATIONS AND MODES OF HEALTHCARE SIMULATION

Starting in the 1990s, simulation centers – "schools" that offer simulation training as their primary teaching modality – became popular and attracted substantial investment (Lateef et al., 2021). These centers may be particularly useful in the training and testing of newer trainees. With time, however, the benefits of in-situ simulation, or "simulations that take place in the actual patient care settings/environment in an effort to achieve a high level of fidelity and realism" (Lioce et al., 2020), have been increasingly appreciated. In-situ simulation is also valuable when the goal is not primarily to educate, but instead to assess, troubleshoot, or develop new systems processes, as the actual work environment is on display. This offers the ability to detect latent safety threats and alter logistical details or system operations.

Medicine makes use of a variety of simulator modalities, including manikin-based simulation, task trainers, standardized patients, screen-based simulation (i.e., distance simulation), and virtual reality. Manikin-based simulation was the earliest adopted way of conducting healthcare simulation. Low-fidelity manikins are static replicas of the human form that are capable of only the most basic movements, such as chest rise when ventilated or chest recoil during cardiopulmonary resuscitation. Despite this simplicity, they can be quite effective tools on which to practice rescue breathing, bag-valve mask ventilation, or cardiopulmonary resuscitation. Higher-fidelity manikins, in contrast, contain computer-driven mechanisms that can dynamically display heart sounds, lung sounds, pulses, skin perfusion, pupil responsiveness, and electrocardiographic waveforms. Newer manikins can even speak with increasing realism, exhibit varied facial expressions, cry, mimic stroke symptoms, and be realistically intubated.

An additional simulator type closely related to the manikin is the task trainer. Task trainers replicate only one part of the body (such as an arm, wrist, or spinal column) and are typically used for procedural skills training (surgical technique, lumbar puncture, central line placement, etc.). They are often simple in construction and rely on specialized materials to replicate the feel and pliability of specific human tissues. As point-of-care ultrasound has grown in popularity, many task trainers also incorporate materials capable of transmitting ultrasound waves to allow practice of this diagnostic modality as well.

A third modality is the standardized patient (SP). SP simulations are simulations "using a person or persons trained to portray a patient scenario or actual patient(s) for health care education" (Lioce et al., 2020). An SP may also be called upon to act as the parent of a patient during a simulation. SPs, as living humans, can provide direct feedback to learners in a way that artificial manikins cannot, especially regarding relational skills such as history-taking and physical examination. SPs receive training to assure that this feedback is focused and of high quality, and their commentary can thus significantly surpass in quality the feedback given during face-to-face interaction with an actual patient. SPs have been shown to be effective in the rehearsal of difficult conversations such as disclosing a medical error or breaking bad news (Borghi et al., 2021; Bell et al., 2014; Meyer et al., 2009; Peterson et al., 2012, 2021).

Other simulator modalities include screen-based forms of healthcare simulation such as serious games and virtual hospitals (Bracq et al., 2019). Virtual reality makes use of specialized goggles and haptic feedback devices that allow the learner to engage in a fully virtual setting where medical care or procedures can be performed. While virtual reality can require a significant initial investment, its ongoing costs are relatively low, and it is thus increasingly seen as a lower-cost alternative to manikin-based simulation.
Medicine is also making use of augmented reality, where virtual constructs are visually overlaid on a real environment to assist in the educational process. An example is a recently developed birthing simulator that can overlay virtual components onto the physical manikin via virtual reality goggles. This allows the provider to "see inside" the pregnant woman's abdomen and directly observe the effects of the mother's condition, and of their own actions, on the baby within (CAE, 2021).

Finally, the Covid-19 pandemic posed significant challenges to simulation practices that had previously required people to gather in close groups for education. New social distancing requirements drove a proliferation of tele-simulation, or distance simulation, techniques in which groups of learners and facilitators were brought together using video conferencing platforms to focus on the care of a simulated patient (Wagner et al., 2020). This is typically accomplished either by streaming live, on-site manikin interactions to remote participants, or by creating an entirely computer-based educational environment from photographs, videos, and vital sign display software shared over remote conferencing platforms (Gross et al., 2020). The lessons learned from this have opened up an array of new educational possibilities, especially for remote or low-resource environments, or for situations where physical presence is not necessary for learning to occur (Patel et al., 2020). One notable challenge in carrying out tele-simulation is the constraint that physical separation imposes on learner engagement (Cheng et al., 2020). Debriefing, in particular, relies heavily on the ability to form and maintain relationships, which can be difficult to do under these circumstances. Debriefing successfully under these conditions requires the establishment of a culture of safety, confidentiality, and openness to self-reflection within the learner group. The educational theories and approaches that define best practices for tele-simulation are actively being studied and tested.

APPLICATIONS OF HEALTHCARE SIMULATION

Healthcare simulation is typically conducted either for practice and learning or for evaluation and assessment. Simulation for practice and learning embraces four basic educational approaches: Reflective Debriefing (i.e., a standard sequence of pre-briefing, simulation, and debriefing), Rapid Cycle Deliberate Practice, Mastery Learning, and Just-in-Time Training. Approaches to simulation for evaluation and assessment are distinguished primarily by the object of assessment: either learners themselves or system operations and processes. The remainder of the chapter will address each of these in turn.

Simulation for Practice and Learning

Simulations with Reflective Debriefing

The majority of this chapter thus far has focused on "traditional" simulations that begin with a pre-brief phase, proceed to an immersive simulated patient event, and end with a subsequent facilitated team debriefing. Given that much of this has already been discussed, we will focus here on the debriefing process. Debriefing is a critical component of this approach, as it allows for the clarification and consolidation of insights and lessons learned from the simulation. Debriefing is a "conversation between two or more people to review a simulated event or activity in which participants explore, analyze, and synthesize their actions and thought processes, emotional states and other information to improve performance in real situations" (Brett-Fleegler et al., 2012). Using this approach, debriefing is held in a separate space from the simulated patient event itself to allow for adequate psychological distance from the event. Participants are made to feel comfortable and psychologically safe to share their observations and personal analysis, and to give and receive feedback. High participant engagement is necessary for transfer of knowledge and skills to the real clinical setting, and the facilitator plays a key role in assuring this.

A well-conducted debriefing directs learners to consider the frames, or "mental models," that shape their actions. For example, if an anesthesiologist holds the frame that "I must have both a bag-valve mask and an oxygen source to ventilate this patient" (the standard way of providing assisted respirations in a hospital environment), they may, when presented with a scenario in which a patient stops breathing and no oxygen source is readily available, search relentlessly for an oxygen hookup or tank, during which time the patient may experience severe hypoxemia and eventual cardiac arrest. Exploring the mental model that led to this action can help uncover opportunities for changing practice and facilitate troubleshooting in real time. In this example, that might include expanding the frame to explicitly include providing bag-valve mask ventilation with room air (a physiologically viable option) while another provider locates oxygen for the patient. This may also bring up opportunities to discuss what to do in the event that no bag-valve mask is available, such as providing mouth-to-mask ventilation or passive oxygenation, thereby expanding the learners' frames of reference even further.

Various debriefing tools have been developed to aid the facilitator in hosting a successful debriefing. The PEARLS tool is one such aid, structuring the debriefing into a sequence of phases (Bajaj et al., 2018). The initial reactions phase begins by allowing learners to express their emotions and feelings regarding the case. This both initiates the conversation and provides facilitators with information about what should be debriefed and how it might best be approached. Then comes a descriptive phase in which the facts of the case are clarified and the team develops a shared understanding of the diagnosis. This is particularly important as it sets the "ground truth" regarding the actual physiologic process that the simulator's actions were intended to represent. The debriefing next proceeds to the analysis phase, in which the team discusses aspects of the case that were managed well and aspects of the case the team might want to change, along with accompanying rationales (the "plus/delta" approach). Learning points and takeaways are then summarized in the final application phase.

Advocacy/inquiry is another key technique used in the analysis phase. This approach uses probing questions to encourage learners to move beyond performance assessment to a deeper consideration of the overall frames of reference that contributed to their actions during the simulation. For example, the facilitator may say, "I noticed there was a 3-minute delay in recognizing that the patient had a shockable rhythm. To me it seemed like the team's attention was focused solely on intubating the patient and it was not clear that anyone noticed the rhythm change. How did the team see it?" By doing this, learners are encouraged to examine their overall approach to care, a process that can lead to ongoing reflection in real practice (Rudolph, 2014).
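To make the phase structure just described concrete, the following minimal sketch encodes a PEARLS-style debriefing plan as a simple ordered script. It is illustrative only: the phase names follow the description above, but the prompt wordings, function names, and data structure are our own hypothetical choices rather than part of the published tool.

    # A minimal, hypothetical scaffold for a PEARLS-style debriefing plan.
    # Phase names follow the description in the text; the example prompts
    # are illustrative and are not drawn from the published PEARLS tool.

    PEARLS_PHASES = [
        ("reactions",   "How are you all feeling after that case?"),
        ("description", "Can someone summarize what this case was about clinically?"),
        ("analysis",    "What went well, and what might you want to change next time?"),
        ("application", "What are your key takeaways for real practice?"),
    ]

    def run_debriefing(facilitator_notes: dict) -> None:
        """Walk through each phase in order, printing the opening prompt and
        any advocacy/inquiry observations queued for that phase."""
        for phase, prompt in PEARLS_PHASES:
            print(f"--- {phase.upper()} ---")
            print(f"Facilitator: {prompt}")
            for note in facilitator_notes.get(phase, []):
                # Advocacy/inquiry pairs a concrete observation with a question.
                print(f"Facilitator (advocacy/inquiry): {note}")

    run_debriefing({
        "analysis": [
            "I noticed a delay in recognizing the shockable rhythm; "
            "how did the team see it?"
        ]
    })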


Rapid Cycle Deliberate Practice

Rapid Cycle Deliberate Practice (RCDP) is an alternative educational approach that builds on the concept of "deliberate practice" described by Ericsson et al. (Perretta et al., 2020; Hunt et al., 2014; Ericsson, 2006). In medicine, physicians aspire to be "experts" in their chosen field, which Ericsson defines as someone "able to perform at virtually any time with relatively limited preparation." Deliberate practice is a "systematically designed activity that has been created specifically to improve an individual's performance in a given domain" (Ericsson & Harwell, 2019). Traditional notions of professional expertise presupposed that it was directly linked to length of experience, reputation, and perceived mastery (Ericsson et al., 1993). Research, however, has subsequently demonstrated that only weak relationships exist between these factors and actual, observed performance and skill (Ericsson, 2008). While deliberate practice was initially studied with regard to the purposeful, intentional training and coaching of musicians, athletes, and chess players, the merits of the technique translate well to many other areas, including medicine.

Hunt et al. coined the term RCDP in 2014, describing it as a "learner-centered simulation instructional strategy that identifies performance gaps and targets feedback to improve individual or team deficiencies" (Perretta et al., 2020; Hunt et al., 2014). The first principle of RCDP is to maximize the time learners spend deliberately practicing an activity or procedure, which allows for the timely correction of bad habits and mistakes. Participants are given multiple opportunities to "do it right" through repeated rehearsals and overlearning, creating procedural memories (i.e., "muscle memory") of what things feel like when performed correctly. The second principle of RCDP is the opportunity for problem-solving and evaluation, aided by instructors providing specific evidence-based or expert-derived solutions for common problems. New information is presented in smaller, bite-sized chunks for incorporation and reinforcement of learning in real time. The third principle involves fostering psychological safety so learners can embrace direct feedback and incorporate it into the next iteration of practice, refining their skill while avoiding the defensiveness that can limit growth and learning (Hunt et al., 2014). A safe learning environment reduces anxiety, promotes open communication, and builds confidence. The elements of deliberate practice are listed in Table 10.1 (McGaghie et al., 2011).

In healthcare, RCDP has been successfully used to improve procedural competence, educate learners in crisis resource management principles, and teach the skills needed to manage complex medical and traumatic emergencies (Perretta et al., 2020). RCDP now forms the conceptual backbone of many simulation-based learning curricula in emergency medicine, pediatric and adult medicine, neonatology, and critical care. RCDP has also been shown to measurably improve the quality and timeliness of cardiopulmonary resuscitation (CPR) and defibrillation, novice providers' adherence to an intubation checklist, and the ability of emergency physicians to compassionately engage in difficult conversations. Table 10.2 lists situations in which RCDP may be of particular utility (Perretta et al., 2020).


TABLE 10.1
The Elements of Deliberate Practice

1. Highly motivated learners
2. Well-defined learning objectives
3. Appropriate level of difficulty
4. Focused, repetitive practice
5. Rigorous measurements
6. Informative feedback
7. Monitoring, error correction and more deliberate practice
8. Performance evaluation toward mastery standards
9. Advancement toward the next task

Source: Barsness, K. A. (2020). Achieving expert performance through simulation-based education and application of mastery learning principles. Seminars in Pediatric Surgery, 29(2), 150904.

TABLE 10.2
Circumstances in Which RCDP May Have a Favorable Impact on Clinical Performance

1. Existing, well-established performance guidelines: The Institute of Medicine recommends establishing performance standards to minimize risk of harm to patients. These can originate from national guidelines, institutional protocols, or expert consensus. Using standards provides an objective way to measure learner performance and an ability to build prescriptive feedback.
2. A need for learners to master key behaviors: RCDP continuously provides formative testing of learner performance against established standards. Instructors will not advance to the next learning objective until learners achieve the current objective. This describes several of the complementary features of mastery learning incorporated into RCDP.
3. Limited teaching time: Published RCDP studies show that it allows learners to master a large amount of content within one standard time frame, making it a good option when learners have a short time to master a topic, or if it is a stand-alone or self-sustaining course.
4. Low-volume, high-risk, time-sensitive events: RCDP has been associated with improved performance during simulated low-volume, high-risk, time-sensitive events.
5. Team situations requiring or benefiting from specific scripting and/or choreography: RCDP can facilitate utilization of shared mental models for patient assessment, explicit choreography for patient management, role delineation, shared language, and interdisciplinary procedural training.

Source: Perretta, J. S., Duval-Arnould, J., Poling, S., Sullivan, N., Jeffers, J. M., Farrow, L., Shilkofski, N. A., Brown, K. M., & Hunt, E. A. (2020). Best practices and theoretical foundations for simulation instruction using Rapid-Cycle Deliberate Practice. Simul Healthc, 15(5), 356–362.


A commonly used strategy in RCDP is to present an initial scenario that unfolds in an uninterrupted manner (Perretta et al., 2020). This allows facilitators to identify performance gaps that they can then focus on during subsequent parts of the session. For example, in the case of a sudden cardiac arrest, the team can be observed during an initial simulation on their ability to check for a pulse, initiate CPR, place pads, and defibrillate a shockable rhythm. These observations can then be used to identify gaps in clinical skill that shape the focus of subsequent iterative simulations conducted during the same session.
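The paragraph above describes RCDP as a control loop: an uninterrupted baseline run exposes performance gaps, and short targeted cycles with immediate feedback repeat until the gaps close. The minimal Python sketch below illustrates that flow under stated assumptions; the checklist items, function names, and toy learner model are hypothetical, and an actual RCDP session is of course run by facilitators rather than software.

    from typing import Callable, List

    def run_rcdp_session(run_scenario: Callable[[], List[str]],
                         give_feedback: Callable[[str], None],
                         max_cycles: int = 10) -> List[str]:
        """One uninterrupted baseline run, then rapid targeted cycles with
        immediate corrective feedback, repeating until no gaps remain."""
        gaps = run_scenario()          # baseline run: observe without interrupting
        for _ in range(max_cycles):
            if not gaps:
                break                  # all targeted behaviors performed correctly
            for gap in gaps:
                give_feedback(gap)     # short, specific, evidence-based correction
            gaps = run_scenario()      # immediately re-run so learners "do it right"
        return gaps

    # Toy demonstration: a team that corrects every gap it receives feedback on.
    checklist = ["check pulse", "initiate CPR", "place pads", "defibrillate"]
    mastered: List[str] = []
    observe = lambda: [item for item in checklist if item not in mastered]
    print(run_rcdp_session(observe, mastered.append))  # -> []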

Mastery Learning

Mastery learning is founded on John Carroll's 1963 work on educational theory (Carroll, 1963). It is a curriculum style grounded on two fundamental beliefs (Eppich, 2015). First, all learners can and will achieve a uniform performance goal. Second, it will take some learners longer than others to achieve these goals. Given enough time, a mastery learning approach holds that every learner can achieve proficiency standards and competence in any given skill. While one learner may achieve these objectives in 20 minutes, it may take another learner many hours to accomplish the same task. Both, however, can and will reach this endpoint given sufficient opportunity. Mastery learning is thus a logical development of the deliberate practice approach, as it pairs repetitive deliberate practice with robust feedback that compares learner performance against an externally determined standard (Block & Airasian, 1971). Here the curriculum and the performance standard are fixed, and the time to achieving mastery is variable. This can be challenging to accommodate in the field of medicine, however, as educational sessions often have durations that are predetermined by the ability of instructors and learners to free themselves from clinical duties. Table 10.3 lists the key elements of mastery learning (McGaghie et al., 2015).

TABLE 10.3
Elements of Mastery Learning

1. Baseline testing/assessments
2. Clear learning objectives, sequenced as units of increasing difficulty
3. Engagement in educational activities that are focused on the learning objectives
4. A set of proficiency standards for each educational unit
5. Formative testing with feedback to assess completion of each unit
6. Advancement to the next educational unit once the mastery standard is achieved in the current unit
7. Continued practice until mastery standards are achieved for all units

Source: McGaghie, W. C. (2015). Mastery learning: It is time for medical education to join the 21st century. Acad Med, 90(11), 1438–1441.
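To highlight the contrast drawn above (a fixed proficiency standard with variable time, rather than fixed time with variable outcomes), the short sketch below models the unit progression summarized in Table 10.3. The unit names, scoring function, and standard are hypothetical placeholders, not part of any published curriculum.

    # Hypothetical sketch of mastery-learning progression: the standard per
    # unit is fixed, and the number of attempts (i.e., time) is the free variable.
    import random

    UNITS = ["landmarks", "sterile technique", "needle insertion"]  # increasing difficulty
    STANDARD = 0.9   # placeholder proficiency standard applied to every unit

    def formative_assessment(unit: str) -> float:
        """Stand-in for a formative test with feedback; returns a score in [0, 1)."""
        return random.random()

    def mastery_course() -> dict:
        attempts_per_unit = {}
        for unit in UNITS:
            attempts = 0
            while True:                      # practice continues until mastery
                attempts += 1
                if formative_assessment(unit) >= STANDARD:
                    break                    # advance only once the standard is met
            attempts_per_unit[unit] = attempts
        return attempts_per_unit

    print(mastery_course())  # e.g. {'landmarks': 4, 'sterile technique': 12, ...}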


Paramount to achieving mastery learning goals is understanding how to structure the feedback and debriefing that the learner receives (Eppich & Cheng, 2015). For mastery learning to work, the feedback and debriefing given to learners need to be interspersed throughout the educational session so that they can inform practice and guide ongoing performance improvement efforts. While traditionally structured simulation tends to make use of longer debriefing sessions, mastery learning instead focuses on "micro-debriefings," either within the event (reflecting in-action) or after a short event (reflecting on-action). This approach allows for quick diagnosis of skills that require improvement so that they can be highlighted and focused on in subsequent simulation cycles.

There is increasing evidence that mastery learning can improve clinician performance (Barsuk et al., 2016; Eppich et al., 2015). It has shown benefit in advanced cardiac life support, in procedural skills training for procedures such as central line placement, and in boot camps for newly graduated intern physicians (Cook et al., 2013; McGaghie et al., 2014; Wayne et al., 2006; Barsuk et al., 2009a, 2009b; Cohen et al., 2013). Mastery learning lends itself well to all types of procedures, such as suturing and surgical technique, intubation, endoscopy, and lumbar puncture, among others (Ritter et al., 2018; Gabrysz-Forget et al., 2020; Franklin et al., 2018; Dyke et al., 2021; Barsuk et al., 2012). The uptake of mastery learning techniques is likely to grow, as the approach fits well with competency-based medical education.

Mastery learning also offers opportunities for stress inoculation training, in which learners are exposed to graded levels of challenge and stress that mimic a real environment (Chang et al., 2020). This concept was borrowed from military training and is based on the idea that repeated exposure at progressively higher levels of stress and fidelity, growing closer at each phase to the level of stress that might be experienced in reality, can improve downstream performance (Lauria et al., 2017). For example, suturing silicone skin in a quiet classroom is quite different from repairing a lip laceration on a wiggling, crying, biting toddler, but by providing sequential simulations of the laceration repair process with ever-escalating levels of environmental stress, the clinician will ultimately be better able to care for such a child in a busy emergency department with parents and other medical staff observing.

Just-in-Time Training

Just-in-time training (JITT) is "a method of training that is conducted directly prior to a potential intervention" (Kamdar et al., 2013). The basic concept is that providing simulation-based practice immediately prior to the performance of a necessary procedure enhances the ultimate success of that procedure. This approach has been shown to be successful in procedures as disparate as CPR and lumbar puncture (Niles et al., 2017; Niles et al., 2009; Kessler et al., 2015). The benefits of JITT include a review of appropriate anatomic landmarks, an opportunity to rehearse the procedure, and a chance to ask questions. While this approach can occasionally be difficult to implement given the time constraints often present in healthcare environments, its popularity continues to grow. During the Covid-19 pandemic, JITT was successfully used to guide workflow design for difficult airway management for experienced providers, and it has also been paired with RCDP to improve pediatric intensive care unit team proficiency in identifying and managing postoperative shock in congenital heart disease (Daly Guris et al., 2020; Brown, 2021).


Choice of Training Type

Given these options, simulation-based medical educators are faced with the decision of which approach to use, and when. While the right choice will be highly dependent on local circumstances, a consideration of the learning objectives, the available time, and the needed level of post-simulation skill can be helpful. For simulations that focus primarily on critical thinking during complex situations or on organizing large, functional teams, the reflective debriefing approach may be best, given its focus on uncovering deep frames of reference. When, on the other hand, procedural skills or more straightforward psychomotor tasks (such as cardiopulmonary resuscitation) are in view, some combination of RCDP and mastery learning may be best. The degree to which the mastery learning paradigm can be fully implemented will also depend, however, on the available time. Finally, JITT is best chosen when there is a need for swift education of immediate clinical relevance, such as in the hour prior to the arrival of a postoperative cardiac patient at high risk for physiologic decompensation.
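The guidance above can be condensed into a rough decision aid, sketched below. The mapping is a simplification of the considerations just discussed, and the category labels and time threshold are hypothetical; real choices depend on local circumstances and facilitator judgment.

    # Hypothetical decision aid condensing the guidance in the text above.
    def suggest_training_type(objective: str, minutes_available: int) -> str:
        if objective == "imminent clinical event":
            return "just-in-time training"
        if objective in ("critical thinking", "team organization"):
            return "reflective debriefing"
        if objective == "psychomotor skill":
            # Full mastery learning needs open-ended time; otherwise lean on RCDP.
            return "RCDP with mastery learning" if minutes_available >= 180 else "RCDP"
        return "tailored design needed"

    print(suggest_training_type("psychomotor skill", 60))  # -> RCDP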

Simulation for Evaluation and Testing

Before simulation became readily available, new techniques and procedures were studied, adopted, and performed directly on consenting patients, who in effect served as the testing environment (Aggarwal et al., 2010). An example of this occurred during the rapid uptake of laparoscopic surgery for cholecystectomy. Despite virtually no prior training in laparoscopic technique, many surgeons performed this new surgical approach directly on their patients, leading to an unintentional increase in morbidity and mortality. The use of simulated environments now provides an opportunity to evaluate and test learners on their competencies prior to employing them on real patients.

Simulation-based learner assessment, like all forms of evaluation, can be divided into two basic subtypes: formative and summative (Calhoun et al., 2016; Watling, 2019; Boulet, 2008). Formative assessment often occurs during a course of study and enables facilitators to provide focused feedback that learners can use to alter their future practice. Summative assessment, on the other hand, typically occurs after a course of study and is used to determine whether a learner is capable of practicing a certain skill independently or requires remediation. A growing number of institutions are now using simulation to credential providers in procedures that are rarely encountered in clinical practice but that providers must be able to perform should the need arise. In addition, the United States Medical Licensing Examination included, until recently, a simulation-based summative assessment intended to evaluate doctor-patient interactions (Scott et al., 2019; Ali, 2020). Finally, it is important to note that simulation-based learner assessment, like all forms of learner assessment, must be shown to be both valid and reliable for the decision it is intended to assist. A number of validity frameworks currently exist that can assist in this process, but their exploration is beyond the scope of this chapter (Downing, 2003; Cook et al., 2015; Tavares et al., 2018; Calhoun & Scerbo, 2022).


Systems Testing

In addition to assessing individual providers, simulation can be used to evaluate systems of care. The Institute of Medicine has emphasized that many medical errors are system-related and not attributable to individual negligence or misconduct. Systems testing using simulation-based evaluative methods has been shown to reduce error rates, improve the quality of care provided, and drive changes in the healthcare environment that have tangible effects on patient outcomes (Imach et al., 2020). Given that this approach evaluates a clinical environment in real time, the simulation most often takes place in situ (i.e., within an actual patient care area). This enables simulation facilitators to directly assess how provider teams and systems of care interact.

Applications of simulation for systems testing include diagnosing potential safety issues in a new hospital or medical facility prior to opening, or identifying latent safety issues in currently operational units and correcting them before they have an impact on patient care (Hebbar et al., 2018). As an example, consider an in-situ code blue simulation conducted in a new hospital ward that is about to open. Prior to engaging in the simulation, the care team is instructed to utilize all systems of care, as they would in an actual emergency, in order to be sure that the space is safe to receive patients. During the case, a nurse discovers that a button on the wall used to activate the code blue process is not working correctly. Systems-testing simulations can also be used to establish the point prevalence of care patterns across institutions or to trial new care processes (Maa et al., 2020). One recent study utilized this methodology to assess the rates of common errors in anaphylaxis management across an array of pediatric institutions, while others have described the use of this approach to prepare institutions to care for patients with highly infectious conditions such as Ebola and, more recently, Covid-19 (Sharara-Chami et al., 2020; Gaba, 2014; Biddell et al., 2016; Phrampus, 2016; Lie et al., 2020). Finally, this modality can also be used to evaluate protocols and procedures for disaster preparedness and mass casualty response (Gardner, 2016; Jung et al., 2016; Castoldi et al., 2020; Jorm et al., 2016). The use of simulation as a means of diagnosing systemic issues in healthcare practice has a wide range of potential applications and is in a state of rapid growth. In the future, we expect this approach to have a broad, positive impact on patient safety efforts worldwide.

CONCLUSION AND FUTURE DIRECTIONS

Once a relatively limited niche within healthcare education, simulation has become an integral part of the modern healthcare professional's educational and patient safety armamentarium. Significant evidence demonstrates that its use can help achieve higher provider competence and safer care. The COVID-19 pandemic also spurred a rapid shift to distance simulation modalities in a relatively short time, and the healthcare simulation community is currently working to better understand how this family of approaches functions both educationally and as an assessment method. Finally, the advent of (relatively) lower-cost virtual and augmented reality approaches promises to radically transform the ways in which we integrate simulation-based approaches into healthcare education and practice.

REFERENCES

Aggarwal, R., Mytton, O. T., Derbrew, M., Hananel, D., Heydenburg, M., Issenberg, B., MacAulay, C., Mancini, M. E., Morimoto, T., Soper, N., Ziv, A., & Reznick, R. (2010). Training and simulation for patient safety. Qual Saf Health Care, 19 Suppl 2, i34–i43. https://doi.org/10.1136/qshc.2009.038562
Ali, J. M. (2020). The USMLE step 2 clinical skills exam: A model for OSCE examinations? Acad Med, 95(5), 667. https://doi.org/10.1097/ACM.0000000000003183
Bajaj, K., Meguerdichian, M., Thoma, B., Huang, S., Eppich, W., & Cheng, A. (2018). The PEARLS healthcare debriefing tool. Acad Med, 93(2), 336. https://doi.org/10.1097/ACM.0000000000002035
Barsuk, J. H., McGaghie, W. C., Cohen, E. R., Balachandran, J. S., & Wayne, D. B. (2009a). Use of simulation-based mastery learning to improve the quality of central venous catheter placement in a medical intensive care unit. J Hosp Med, 4(7), 397–403. https://doi.org/10.1002/jhm.468
Barsuk, J. H., McGaghie, W. C., Cohen, E. R., O'Leary, K. J., & Wayne, D. B. (2009b). Simulation-based mastery learning reduces complications during central venous catheter insertion in a medical intensive care unit. Crit Care Med, 37(10), 2697–2701. https://www.ncbi.nlm.nih.gov/pubmed/19885989
Barsuk, J. H., Cohen, E. R., Caprio, T., McGaghie, W. C., Simuni, T., & Wayne, D. B. (2012). Simulation-based education with mastery learning improves residents' lumbar puncture skills. Neurology, 79(2), 132–137. https://doi.org/10.1212/WNL.0b013e31825dd39d
Barsuk, J. H., Cohen, E. R., Wayne, D. B., Siddall, V. J., & McGaghie, W. C. (2016). Developing a simulation-based mastery learning curriculum: Lessons from 11 years of advanced cardiac life support. Simul Healthc, 11(1), 52–59. https://doi.org/10.1097/SIH.0000000000000120
Beaubien, J. M., & Baker, D. P. (2004). The use of simulation for training teamwork skills in health care: How low can you go? Qual Saf Health Care, 13 Suppl 1, i51–i56. https://doi.org/10.1136/qhc.13.suppl_1.i51
Bell, S. K., Pascucci, R., Fancy, K., Coleman, K., Zurakowski, D., & Meyer, E. C. (2014). The educational value of improvisational actors to teach communication and relational skills: Perspectives of interprofessional learners, faculty, and actors. Patient Educ Couns, 96(3), 381–388. https://doi.org/10.1016/j.pec.2014.07.001
Biddell, E. A., Vandersall, B. L., Bailes, S. A., Estephan, S. A., Ferrara, L. A., Nagy, K. M., O'Connell, J. L., & Patterson, M. D. (2016). Use of simulation to gauge preparedness for Ebola at a free-standing children's hospital. Simul Healthc, 11(2), 94–99. https://doi.org/10.1097/SIH.0000000000000134
Block, J. H., & Airasian, P. W. (1971). Mastery learning: Theory and practice. Holt.
Borghi, L., Meyer, E. C., Vegni, E., Oteri, R., Almagioni, P., & Lamiani, G. (2021). Twelve years of the Italian Program to Enhance Relational and Communication Skills (PERCS). Int J Environ Res Public Health, 18(2). https://doi.org/10.3390/ijerph18020439
Boulet, J. R. (2008). Summative assessment in medicine: The promise of simulation for high-stakes evaluation. Acad Emerg Med, 15(11), 1017–1024. https://doi.org/10.1111/j.1553-2712.2008.00228.x


Bracq, M.-S., Michinov, E., & Jannin, P. (2019). Virtual reality simulation in nontechnical skills training for healthcare professionals: A systematic review. Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare, 14(3), 188–194. https://doi.org/10.1097/SIH.0000000000000347
Brett-Fleegler, M., Rudolph, J., Eppich, W., Monuteaux, M., Fleegler, E., Cheng, A., & Simon, R. (2012). Debriefing assessment for simulation in healthcare: Development and psychometric properties. Simul Healthc, 7(5), 288–294. https://doi.org/10.1097/SIH.0b013e3182620228
Brown, K. M., Mudd, S. S., Perretta, J. S., Dodson, A., Hunt, E. A., & McMillan, K. N. (2021). Rapid cycle deliberate practice to facilitate "Nano" in situ simulation: An interprofessional approach to just-in-time training. Crit Care Nurse, 41(1), e1–e8. https://doi.org/10.4037/ccn2021552
Bryson, E. O., & Levine, A. I. (2008). The simulation theater: A theoretical discussion of concepts and constructs that enhance learning. J Crit Care, 23(2), 185–187. https://doi.org/10.1016/j.jcrc.2007.12.003
Buck, G. H. (1991). Development of simulators in medical education. Gesnerus, 48(1), 7–28. https://www.ncbi.nlm.nih.gov/pubmed/1855669
CAE Healthcare. (2021). CAE Lucina validated high-fidelity maternal/fetal training. https://www.caehealthcare.com/patient-simulation/lucina/
Calhoun, A., Bhanji, F., Sherbino, J., & Hatala, R. (2016). Simulation for high-stakes assessment in pediatric emergency medicine. Clin Pediatr Emerg Med, 13(September), 212–223.
Calhoun, A. W., & Gaba, D. M. (2017). Live or let die: New developments in the ongoing debate over mannequin death. Simul Healthc, 12(5), 279–281. https://doi.org/10.1097/SIH.0000000000000256
Calhoun, A. W., Pian-Smith, M., Shah, A., Levine, A., Gaba, D., DeMaria, S., Goldberg, A., & Meyer, E. C. (2020b). Guidelines for the responsible use of deception in simulation: Ethical and educational considerations. Simul Healthc, 15(4), 282–288. https://doi.org/10.1097/SIH.0000000000000440
Calhoun, A. W., Pian-Smith, M. C., Truog, R. D., Gaba, D. M., & Meyer, E. C. (2015). The importance of deception in simulation: A response. Simul Healthc, 10(6), 387–390. https://doi.org/10.1097/SIH.0000000000000127
Calhoun, A. W., Pian-Smith, M. C., Truog, R. D., Gaba, D. M., & Meyer, E. C. (2015). Deception and simulation education: Issues, concepts, and commentary. Simul Healthc, 10(3), 163–169. https://doi.org/10.1097/SIH.0000000000000086
Calhoun, A. W., & Scerbo, M. W. (2022). Preparing and presenting validation studies: A guide for the perplexed. Simul Healthc, 17(6), 357–365.
Calhoun, A., Pian-Smith, M., Shah, A., Levine, A., Gaba, D., DeMaria, S., Goldberg, A., & Meyer, E. (2020a). Exploring the boundaries of deception in simulation: A mixed methods study. Clin Simul Nurs, 40(March), 7–16.
Caro, P. W. (1973). Aircraft simulators and pilot training. Hum Factors, 15, 502–509.
Carroll, J. B. (1963). A model of school learning. Teach Coll Rec, 64, 723–733.
Castoldi, L., Greco, M., Carlucci, M., Lennquist Montan, K., & Faccincani, R. (2020). Mass Casualty Incident (MCI) training in a metropolitan university hospital: Short-term experience with the MAss Casualty SIMulation system MACSIM®. Eur J Trauma Emerg Surg. https://doi.org/10.1007/s00068-020-01541-8
Chang, T. P., Hollinger, T., Dolby, T., & Sherman, J. M. (2020). Development and considerations for virtual reality simulations for resuscitation training and stress inoculation. Simul Healthc. https://doi.org/10.1097/SIH.0000000000000521
Cheng, A., Kolbe, M., Grant, V., Eller, S., Hales, R., Symon, B., Griswold, S., & Eppich, W. (2020). A practical guide to virtual debriefings: Communities of inquiry perspective. Adv Simul (Lond), 5, 18. https://doi.org/10.1186/s41077-020-00141-1


Cohen, E. R., Barsuk, J. H., Moazed, F., Caprio, T., Didwania, A., McGaghie, W. C., & Wayne, D. B. (2013). Making July safer: Simulation-based mastery learning during intern boot camp. Acad Med, 88(2), 233–239. https://doi.org/10.1097/ACM.0b013e31827bfc0a
Cook, D. A., Brydges, R., Ginsburg, S., & Hatala, R. (2015). A contemporary approach to validity arguments: A practical guide to Kane's framework. Med Educ, 49(6), 560–575. https://doi.org/10.1111/medu.12678
Cook, D. A., Brydges, R., Zendejas, B., Hamstra, S. J., & Hatala, R. (2013). Mastery learning for health professionals using technology-enhanced simulation: A systematic review and meta-analysis. Acad Med, 88(8), 1178–1186. https://doi.org/10.1097/ACM.0b013e31829a365d
Cooper, J. B., & Taqueti, V. R. (2008). A brief history of the development of mannequin simulators for clinical education and training. Postgrad Med J, 84(997), 563–570. https://doi.org/10.1136/qshc.2004.009886
Daly Guris, R. J., Doshi, A., Boyer, D. L., Good, G., Gurnaney, H. G., Rosenblatt, S., McGowan, N., Widmeier, K., Kishida, M., Nadkarni, V., Nishisaki, A., & Wolfe, H. A. (2020). Just-in-time simulation to guide workflow design for coronavirus disease 2019 difficult airway management. Pediatr Crit Care Med, 21(8), e485–e490. https://doi.org/10.1097/PCC.0000000000002435
Demaria, S. Jr., Bryson, E. O., Mooney, T. J., Silverstein, J. H., Reich, D. L., Bodian, C., & Levine, A. I. (2010). Adding emotional stressors to training in simulated cardiopulmonary arrest enhances participant performance. Med Educ, 44(10), 1006–1015. https://doi.org/10.1111/j.1365-2923.2010.03775.x
Dieckmann, P. (2020). The unexpected and the non-fitting – considering the edges of simulation as social practice. Adv Simul (Lond), 5, 2. https://doi.org/10.1186/s41077-020-0120-y
Dieckmann, P., Gaba, D., & Rall, M. (2007). Deepening the theoretical foundations of patient simulation as social practice. Simul Healthc, 2(3), 183–193. https://doi.org/10.1097/SIH.0b013e3180f637f5
Downing, S. M. (2003). Validity: On meaningful interpretation of assessment data. Med Educ, 37(9), 830–837. https://doi.org/10.1046/j.1365-2923.2003.01594.x
Dyke, C., Franklin, B. R., Sweeney, W. B., & Ritter, E. M. (2021). Early implementation of fundamentals of endoscopic surgery training using a simulation-based mastery learning curriculum. Surgery, 169(5), 1228–1233. https://doi.org/10.1016/j.surg.2020.12.005
Eppich, W., & Cheng, A. (2015). Promoting Excellence and Reflective Learning in Simulation (PEARLS): Development and rationale for a blended approach to health care simulation debriefing. Simul Healthc, 10(2), 106–115. https://doi.org/10.1097/SIH.0000000000000072
Eppich, W. J., Hunt, E. A., Duval-Arnould, J. M., Siddall, V. J., & Cheng, A. (2015). Structuring feedback and debriefing to achieve mastery learning goals. Acad Med, 90(11), 1501–1508. https://doi.org/10.1097/ACM.0000000000000934
Ericsson, K. A. (2006). The Cambridge handbook of expertise and expert performance. Cambridge University Press.
Ericsson, K. A. (2008). Deliberate practice and acquisition of expert performance: A general overview. Acad Emerg Med, 15(11), 988–994. https://doi.org/10.1111/j.1553-2712.2008.00227.x
Ericsson, K. A., & Harwell, K. W. (2019). Deliberate practice and proposed limits on the effects of practice on the acquisition of expert performance: Why the original definition matters and recommendations for future research. Front Psychol, 10, 2396. https://doi.org/10.3389/fpsyg.2019.02396
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.


Franklin, B. R., Placek, S. B., Gardner, A. K., Korndorffer, J. R. Jr., Wagner, M. D., Pearl, J. P., & Ritter, E. M. (2018). Preparing for the American board of surgery flexible endoscopy curriculum: Development of multi-institutional proficiency-based training standards and pilot testing of a simulation-based mastery learning curriculum for the endoscopy training system. Am J Surg, 216(1), 167–173. https://doi.org/10.1016/j.amjsurg.2017.09.010
Fraser, K., Huffman, J., Ma, I., Sobczak, M., McIlwrick, J., Wright, B., & McLaughlin, K. (2014). The emotional and cognitive impact of unexpected simulated patient death: A randomized controlled trial. Chest, 145(5), 958–963. https://doi.org/10.1378/chest.13-0987
Gaba, D. M. (2014). Simulation as a critical resource in the response to Ebola virus disease. Simul Healthc, 9(6), 337–338. https://doi.org/10.1097/SIH.0000000000000068
Gabrysz-Forget, F., Bonds, M., Lovett, M., Alseidi, A., Ghaderi, I., & Nepomnayshy, D. (2020). Practicing on the Advanced Training in Laparoscopic Suturing Curriculum (ATLAS): Is mastery learning in residency feasible to achieve expert-level performance in laparoscopic suturing? J Surg Educ, 77(5), 1138–1145. https://doi.org/10.1016/j.jsurg.2020.02.026
Gardner, A. K., DeMoya, M. A., Tinkoff, G. H., Brown, K. M., Garcia, G. D., Miller, G. T., Zaidel, B. W., Korndorffer, J. R. Jr., Scott, D. J., & Sachdeva, A. K. (2016). Using simulation for disaster preparedness. Surgery, 160(3), 565–570. https://doi.org/10.1016/j.surg.2016.03.027
Gardner, R., & Raemer, D. B. (2008). Simulation in obstetrics and gynecology. Obstet Gynecol Clin North Am, 35(1), 97–127, ix. https://doi.org/10.1016/j.ogc.2007.12.008
Goldberg, A., Samuelson, S., Khelemsky, Y., Katz, D., Weinberg, A., Levine, A., & Demaria, S. (2017). Exposure to simulated mortality affects resident performance during assessment scenarios. Simul Healthc, 12(5), 282–288. https://doi.org/10.1097/SIH.0000000000000257
Gross, I. T., Whitfill, T., Redmond, B., Couturier, K., Bhatnagar, A., Joseph, M., Joseph, D., Ray, J., Wagner, M., & Auerbach, M. (2020). Comparison of two telemedicine delivery modes for neonatal resuscitation support: A simulation-based randomized trial. Neonatology, 117(2), 159–166. https://doi.org/10.1159/000504853
Hays, R. T., & Singer, M. J. (1989). Simulation fidelity in training system design: Bridging the gap between reality and training. Springer-Verlag.
Hebbar, K. B., Colman, N., Williams, L., Pina, J., Davis, L., Bost, J. E., Jones, H., & Frank, G. (2018). A quality initiative: A system-wide reduction in serious medication events through targeted simulation training. Simul Healthc, 13(5), 324–330. https://doi.org/10.1097/SIH.0000000000000321
Heller, B. J., DeMaria, S., Katz, D., Heller, J. A., & Goldberg, A. T. (2016). Death during simulation: A literature review. J Contin Educ Health Prof, 36(4), 316–322. https://doi.org/10.1097/CEH.0000000000000116
Hunt, E. A., Duval-Arnould, J. M., Nelson-McMillan, K. L., Bradshaw, J. H., Diener-West, M., Perretta, J. S., & Shilkofski, N. A. (2014). Pediatric resident resuscitation skills improve after "rapid cycle deliberate practice" training. Resuscitation, 85(7), 945–951. https://doi.org/10.1016/j.resuscitation.2014.02.025
Imach, S., Eppich, W., Zech, A., Kohlmann, T., Pruckner, S., & Trentzsch, H. (2020). Applying principles from aviation safety investigations to root cause analysis of a critical incident during a simulated emergency. Simul Healthc, 15(3), 193–198. https://doi.org/10.1097/SIH.0000000000000457
Issenberg, S. B., McGaghie, W. C., Petrusa, E. R., Lee Gordon, D., & Scalese, R. J. (2005). Features and uses of high-fidelity medical simulations that lead to effective learning: A BEME systematic review. Med Teach, 27(1), 10–28. https://doi.org/10.1080/01421590500046924


Jentsch, F., Curtis, M., & Salas, E. (2011). Simulation in aviation training. Ashgate Pub.
Jorm, C., Roberts, C., Lim, R., Roper, J., Skinner, C., Robertson, J., Gentilcore, S., & Osomanski, A. (2016). A large-scale mass casualty simulation to develop the nontechnical skills medical students require for collaborative teamwork. BMC Med Educ, 16, 83. https://doi.org/10.1186/s12909-016-0588-2
Jung, D., Carman, M., Aga, R., & Burnett, A. (2016). Disaster preparedness in the emergency department using in situ simulation. Adv Emerg Nurs J, 38(1), 56–68. https://doi.org/10.1097/TME.0000000000000091
Kamdar, G., Kessler, D. O., Tilt, L., Srivastava, G., Khanna, K., Chang, T. P., Balmer, D., & Auerbach, M. (2013). Qualitative evaluation of just-in-time simulation-based learning: The learners' perspective. Simul Healthc, 8(1), 43–48. https://doi.org/10.1097/SIH.0b013e31827861e8
Kessler, D., Pusic, M., Chang, T. P., Fein, D. M., Grossman, D., Mehta, R., White, M., Jang, J., Whitfill, T., Auerbach, M., & Investigators, I. L. (2015). Impact of just-in-time and just-in-place simulation on intern success with infant lumbar puncture. Pediatrics, 135(5), e1237–1246. https://doi.org/10.1542/peds.2014-1911
Kohn, L. T., Corrigan, J., & Donaldson, M. S. (2000). To err is human: Building a safer health system. National Academy Press.
Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development. Prentice-Hall.
Lateef, F., Suppiah, M., Chandra, S., Yi, T. X., Darmawan, W., Peckler, B., Tucci, V., Tirado, A., Mendez, L., Moreno, L., & Galwankar, S. (2021). Simulation centers and simulation-based education during the time of COVID-19: A multi-center best practice position paper by the world academic council of emergency medicine. J Emerg Trauma Shock, 14(1), 3–13. https://doi.org/10.4103/JETS.JETS_185_20
Lauria, M. J., Gallo, I. A., Rush, S., Brooks, J., Spiegel, R., & Weingart, S. D. (2017). Psychological skills to improve emergency care providers' performance under stress. Ann Emerg Med, 70(6), 884–890. https://doi.org/10.1016/j.annemergmed.2017.03.018
Leighton, K. (2009). Death of a simulator. Clin Simul Nurs, 5(2), E59–E62.
Lie, S. A., Wong, L. T., Chee, M., & Chong, S. Y. (2020). Process-oriented in situ simulation is a valuable tool to rapidly ensure operating room preparedness for COVID-19 outbreak. Simul Healthc, 15(4), 225–233. https://doi.org/10.1097/SIH.0000000000000478
Lioce, L. (Ed.), Loprieato, J. (Founding Ed.), Downing, D., Chang, T. P., Robertson, J. M., Anderson, M., Diaz, D. A., & Spain, A. E. (Assoc. Eds.) and the Terminology and Concepts Working Group. (2020). Healthcare simulation dictionary – Second edition. https://doi.org/10.23970/simulationv2
Lizotte, M. H., Latraverse, V., Moussa, A., Lachance, C., Barrington, K., & Janvier, A. (2015). Trainee perspectives on manikin death during mock codes. Pediatrics, 136(1), e93–e98. https://doi.org/10.1542/peds.2014-3910
Maa, T., Scherzer, D. J., Harwayne-Gidansky, I., Capua, T., Kessler, D. O., Trainor, J. L., Jani, P., Damazo, B., Abulebda, K., Diaz, M. C. G., Sharara-Chami, R., Srinivasan, S., Zurca, A. D., Deutsch, E. S., Hunt, E. A., & Auerbach, M., PEAK investigators of the International Network for Simulation-based Pediatric Innovation, Research, & Education. (2020). Prevalence of Errors in Anaphylaxis in Kids (PEAK): A multicenter simulation-based study. J Allergy Clin Immunol Pract, 8(4), 1239–1246.e1233. https://doi.org/10.1016/j.jaip.2019.11.013
McBride, M. E., Schinasi, D. A., Moga, M. A., Tripathy, S., & Calhoun, A. (2017). Death of a simulated pediatric patient: Toward a more robust theoretical framework. Simul Healthc, 12(6), 393–401. https://doi.org/10.1097/SIH.0000000000000265


McGaghie, W. C., Siddall, V. J., Mazmanian, P. E., & Myers, J. (2009). Lessons for continuing medical education from simulation research in undergraduate and graduate medical education. Effectiveness of continuing medical education: American College of Chest Physicians evidence-based educational guidelines. Chest, 135 Suppl, 62S–68S.
McGaghie, W. C., Issenberg, S. B., Cohen, E. R., Barsuk, J. H., & Wayne, D. B. (2011). Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence. Acad Med, 86(6), 706–711.
McGaghie, W. C., Issenberg, S. B., Barsuk, J. H., & Wayne, D. B. (2014). A critical review of simulation-based mastery learning with translational outcomes. Med Educ, 48(4), 375–385. https://doi.org/10.1111/medu.12391
Meyer, E. C., Sellers, D. E., Browning, D. M., McGuffie, K., Solomon, M. Z., & Truog, R. D. (2009). Difficult conversations: Improving communication skills and relational abilities in health care. Pediatr Crit Care Med, 10(3), 352–359. https://doi.org/10.1097/PCC.0b013e3181a3183a
Niles, D., Sutton, R. M., Donoghue, A., Kalsi, M. S., Roberts, K., Boyle, L., Nishisaki, A., Arbogast, K. B., Helfaer, M., & Nadkarni, V. (2009). "Rolling refreshers": A novel approach to maintain CPR psychomotor skill competence. Resuscitation, 80(8), 909–912. https://doi.org/10.1016/j.resuscitation.2009.04.021
Niles, D. E., Nishisaki, A., Sutton, R. M., Elci, O. U., Meaney, P. A., O'Connor, K. A., Leffelman, J., Kramer-Johansen, J., Berg, R. A., & Nadkarni, V. (2017). Improved retention of chest compression psychomotor skills with brief "rolling refresher" training. Simul Healthc, 12(4), 213–219. https://doi.org/10.1097/SIH.0000000000000228
Owen, H. (2012). Early use of simulation in medical education. Simul Healthc, 7(2), 102–116. https://doi.org/10.1097/SIH.0b013e3182415a91
Patel, S. M., Miller, C. R., Schiavi, A., Toy, S., & Schwengel, D. A. (2020). The sim must go on: Adapting resident education to the COVID-19 pandemic using telesimulation. Adv Simul (Lond), 5, 26. https://doi.org/10.1186/s41077-020-00146-w
Perretta, J. S., Duval-Arnould, J., Poling, S., Sullivan, N., Jeffers, J. M., Farrow, L., Shilkofski, N. A., Brown, K. M., & Hunt, E. A. (2020). Best practices and theoretical foundations for simulation instruction using rapid-cycle deliberate practice. Simul Healthc, 15(5), 356–362. https://doi.org/10.1097/SIH.0000000000000433
Peterson, E., Morgan, R., & Calhoun, A. (2021). Improving patient- and family-centered communication in pediatrics: A review of simulation-based learning. Pediatr Ann, 50(1), e32–e38. https://doi.org/10.3928/19382359-20201211-02
Peterson, E. B., Porter, M. B., & Calhoun, A. W. (2012). A simulation-based curriculum to address relational crises in medicine. J Grad Med Educ, 4(3), 351–356. https://doi.org/10.4300/JGME-D-11-00204
Phrampus, P. E., & Cole, J. (2005). Death during simulation training: Feedback from trainees. International Meeting on Medical Simulation.
Phrampus, P. E., O'Donnell, J. M., Farkas, D., Abernethy, D., Brownlee, K., Dongilli, T., & Martin, S. (2016). Rapid development and deployment of Ebola readiness training across an academic health system: The critical role of simulation education, consulting, and systems integration. Simul Healthc, 11(2), 82–88. https://doi.org/10.1097/SIH.0000000000000137
Ritter, E. M., Lineberry, M., Hashimoto, D. A., Gee, D., Guzzetta, A. A., Scott, D. J., & Gardner, A. K. (2018). Simulation-based mastery learning significantly reduces gender differences on the fundamentals of endoscopic surgery performance exam. Surg Endosc, 32(12), 5006–5011. https://doi.org/10.1007/s00464-018-6313-y


Ritter, E. M., Taylor, Z. A., Wolf, K. R., Franklin, B. R., Placek, S. B., Korndorffer, J. R. Jr., & Gardner, A. K. (2018). Simulation-based mastery learning for endoscopy using the endoscopy training system: A strategy to improve endoscopic skills and prepare for the fundamentals of endoscopic surgery (FES) manual skills exam. Surg Endosc, 32(1), 413–420. https://doi.org/10.1007/s00464-017-5697-4
Rudolph, J. W., Raemer, D. B., & Simon, R. (2014). Establishing a safe container for learning in simulation: The role of the presimulation briefing. Simul Healthc, 9(6), 339–349. https://doi.org/10.1097/SIH.0000000000000047
Salas, E., & Burke, C. S. (2002). Simulation for training is effective when. Qual Saf Health Care, 11(2), 119–120. https://doi.org/10.1136/qhc.11.2.119
Scott, S., Hearns, V., & Barker, M. A. (2019). Testing clinical skills: A look at the OSCE and USMLE clinical skills exams. S D Med, 72(10), 451–453. https://www.ncbi.nlm.nih.gov/pubmed/31816205
Sharara-Chami, R., Sabouneh, R., Zeineddine, R., Banat, R., Fayad, J., & Lakissian, Z. (2020). In situ simulation: An essential tool for safe preparedness for the COVID-19 pandemic. Simul Healthc, 15(5), 303–309. https://doi.org/10.1097/SIH.0000000000000504
ssih.org. Sim Center Directory. https://www.ssih.org/Home/SIM-Center-Directory
Tavares, W., Brydges, R., Myre, P., Prpic, J., Turner, L., Yelle, R., & Huiskamp, M. (2018). Applying Kane's validity framework to a simulation-based assessment of clinical competence. Adv Health Sci Educ Theory Pract, 23(2), 323–338. https://doi.org/10.1007/s10459-017-9800-3
Tripathy, S., Miller, K. H., Berkenbosch, J. W., McKinley, T. F., Boland, K. A., Brown, S. A., & Calhoun, A. W. (2016). When the mannequin dies, creation and exploration of a theoretical framework using a mixed methods approach. Simul Healthc, 11(3), 149–156. https://doi.org/10.1097/SIH.0000000000000138
Truog, R. D., & Meyer, E. C. (2013). Deception and death in medical simulation. Simul Healthc, 8(1), 1–3. https://doi.org/10.1097/SIH.0b013e3182869fc2
Wagner, M., Jaki, C., Lollgen, R. M., Mileder, L., Eibensteiner, F., Ritschl, V., Steinbauer, P., Gottstein, M., Abulebda, K., Calhoun, A., & Gross, I. T. (2020). Readiness for and response to coronavirus disease 2019 among pediatric healthcare providers: The role of simulation for pandemics and other disasters. Pediatr Crit Care Med, Publish Ahead of Print. https://doi.org/10.1097/PCC.0000000000002649
Watling, C. J., & Ginsburg, S. (2019). Assessment, feedback and the alchemy of learning. Med Educ, 53(1), 76–85. https://doi.org/10.1111/medu.13645
Wayne, D. B., Butter, J., Siddall, V. J., Fudala, M. J., Wade, L. D., Feinglass, J., & McGaghie, W. C. (2006). Mastery learning of advanced cardiac life support skills by internal medicine residents using simulation technology and deliberate practice. J Gen Intern Med, 21(3), 251–256. https://doi.org/10.1111/j.1525-1497.2006.00341.x

Chapter 11

Design and Development of Algorithms for Gesture-Based Control of Semi-Autonomous Vehicles

Brian Sanders, Yuzhong Shen, and Dennis Vincenzi

DOI: 10.1201/9781003401353-11

CONTENTS

Introduction 297
Background 298
  Gestures and Gesture Capture Technology 298
  Cognitive Loading Considerations 300
Approach, Implementation, and Results 301
  Approach 301
  Developing Proper Training System Familiarization as a Prelude to Training 301
  Implementation and Results: Phase 1 303
    Gestures and LMC Measurements 303
    Virtual Environment Development 304
    User Testing 307
    Algorithm Redesign 309
  Implementation and Results: Phase 2 310
    Physical Demonstration Setup 310
    User Testing 312
Summary and Conclusions 314
References 315

INTRODUCTION

Drones (aka unmanned aerial systems, or UAS) are used for a variety of purposes, such as aerial videography, photography, and surveillance. Successful accomplishment of these tasks requires the execution of a series of basic maneuvering functions (i.e., takeoff, acceleration, point-to-point navigation) that, when combined, contribute to a mission-capable system.

Commercially available small unmanned aerial systems (sUAS) have traditionally been designed around legacy interfaces to control the remote vehicle. These traditional control interfaces are typically one-dimensional (1D) or two-dimensional (2D) devices that allow the user to interact with a system in a limited manner (Balog et al., 2016). For example, keyboards are 1D input devices that allow for text input and activation of preprogrammed functions via a sequence of key/text inputs. Mice have expanded input capabilities into a 2D framework, but input is still limited to menu item selection or "hotspots" on a graphical user interface. Both of these control devices, while functional and useful, are limited in nature and not very intuitive in terms of control movement, input, and function, and they are often slow and time-consuming, as control through these devices frequently requires a sequence of inputs to achieve the desired end state. Other legacy control devices, such as those that are joystick-based, are better, but they are still an attempt to translate 2D input into movement through a three-dimensional (3D) space or environment.

Integration with touch-sensitive devices such as phones and tablets is emerging on the market to replace or augment discrete physical controls and information displays (Balog et al., 2016). However, these devices are, in many cases, simply electronic or digital versions of the same 2D legacy control devices. They typically combine electronic visual displays with touch input, and sometimes electronic input (GPS, accelerometers, and automation, for example).

An alternative to these traditional command and control approaches is the use of gestures. A gesture-based approach to sUAS operation may be a viable alternative for command and control interfaces built on technology designed to recognize gestures. Such an approach can free the operator from having to hold and operate a multi-joystick, multi-button controller by correlating vehicle operations to a fluid, intuitive, natural, and accepted set of hand gestures. This, in combination with new visual displays, can create an entirely new command and control interface structure. Design of these systems will require careful investigation of human factors issues to populate gesture libraries that are natural and intuitive, as well as of cognitive loading considerations arising from the ready availability of a vast amount of visual information.

The remainder of this chapter is organized as follows. The next section touches on related research on gestures and cognitive load considerations. The section after that illustrates a research approach suited to the multidisciplinary nature of this effort, describes the implementation of that approach, and discusses some design guidelines obtained from simulations and physical demonstrations. The final section summarizes the findings and conclusions.

BACKGROUND

Gestures and Gesture Capture Technology

Gesture-based control, like traditional control technology, poses a unique challenge for remote operation of unmanned vehicles.

To begin with, the term "unmanned system" is a misnomer at this point in time; since there is a human operator present in the system, the system will always be "manned" in some way. The only difference in the case of unmanned systems is that the operator is not collocated with the vehicle. This places the operator in a unique position and provides a different operational perspective, since many of the environmental cues normally present in manned scenarios are no longer available to the human operator.

Research has suggested that gestures can help operators feel mentally connected to a vehicle from which they are physically separated. Cauchard et al. (2015) conducted a study to learn how people naturally interact with flying robots (aka drones). Their results show strong agreement between participants for many interaction techniques, such as when gesturing for the drone to stop. They discovered that people interact with drones as with a person or a pet, using interpersonal gestures, such as beckoning the drone closer.

Some previous related research centered on the development of computer algorithms that would allow robotic systems to recognize gesture commands in the field as part of military teams. Other research has focused on virtual reality environments integrated with optical sensors that recognize and measure the movement, velocity, and movement patterns of fingers and hands, and then translate those gestures into commands. Hamilton et al. (2016) conducted research focused on developing the ability for robotic systems to understand military squad commands. The long-term goal was to develop the capability to integrate robots with ground forces as seamless teammates in combat operations. Their research focused on creating a recognition model that understands 12 squad-level commands, such as rally, listen, stop, and come here. The input to the model was collected using Microsoft Kinect's skeletal model and processed with a logistic regression activation function to identify the gesture. The logistic model showed an overall 97% effectiveness when discriminating whether a dataset came from a given member set, and the decision model was 90% effective in determining the gesture class a given dataset represents (Hamilton et al., 2016).

Lampton et al. (2002) investigated a gesture recognition system integrated with a virtual environment. Their goal was to measure the accuracy and effectiveness of a VR-based gesture recognition system. The system consisted of two video cameras, software to track the positions of the gesturer's hands, and software to recognize gestures by analyzing the position and movement of the hands. The researchers selected 14 basic and accepted hand gestures commonly used in the field by US Army personnel. In general, the results were mixed in terms of recognition and accuracy: many of the gestures were problematic in terms of tracking, recognition, or both.

Recent advancements in hardware and software processing have made it possible to accurately capture gestures electronically. As mentioned above, the Microsoft Kinect is one example. Another is the Leap Motion Controller (LMC) (Leap Motion, 2019), a relatively recent technology that can capture and track hand motion with a sensor just slightly bigger than a standard USB flash drive. These devices have millimeter position accuracies (see, for example, Weichert et al., 2013; Guna et al., 2014) and are able to capture a range of hand motions and gripping processes (Smeragliuilo et al., 2016; Staretu & Moldovan, 2016). It has been suggested that these devices are better suited for 3D environments, such as the one in which a sUAS operates, as compared to 2D devices such as the joystick and mouse (Scicali & Bischof, 2015).


There have been a few documented efforts to control drones with gestures and multimodal approaches. Sarkar et al. (2016) used the LMC to control basic motions of a UAV, implementing LMC-based control of an off-the-shelf quadcopter via simple human gestures; basic tests documented the feasibility of the LMC-based system for controlling vehicle motion. Chandarana et al. (2017) explored a multimodal natural language interface that uses a combination of speech and gesture input modalities to build complex UAV flight paths by defining trajectory segment primitives. Gesture inputs (measured with the LMC) were used to define the general shape of a segment, while speech inputs provided additional geometric information needed to fully characterize a trajectory segment. They observed that the interface was intuitive but that the gesture module was more difficult to learn than the speech module. These studies highlight the possibilities of alternative command and control approaches with the emerging technology.

Cognitive Loading Considerations

When combined with a head-mounted device (HMD), gesture capture technology can be used to develop an alternative command and control system. In addition to identifying a gesture library, careful consideration should be given to cognitive loading, which can result both from the physical hand motions and from potential information overload, given the amount of information that can be presented in an HMD. An example of what is possible, and of methods to reduce cognitive loading, was discussed by Zollman et al. (2014). They investigated the application of micro aerial vehicles (MAVs) equipped with high-resolution cameras to create aerial reconstructions of selected locations. They noted that this workflow raises several cognitive loading issues, such as the need to mentally transfer the aerial vehicle's position between 2D map positions and the physical 3D environment, and the complicated depth perception of objects flying in the distance. They presented AR-supported navigation and flight planning for micro aerial vehicles, augmenting the user's view with relevant information for flight planning and live feedback for flight supervision. Additionally, they introduced depth hints to support the user in understanding the spatial relationship of virtual waypoints in the physical world, and they investigated the effect of these visualization techniques on spatial understanding. The investigation highlighted the possibilities of an AR component in a command and control system and specific challenges related to cognitive processing.

Zollman et al. (2014) highlighted a few of the cognitive loading issues related to the design of an AR-based command and control system. There are several more that need to be considered (Givens et al., 1998; Rorie & Fern, 2014). For example, Dodd et al. (2014) investigated touch screen capability in aircraft cockpits and stated that as elements and workload increase in number and complexity, increased cognitive loading will follow.


For the current effort, this concept drove the design in terms of the number and complexity of gestures a participant was expected to initiate to control the vehicle. As the research progresses beyond the flat screen, additional factors come into play. As AR capability is added, issues of switching views between the operator's real-world view and a virtual framework need to be considered. Recent evidence indicates that very different brain processes are involved in comprehending meaning from these sources (Ravassard et al., 2013). The above discussion highlights some of the complexities that can quickly emerge in a command and control system, so careful consideration must be given to this design aspect as the approach is developed.

APPROACH, IMPLEMENTATION, AND RESULTS

Approach

There were two research objectives for this demonstration. One was related to the human factors aspect, while the second addressed the suitability of the proposed simulated environment as an assessment and training tool. They are stated as follows:

1. Investigate the application of gesture-based control of semi-autonomous systems to identify capabilities, challenges, and limitations in order to assess the feasibility (can you do it) and viability (does it add value) of the approach.

2. Assess the suitability of a commercially available simulation environment to (1) support assessment of human performance and interface preferences for vehicle control and (2) provide a training environment for transition to a real-world system.

A two-phase approach was taken to address the objectives. Phase 1 involved simulation only. It centered on observing a user's ability to control a recreational quadcopter. This phase started with the identification of hand gestures to control the vehicle and the selection and utility validation of a representative gesture capture technology. With these fundamental building blocks in place, the simulation development followed an evolutionary approach in which participants were brought in periodically to exercise the simulation and provide feedback on the basic gesture-based concept and simulation features. Modifications and additions were made after each of these events. While the sample of participants was not large, it provided much-needed insight into the major design features required to minimize mental and physical fatigue and to support stable vehicle control.

The objective of Phase 2 was to add validity to the findings from the purely simulated environment. In this phase, a small ground vehicle was selected as the control model. This phase included a virtual reality simulation to train and familiarize the participant with the controls and vehicle performance. It was then followed by a physical navigation demonstration to corroborate findings from the simulation tests.

Developing Proper Training System Familiarization as a Prelude to Training

Proper training methodologies and programs sometimes need to be modified to add training beyond the basic scope required to become proficient at operating the system.


This may be necessary in two specific instances. The first arises when a training system is deficient and the trainee encounters unanticipated confusion due to a design mismatch between the training system and the real-world system (i.e., the training system is poorly designed and does not operate in the same manner as the real-world system in terms of displays, controls, and visuals). The second arises when dealing with new, unique, and unfamiliar technology. Learning to use the technology properly is an important first step that may be needed before learning to operate the real-world system. In the case of systems that rely on gesture-based controls and gesture recognition, learning which gestures are used to operate the system, and which features are incorporated to assist in interpreting gestures and operating the system smoothly and efficiently, is essential before learning to operate the system itself.

In the system being researched and developed here, the complete system consists of multiple components: a set of gestures, a gesture recognition system, and software algorithms designed to assist in vehicle operation by producing smooth and efficient movement based on the command gesture produced by the trainee and recognized by the gesture recognition system. The gesture recognition system consists of the Leap Motion Controller, which produces a "control bubble" that detects and recognizes a predetermined set of gestures. Once a person places their hands inside the recognition area, the LMC will recognize and interpret the gesture for translation into movement of the vehicle. The software algorithms then refine the interpretation of that gesture to produce smoother, better-defined movement in the simulation or in real-world vehicle operation.

This type of control interface (a gesture-based recognition system) is at a great disadvantage compared to an interface, such as a joystick, that has been in use for decades and is familiar to almost anyone who has played a video game or purchased a remote-control plane or car for entertainment. The most common control interface for video games and remote-control vehicles today is a joystick-based hand controller. Almost all joystick interfaces share a number of common features, and the functions of those control inputs are universal in nature. A gesture-based control system, by contrast, is unfamiliar to most individuals and must be clearly defined before use. The gestures recognized by the LMC are natural, intuitive, and somewhat familiar in a general sense, but the type of control interface is novel and unique. In other words, the gestures taken by themselves are familiar to any user in a general sense, but using those gestures in an interface to control a vehicle is foreign and unfamiliar. Therefore, training the user on the gestures and their use in controlling the vehicle is essential before any system training can take place. Failure to do so will result in longer training times and poorer performance compared to training a user to control a vehicle with a familiar device such as a joystick-based hand controller.


Implementation and Results: Phase 1

Gestures and LMC Measurements

The first step toward developing the capability was to conduct a task breakdown and gesture-matching exercise to identify the functions associated with flying and operating a representative recreational hovercraft (aka quadcopter). This task breakdown is shown in the first column of Table 11.1. It was partitioned into categories of flight control and camera control. Seven potential actions in the flight control category, related to the movement of the vehicle in the airspace, were identified. Three camera actions were identified; they refer to the camera view (i.e., a first-person view from the operator or vehicle) and direction (pitch and yaw). The description of the flight control is an abstraction: it describes what the operator wants to make happen rather than how the vehicle does it. For example, the desired action is for the vehicle to climb or descend, translate in a horizontal plane, or yaw around its vertical axis. This motion is enabled through the application of forces and torques on the vehicle, which are determined by the internal control logic of the air vehicle based on user commands. In this investigation, one of the algorithms that drive the simulation translates the operator input into forces and torques on the vehicle.

The Leap Motion Controller was selected as the representative technology with which to capture the gestures identified in Table 11.1. Previous investigations (Weichert et al., 2013) have demonstrated that the LMC has submillimeter accuracy. It was desired to build on this finding and assess how accurately the human-performed gestures described in Table 11.1 are captured by the LMC. A C# script developed using the LMC application programming interface (API) was used to capture this data.
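To make the data capture concrete, the following is a minimal sketch of how such a capture script might look. It assumes the classic Leap Motion C# API (Leap.Controller, Frame, Hand); the class name, sample count, and polling rate are illustrative rather than taken from the study.

    using System;
    using Leap;

    // Illustrative sketch: poll the LMC and log hand orientation angles
    // (radians) against sample number, as plotted in Figure 11.1.
    class GestureCapture
    {
        static void Main()
        {
            var controller = new Controller();
            for (int sample = 0; sample < 1000; sample++)
            {
                Frame frame = controller.Frame();        // latest tracking frame
                if (frame.Hands.Count > 0)
                {
                    Hand hand = frame.Hands[0];
                    float pitch = hand.Direction.Pitch;  // rotation about the lateral axis
                    float roll  = hand.PalmNormal.Roll;  // rotation about the fore-aft axis
                    float yaw   = hand.Direction.Yaw;    // rotation about the vertical axis
                    Console.WriteLine($"{sample},{pitch},{roll},{yaw}");
                }
                System.Threading.Thread.Sleep(10);       // roughly 100 samples per second
            }
        }
    }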

TABLE 11.1
Quadcopter Control Actions and Corresponding Gestures

Vehicle Action               Gesture
Flight Control
  Climb/Descend              Left Hand Pitch
  Translate Left/Right       Right Hand Roll
  Translate Forward/Aft      Right Hand Pitch
  Yaw                        Left Hand Yaw or Roll
  Increase/Decrease Speed    Controlled by Vehicle Pitch and Roll
  Stop                       Fist/Remove Hands from Control Environment
  Control Initiation         Open Hand
Camera Control
  Switch View                Tapping Motion
  Pitch                      Right Hand Pitch
  Yaw                        Left Hand Yaw


FIGURE 11.1  Representative gesture capture using LMC – rolling right hand.

Figure 11.1 is a representative example of the hand angle versus sample number captured by the LMC, in this case a left-right rotation of the hand. It was produced by performing the gesture with the right hand at a natural speed, so as not to be excessively slow or fast. It can be observed that the LMC captured the gesture with a high degree of fidelity. The slope variations are a result of minor changes in the rotational speed of the hand, indicating again the highly accurate nature of this sensor. This exercise demonstrated the precise results produced by the LMC algorithms used to process the captured images. It also illustrates the variations possible in a human-performed motion, indicating the necessity of data smoothing in the gesture interpretation algorithms used in the simulation. The implementation of this is described in the section "Algorithm Redesign."
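The chapter does not specify the authors' smoothing method, but a simple exponential moving average is one standard way to suppress the sample-to-sample variation visible in Figure 11.1 before the angles are interpreted as commands. The sketch below is illustrative only.

    // Illustrative exponential-moving-average filter for raw LMC angle samples.
    class AngleSmoother
    {
        private readonly float alpha;  // 0 < alpha <= 1; smaller = heavier smoothing
        private float smoothed;
        private bool initialized;

        public AngleSmoother(float alpha) { this.alpha = alpha; }

        public float Update(float rawAngle)
        {
            if (!initialized) { smoothed = rawAngle; initialized = true; }
            else smoothed = alpha * rawAngle + (1f - alpha) * smoothed;
            return smoothed;
        }
    }

With alpha in the range of roughly 0.2 to 0.3, brief tracking jitter is attenuated while deliberate hand rotations still pass through with little lag.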


Virtual Environment Development

The simulation development begins with a description of the visual component of the virtual environment (VE), including the basic scene setting, the information displayed, and the vehicle control mechanisms. It then explores details of key components that make the simulation functional. The approach taken in this investigation was to minimize the load on working memory by limiting the information transmitted to the user to basic vehicle status (i.e., speed, altitude) and to visual information that improves perception and vehicle component control. Taking this approach keeps the focus on the gesture control aspect and the suitability of the basic simulation environment.

Figure 11.2 shows a screen capture of the initial virtual environment. The drone is a generic representation of a recreational quadcopter. It models a 1 kg drone with nominal dimensions of 30 cm × 30 cm × 10 cm and has red lights indicating the forward part of the drone and blinking green lights at the rear of the unit. The arrow in the left-hand corner serves as an orientation aid for users to determine the vehicle direction when it is too far away to clearly distinguish the lights. This is best understood by rotating the arrow 90° so it is on a plane parallel with the vehicle; for the case shown in Figure 11.2, the arrow indicates the vehicle is coming at the user from the right. The vehicle information displayed is altitude, speed, and range to vehicle, shown at the top left of the figure. An alternative concept for the vehicle data was to have it follow the vehicle in a fixed position, such as off to the right. However, that would tax working memory unnecessarily and so was not implemented, since this was not a focus of the demonstration at this point in time. The side position was therefore selected so the user could quickly glance at the data when needed.

FIGURE 11.2  Initial simulation screen design.

A dynamic user interface (UI) was used to switch the camera view using gestures. It is a capability available in the Orion version of the LMC API (Leap Motion, 2019). In this case, a dynamic UI is attached to the left hand and is visible when that hand is rotated toward the user, as shown in Figure 11.3. It contains two buttons that enable the user to switch the view between the operator camera and the vehicle camera. This type of dynamic UI is an attractive feature for the proposed system: it has the potential to lower working memory load since it is not always in the field of view.

Now that the visual component of the VE has been described, we will discuss some of the mechanics that made it work, starting with how the vehicle motion was controlled. There were 14 individual C# scripts controlling the simulation, covering visual features such as tracking and displaying vehicle information as well as capturing gestures and controlling the vehicle. Two scripts central to vehicle control are the GestureListener and UAVController scripts. The former captures gestures (i.e., hand orientation, fist) of each hand, such as those shown in Figure 11.2. The latter interprets the gestures to control vehicle operations, such as setting forces and torques on the vehicle and controlling the camera.


FIGURE 11.3  Dynamic UI to control view perspective.

FIGURE 11.4  Modeling the vehicle forces and torques – actual (left) – in Unity (right).

Unity provides a physics engine to apply forces and torques to an object via its Rigidbody class, which governs the object's linear motion via forces and angular motion via torques. Figure 11.4 shows two free-body diagrams of the model vehicle. The left one shows the four forces produced by each propeller. By adjusting individual propeller forces, a force–torque combination is applied to the actual vehicle to produce the desired flight behavior. For this simulation, the vehicle was modeled with a Rigidbody component attached to it. This enables the application of a single 3D force vector and a single 3D torque vector to the vehicle. The four propeller forces are therefore modeled as a single force in the y-direction relative to the orientation of the vehicle (i.e., perpendicular to it) and a single torque vector, as shown in the free-body diagram on the right.
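A minimal sketch of this single force/torque model as a Unity script follows. The Rigidbody calls (AddRelativeForce, AddRelativeTorque) are standard Unity API; the class and field names, and the assumption that thrust acts along the body y-axis, follow the description above and are illustrative.

    using UnityEngine;

    // Sketch of the simplified vehicle model: one net thrust along the body
    // y-axis plus one net torque vector, applied each physics step.
    public class QuadcopterPhysics : MonoBehaviour
    {
        public Rigidbody rb;            // Rigidbody attached to the drone model
        public float thrust;            // net propeller force (illustrative)
        public Vector3 controlTorque;   // net torque vector (illustrative)

        void FixedUpdate()
        {
            rb.AddRelativeForce(Vector3.up * thrust);  // perpendicular to the vehicle
            rb.AddRelativeTorque(controlTorque);
        }
    }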

Maximum force and torque values applied to the vehicle were adjusted so that the simulated vehicle performance closely approximated that of the real vehicle.

A linear relationship is used in the UAVController script to interpret a gesture and transform it into an applied force or torque. The gesture is reduced to a normalized value between −1 and 1 based on a prescribed maximum hand rotation. The limit on hand rotation was based on the observations of the range of motion of natural hand gestures discussed previously. For example, the maximum wrist rotation was set to 30°; even if the user rotates the hand to a larger angle, the control input is maxed out at this condition.
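The linear mapping just described amounts to normalizing the measured rotation against the prescribed maximum and clamping the result, roughly as in the following sketch (the function name and the final conversion to torque are illustrative):

    using System;

    public static class GestureScaling
    {
        // Normalize a hand rotation (degrees) to [-1, 1] against a prescribed
        // maximum (e.g., 30 degrees of wrist rotation); larger rotations saturate.
        public static float NormalizeRotation(float handAngleDeg, float maxAngleDeg = 30f)
        {
            return Math.Clamp(handAngleDeg / maxAngleDeg, -1f, 1f);
        }
    }

    // Example: the normalized value then scales the applied force or torque
    // linearly, e.g., appliedTorque = NormalizeRotation(rollDeg) * maxTorque.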

User Testing

The purpose of the first round of tests was to make a comparative assessment between a joystick/button device (the Xbox 360 controller) and gesture-based control. Four participants took part in the testing. Each participant engaged in two scenarios with each control approach. The first scenario was “play time”, and the second was a search mission. In play time, the users were not asked to do anything specific; it was simply meant to give them time to explore the response of the vehicle to the flight control inputs via the two techniques and to become familiar with the operation of the dynamic UI for controlling the camera view. In the second scenario, they were asked to locate and navigate the vehicle to a specific location in the scene. There was no prescribed path at this point, just a destination. After this, the participants engaged in a short post-test interview. The total time to complete the test and post-test interview was typically just under an hour per participant.

In general, the participants preferred the Xbox controller over the gesture-based control system. Several observations and comments support this position. For example, on average twice as much time (11 min vs. 22 min) was spent in play mode with the gesture-based system, an indication that the users felt more comfortable and familiar with the Xbox controller than with the gesture-based system. A typical user's ability to control the vehicle improved significantly over the play period, but users still did not feel as comfortable with the gesture system as with the joystick device at the end of the play session. Finally, mission times when using the Xbox controller were on the order of three minutes, while the missions using gesture control were rarely completed due to fatigue and frustration with the system.

In the post-test interviews, participants reported feeling fatigued, mostly from using the gesture system. This most likely stems from a combination of physical and mental fatigue. Even though only minimal hand movement is required to control the vehicle, it was observed that the participants used large hand gestures requiring more energy compared to the small thumb motions that suffice with the joystick. Also, the vehicle did not respond as accurately to these gestures, since they did not fall into the detection region (i.e., the green box) and were not the subtle motions expected by the processing algorithm. These observations, coupled with the fact that gesture control was a new approach, probably led to a higher level of mental engagement and thus fatigue.

For the most part, the visual content was satisfactory for the participants. The location and amount of the textual information was sufficient, and the users' responses did not indicate they were overly taxed with processing that information. In fact, they were typically so focused on the vehicle that they needed to be reminded this information was available. The exception was the virtual hands, which they found distracting.

Other comments and observations centered around the use of the dynamic UI and visual aids. Participants could not consistently produce the menu shown in Figure 11.3 and often could not make the selection once the menu was available. Restricting the region where the vehicle control was activated also received unfavorable comments: the control box made participants feel constricted, and it led to a perceived lack of control because they frequently had to check where their hands were in the field of view. Finally, they had difficulty interpreting the 3D vehicle orientation from the 2D arrow. One final observation all of the participants made was that they liked how the gesture-based system made them feel more connected to the vehicle response.

The comments and observations from this set of tests led to several modifications of the simulation. First, the unconstrained play environment did not provide effective conditions for the participants to learn the new gesture interface, so a building-block training environment was implemented to address this shortfall. Second, participants had a difficult time correlating the vehicle orientation with the 2D direction indicator. In the updated version of the simulation, a 3D representation of the vehicle was included. This is shown at the bottom of Figure 11.5 as a semitransparent sphere containing a small-scale version of the drone model. This drone matches the pitch and roll orientation as well as the direction the vehicle is flying. It was anticipated that this would reduce the cognitive load, and thus fatigue, since it is a more direct representation of the vehicle's orientation and requires minimal processing to understand.

Components of the user interface were also updated. A neutral command was programmed into the simulation: if a hand was detected to be in the shape of a fist, no control command would be transmitted to the vehicle. Also, the virtual hands were made of a clear material, so they were less distracting to the user but still available for reference.

FIGURE 11.5  Modified simulation screen design.

Next, because the dynamic UI was hard for participants to control, it was replaced by a task in which the user appears to touch the vehicle to change the camera view. When viewing from the camera, a small semitransparent square in front of the viewer is the target interface. In addition to being more intuitive, it is also a simpler technique. A command (rotating the index finger) was also added to rotate the camera pitch angle 90°. This let the users scan from a position parallel to the flight path to straight down, which was useful for searching an area and landing.

To assess the effectiveness of these modifications, two participants from the previous test were brought back. It was conceded that the joystick approach far exceeded the gesture-based control at this time, so the users were asked to engage only in the gesture-based control approach. Each participant was first led through the training environment. As anticipated, this helped them develop a feel for the limited range of motion required to control the vehicle. Then they again went through the play and mission scenarios. In general, the feedback from the users was much more positive, and it was observed that they had better control of the vehicle, were able to complete the requested missions, and could switch camera views. They also demonstrated a lower level of fatigue and frustration.

Algorithm Redesign

The previous implementation of the proportional–integral–derivative (PID) controller was limited to transitioning to hover mode and ensuring the vehicle did not exceed its maximum rotation angle in pitch and roll. This approach was expanded to include more control setpoints: the vertical climb rate, the yaw rate, and the pitch and roll angles. With this structure, the hand gestures determine the setpoint, and the PID controller then determines the force and torque vectors required to hold the vehicle in that condition until an additional command is given; it is thus still a kinetic-based simulation.

The setpoint is determined from the normalized change in hand orientation through a cubic relationship, which can be clarified by studying Figure 11.6. The figure illustrates a cubic relationship between the normalized gesture command and a control parameter; for this illustration, the maximum value of the control parameter is set to three. Assume the vehicle is on the ground waiting for takeoff. This condition defines the initial setpoint shown in the figure. A change in hand orientation from the reference orientation, such as a positive pitch rotation, is then normalized, and the new setpoint is determined from the cubic function. Once this command is set, the user can return their hands to the neutral (e.g., resting) position and the vehicle will continue to follow the last input command by virtue of the PID controller; returning the hands to the reference orientation does not affect the setpoint. Incremental changes to the updated setpoint are again made following the cubic function, so a small change in hand orientation results in a small change in the setpoint, while a larger change increases it more, but never beyond its maximum.

FIGURE 11.6  Redesigned gesture interpretation algorithm.

Finally, an additional state was added to the system, so in this version there were three: active control, hover, and cruise. Switching between the states was achieved by touching the thumb and index finger together. After a change in state, the reference hand position can be reset based on the user's preference. Finally, data smoothing was implemented to remove the jitter in the captured hand gestures. This approach provides the operator with a wide range of control and flexibility anywhere in the flight envelope.

Several tests by the developer and a complete novice demonstrated that these changes resulted in a significantly improved command and control system. First, the vehicle is easier to control and is more stable in flight. The new control algorithm also enables more precise control of performance parameters and vehicle positioning. Further, the updated gesture interpretation algorithm, combined with the implementation of the state machine, allowed the user to keep a lower level of physical stress on the hands and wrist, a positive consequence of not requiring the user to maintain off-neutral, fixed hand positions for the vehicle to maintain its current flight trajectory. A minimal sketch of the cubic, incremental setpoint logic is given below.
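The following sketch illustrates, under stated assumptions, the redesigned interpretation logic: a cubic, incremental setpoint update, the three-state machine, and a textbook PID step. All names and gains are illustrative; the text describes the scheme but not its code.

```csharp
using UnityEngine;

// States described in the text; switching is triggered by a thumb-index touch.
public enum ControlState { ActiveControl, Hover, Cruise }

// Cubic, incremental setpoint: a normalized gesture change in [-1, 1] nudges
// the setpoint along a cubic curve, so small hand motions make fine
// adjustments while large ones saturate. Returning the hand to neutral
// (command = 0) leaves the setpoint unchanged, as the text describes.
public class CubicSetpoint
{
    public float Max = 3f;                  // max control parameter (3 in Figure 11.6)
    public float Value { get; private set; }

    public void Apply(float normalizedCmd)
    {
        float n = Mathf.Clamp(normalizedCmd, -1f, 1f);
        Value = Mathf.Clamp(Value + n * n * n * Max, -Max, Max);
    }
}

// Textbook PID loop: computes the force or torque needed to hold a setpoint
// (climb rate, yaw rate, pitch or roll angle) until the next command.
public class Pid
{
    public float Kp = 1f, Ki = 0.1f, Kd = 0.05f; // gains are assumptions
    private float integral, lastError;

    public float Step(float setpoint, float measured, float dt)
    {
        float error = setpoint - measured;
        integral += error * dt;
        float derivative = (error - lastError) / dt;
        lastError = error;
        return Kp * error + Ki * integral + Kd * derivative;
    }
}
```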

Implementation and Results: Phase 2

Physical Demonstration Setup

The purpose of Phase 2 was to see whether the observations and algorithms developed in the simulated environment transfer to an actual physical demonstration, thereby adding validity to the lessons learned and observations from Phase 1 and supporting the realism of the simulation, the latter being one of the two objectives of the project. This was accomplished through the use of a small ground vehicle.


As discussed previously, it is a good idea to provide a preliminary training environment so that an operator can become familiar with the required hand gestures and vehicle response. For this situation, an immersive VR simulation was developed. The VR environment was designed to represent the geometry of a room with dimensions of 5 m × 10 m × 3 m. Figure 11.7 shows the actual and simulated environments with the model car. The virtual environment contained a few obstacles, such as pillars and tables, that provided targets when directing test participants to navigate around. Hardware used in this experiment included the LMC and the Oculus Rift headset: the LMC generates the operational gesture recognition environment, while the Oculus Rift provides an immersive display environment.

FIGURE 11.7  Actual (top) and virtual (bottom) environments with model car.

Figure 11.8 shows the virtual car in addition to two visual aids. One is the virtual trackball concept introduced in Phase 1 (a nearly transparent sphere), and the other is a crossbar control indicator; the two work in a coordinated manner. The trackball has a diameter of 0.1 m, about the size of a softball. Its intent is to provide the user with an anchor for the hand to rotate around as it hovers over the LMC. Erratic readings can result from the LMC if the hand gets too close to it, approximately 2.5 cm above the device, so an alert range is conservatively set at 5 cm; at that point, the trackball turns from nearly transparent to yellow. The control indicator is composed of a crossbar, each arm with a disk that moves either vertically or horizontally. The vertical component is tied to the pitch of the right hand and the forward and backward motion of the vehicle. The horizontal motion is tied to the roll of the right hand, which is used to control the steering angle. The maximum motion of these gestures is set to 30°. Control of the vehicle based on these gestures is made through the use of a wheel collider and is discussed a little later.

FIGURE 11.8  Trackball and crossbar control indicator.

To further support muscle memory training, users are positioned in a chair with the LMC just below and in front of the right armrest. While not practical for an actual application, this helps to provide an anchor for the arm, which in turn lets the user focus on the small hand motions required for vehicle control.

The basic algorithms used in the virtual environments were implemented to control the model car system shown in Figure 11.9. It has two components: the transmitter (left) and the model car (right). A schematic of the processing and control algorithm is shown in Figure 11.10. As shown in the left circle, Unity captures and interprets gestures from the LMC and sends them to the transmitter via the computer's USB port as a string containing rear-wheel motor power and turning commands. The transmitter then sends the signal to the car, and the Arduino board on the car interprets and executes the commands, as indicated in the circle on the right. Minor modifications to the original software that came with the system were required to enable this capability; a sketch of this link is given below.

FIGURE 11.9  Adeept controller and smart car (Adeept, 2019).

FIGURE 11.10  Schematic representation of Unity to car control scheme.
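A minimal sketch of what the Unity-to-transmitter link might look like follows. The text specifies only that a string with rear-wheel motor power and turning commands is sent over USB; the port name, baud rate, and message format here are assumptions. (System.IO.Ports requires Unity's .NET 4.x API compatibility level.)

```csharp
using System.IO.Ports;
using UnityEngine;

// Sketch of the Unity-to-transmitter serial link described above.
public class CarLink : MonoBehaviour
{
    private SerialPort port;

    void Start()
    {
        // Port name and baud rate are assumptions for illustration.
        port = new SerialPort("COM3", 9600);
        port.Open();
    }

    // Sends one command string, e.g., "M120 S-10" for motor power 120 and a
    // -10 degree steering angle. The real message format is not given in the
    // text; this one is hypothetical.
    public void Send(int motorPower, int steerAngleDeg)
    {
        if (port.IsOpen)
            port.WriteLine($"M{motorPower} S{steerAngleDeg}");
    }

    void OnDestroy()
    {
        if (port != null && port.IsOpen) port.Close();
    }
}
```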

User Testing

Two rounds of testing were conducted. In the first round, the initial linear control algorithm mentioned earlier was implemented, providing direct control of the vehicle's motor torque and steering through a linear gesture interpretation algorithm. The second round of testing implemented most, but not all, of the modified control algorithm, because the feedback parameters it relies on (e.g., vehicle speed) are not always available: vehicle speed is available in the VR simulation but not in the actual vehicle in its current configuration. The following features were included: cubic interpretation of gestures, incremental command inputs, and commands based on a neutral reference position. This still captures the key foundational elements of the control approach related to lowering the physical stress on the user.

Two sessions were conducted in the first round: one an informal activity with a high participation count (around 15 participants), and one a more formal session with a lower participation count (2 participants). In each case, there was a training period, followed by play time in the VR environment, followed by an event where the user controlled the model vehicle. The initial training mode involved no vehicle movement, but the indicator was free to move; this let the user become familiar with the hand position and the small range of motion required for vehicle control. After that, forward and backward motion was enabled to familiarize the user with the visual effect of the moving car. Finally, the car steering was enabled. This step-by-step training process was inspired by the findings of Phase 1.

Informal observations of approximately 15 people took place. During these engagements, it was observed that within about 20 minutes the majority of the participants were able to reasonably control the vehicle in both the virtual and real-life scenarios (10 minutes in each environment). Further, the large gestures from previous testing were not observed; the participants used small, relaxed gestures. It therefore appears that the combination of the virtual visual aids and the anchoring of the arm produced the desired results. One issue that was observed is that the turning performance was a little unstable, more like watching someone ride a wobbly bicycle than smooth, consistent motion.

Two additional, more formal tests were conducted. In this case, each participant went through the same rigorous training process before the play mode was enabled. Observations about hand motions and vehicle control were similar to those from the informal test described above. It was further observed that users became confident in their ability to control the car in around 10 minutes. As in the informal session, speed control was smooth but the turning was still a bit unstable, something the latest control algorithm corrected.

Participation in the second round of tests was limited to the developer. In this round, the developer implemented and exercised the new control scheme in both the VR environment and with the remote-control car. The main difference in implementation between the car-based scenarios and the UAV simulation is in the selection and implementation of setpoints. For the VR simulations, vehicle speed and wheel angle were the setpoints used with the PID controller scheme. These setpoints are not available with the remote-control car, since that would require additional vehicle sensors to provide feedback, so the active PID controller was not implemented there. Other features, such as the cubic gesture interpretation and data smoothing, were integrated into the control methods.

After some initial testing, it was decided to slightly modify the control algorithm to control the car more smoothly. In the UAV control, a performance parameter such as climb rate or desired roll angle is set, and the PID controller then maintains that condition. For the car, this worked well for speed control: in the VR simulation, the user adjusts the desired speed and the PID controller determines the torque to apply to the wheel, while in the remote-control car the motor torque is directly linked to the cubic gesture function. In either case, the user can still return their hand to the neutral position and the car will continue at that speed. Stabilizing the steering required returning to a more rudimentary approach, because steering, especially in confined venues, is a more dynamic event requiring constant adjustments. The best way to steer the car was found to be holding the hand in a rotated position while turning and releasing it once the target direction was achieved, at which point the wheel position returns to a zero angle. It was also decided to implement a 3:1 steering ratio: the maximum recognized hand rotation angle was set to ±30°, while the maximum steering angle of the car was set to ±10°. This is another feature that translates less precise human performance into more precise control of the car. These adjustments made executing basic maneuvers such as ovals and figure eights more manageable. This final exercise illustrates the complexity of the control process and several of the features that need to be considered in the design of such a system; a sketch of the final mapping follows.
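Here is a minimal sketch of that final car mapping, again with assumed names: steering is direct (3:1 ratio, returning to zero with the hand), while motor power is incremental and cubic, persisting when the hand returns to neutral.

```csharp
using UnityEngine;

// Sketch of the final car-control mapping described above; the class and
// field names are illustrative, not the authors' code.
public class CarGestureControl
{
    public float maxHandAngleDeg = 30f;  // recognized hand rotation limit
    public float steeringRatio = 3f;     // 3:1 -> +/-10 degrees at the wheel

    private float motorCmd;              // persists across updates

    // Direct mapping: the wheel angle follows the hand and returns to zero
    // when the hand does, which stabilized steering in confined spaces.
    public float SteerAngleDeg(float handRollDeg)
    {
        float clamped = Mathf.Clamp(handRollDeg, -maxHandAngleDeg, maxHandAngleDeg);
        return clamped / steeringRatio;
    }

    // Incremental cubic mapping: hand pitch nudges the normalized motor
    // command, and the car holds its speed once the hand is back at neutral.
    public float MotorPower(float handPitchDeg, float dt)
    {
        float n = Mathf.Clamp(handPitchDeg / maxHandAngleDeg, -1f, 1f);
        motorCmd = Mathf.Clamp(motorCmd + n * n * n * dt, -1f, 1f);
        return motorCmd;
    }
}
```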

SUMMARY AND CONCLUSIONS

The ability to control vehicles via gesture-based control is achievable, and with the emergence of head-mounted augmented reality technology, it may eventually become preferable. At this point, however, there is still a strong preference for the joystick approach, likely the result of a combination of familiarity and the maturity of the technology. The joystick-based controller has been around for a number of years; its basic design is well tailored and its functions are well developed. The gesture-based system is still new and can be intuitive, but it is not something that individuals are very familiar or comfortable with at this point in time. While care was taken in this effort to implement natural gestures, they were still new ideas. Even so, participants learned the new system quickly (on the order of minutes), achieved a moderate level of vehicle control, and stated they felt more connected to the vehicle using this approach.

Another conclusion is that the available gesture capture systems are highly accurate and capable of detecting a wide range of hand motions, which can subsequently be transformed into control commands. The human, however, is not as precise. To make these systems more usable, data processing and control schemes need to be implemented that smooth out the variations in human performance and thus stabilize the vehicle control. Also, because the interface is new, training environments will be required to familiarize the user with the range of motion involved, since it is essentially unlimited by any mechanical constraints: a joystick is a mechanical system with motion limits, while a hands-free gesture system is wide open, limited only by the ergonomic boundaries of the operator. Establishing proper training environments showed that this motion can be learned easily if natural, accepted, and intuitive movements are considered in the design. Further, the research revealed other subtleties of motion that were not originally considered. For example, the original gesture concept was simply to rotate flat hands via a pitch and roll motion of the wrist, but observations showed the preferred neutral position of a hand was slightly offset and semi-rounded, making it more suitable for a virtual trackball concept.

Testing on a larger scale is required to further investigate human performance and preferences for this control idea, and the system developed in this research is now set up to conduct these larger-scale tests. As these new technologies and types of interfaces become more familiar to users, research and development in these areas will expand, and new technology such as gesture recognition, combined with various software solutions, will become as commonplace and accepted as today's joystick technology and control interfaces.

REFERENCES

A. Gevins, M. E. Smith, H. Leong, L. McEvoy, S. Whitfield, R. Du & G. Rush, “Monitoring Working Memory Load During Computer-Based Tasks with EEG Pattern Recognition Methods,” Human Factors: The Journal of the Human Factors and Ergonomics Society, vol. 40, no. 1, pp. 79–91, 1998.

A. H. Smeragliuolo, N. J. Hill, L. Disla & D. Putrino, “Validation of the Leap Motion Controller Using Markered Motion Capture Technology,” Journal of Biomechanics, vol. 49, no. 9, pp. 1742–1750, 2016.

A. Sarkar, R. K. Ganesh Ram, K. A. Patel & G. K. Capoor, “Gesture Control of Drone Using a Motion Controller,” in International Conference on Industrial Informatics and Computer Systems (CIICS), pp. 1–5, Sharjah, 2016.

A. Scicali & H. Bischof, “Usability Study of Leap Motion Controller,” in Proceedings of the International Conference on Modeling, Simulation and Visualization Methods (MSV), Athens, Greece, 2015.

Adeept, “Adeept,” [Online]. Available: https://www.adeept.com/. [Accessed 18 September 2019].

C. Balog, B. Terwilliger, D. Vincenzi & D. Ison, “Examining Human Factors Challenges of Sustainable Small Unmanned Aircraft Systems (sUAS),” in Advances in Human Factors in Robots and Unmanned Systems, vol. 499 of the series Advances in Intelligent Systems and Computing, New York, NY, Springer International Publishing, 2016, pp. 61–73.

C. Rorie & L. Fern, “UAS Measured Response: The Effect of GCS Control Model Interfaces on Pilot Ability to Comply with ATC Clearances,” in Proceedings of the Human Factors and Ergonomics Society 58th Annual Meeting, 2014.

D. R. Lampton, B. Knerr, B. R. Clark, G. Martin & D. A. Washburn, “ARI Research Note 2306-6 – Gesture Recognition System for Hand and Arm Signals,” United States Army Research Institute for the Behavioral Sciences, Alexandria, VA, 2002.

F. Weichert, D. Bachmann, B. Rudak & D. Fisseler, “Analysis of the Accuracy and Robustness of the Leap Motion Controller,” Sensors, vol. 13, no. 5, pp. 6380–6393, 2013.

I. Staretu & C. Moldovan, “Leap Motion Device Used to Control a Real Anthropomorphic Device,” International Journal of Advanced Robotic Systems, vol. 13, no. 113, 2016.

J. Guna, G. Jakus, M. Pogacnik, S. Tomazic & J. Sodnik, “An Analysis of the Precision and Reliability of the Leap Motion Sensor and Its Suitability for Static and Dynamic Tracking,” Sensors, vol. 14, no. 2, pp. 3702–3720, 2014.

J. R. Cauchard, L. E. Jane, K. Y. Zhai & J. A. Landay, “Drone & Me: An Exploration into Natural Human–Drone Interaction,” in Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 2015.

Leap Motion, “Leap Motion,” [Online]. Available: https://www.leapmotion.com/. [Accessed 16 September 2019].

M. Chandarana, E. L. Meszaros, A. Trujillo & B. D. Allen, “Natural Language Based Multimodal Interface for UAV Mission Planning,” in Proceedings of the Human Factors and Ergonomics Society 2017 Annual Meeting, Los Angeles, CA, 2017.

M. S. Hamilton, P. Mead, M. Kozub & A. Field, “Gesture Recognition Model for Robotic Systems of Military Squad Commands,” in Interservice/Industry Training, Simulation and Education Conference, Orlando, FL, 2016.

P. Ravassard, A. Kees, B. Willers, D. Ho, D. Aharoni, J. Cushman, Z. M. Aghajan & M. R. Mehta, “Multisensory Control of Hippocampal Spatiotemporal Selectivity,” Science, vol. 340, no. 6138, pp. 1342–1346, 2013.

S. Dodd, J. Lancaster, A. Miranda, S. Grothe, B. DeMers & B. Rogers, “Touch Screens on the Flight Deck: The Impact of Touch Target Size, Spacing, Touch Technology and Turbulence on Pilot Performance,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Chicago, IL, 2014.

S. Zollman, C. Hoppe, T. Langlotz & G. Reitmayr, “FlyAR: Augmented Reality Supported Micro Aerial Vehicle Navigation,” IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 4, pp. 560–568, 2014.

12 The Influence of New Realities: How Virtual, Augmented, and Mixed Reality Advance Training Methods in Aviation

Graham King, Kendall Carmody, and John Deaton

CONTENTS

Introduction
Virtual, Mixed, and Augmented Reality as We See It … or Don't
The Growth of Technology in Aviation Training
Use of AR, MR, and VR in the Aviation Field
The Shortage and the Future
Benefits, Drawbacks, and Resolutions
Conclusion
References

INTRODUCTION

As times change and technology grows, the expectations of humankind grow in tandem: businesses desire processes that are simpler and more efficient, the public requires goods and services that are more advanced and easier to obtain, and industries look for faster training methods. While these expectations become reality and the world grows more automated with each industrial revolution, it has become increasingly apparent that virtual, augmented, and mixed reality may be a significant factor in the present transformation. Despite their unfamiliarity to many, these new realities have been all around us, advancing industries since their introduction. The constant and overwhelming burgeoning of this immersive technology has displayed its attainable commercial capabilities in recent years, allowing many companies to jump at the opportunity to maximize their efficiency and effectiveness through the implementation of such devices. Several big names, such as Microsoft, Facebook, and VIVE, have relentlessly engaged in developing this technologically advanced equipment
for over a decade. With the current growth in aviation and the shortage of pilots, maintenance personnel, and flight crew, the Federal Aviation Administration's hesitation to recognize and implement virtual, mixed, and augmented reality for training in its regulations may soon become untenable. Recognized aviation schools have already begun to test virtual reality headsets in their training. Maintenance, repair, and overhaul (MRO) aircraft mechanics have turned to augmented reality for training and operations in order to decrease task completion time. By simplifying movements, providing workers with a step-by-step guide as they perform tasks, and reducing the risk involved with rarely-before-seen repairs, these aircraft mechanics hope to use augmented and mixed reality to streamline MRO. American Airlines gave its new flight attendants a chance to practice and train using a virtual reality lab to reduce error and the cost of running a simulator for longer periods. Even branches of the military are using state-of-the-art virtual reality technology to train their classes of pilots. Virtual, mixed, and augmented reality products have some drawbacks, although minor, that affect user comfort and the way in which the technology can be used. However, researchers and programmers are already finding ways to counteract these drawbacks, and the potential for a paradigm shift within the field of aviation makes implementing these devices a compelling goal.

VIRTUAL, MIXED, AND AUGMENTED REALITY AS WE SEE IT … OR DON'T

Virtual reality (VR), typically the most recognizable of the three reality-changing technologies, is a system that eliminates the user's current reality and transports them to a simulated world, using a medium such as a headset to stimulate their brain and perceptions with the created virtual elements. As for mixed and augmented reality, previous research has had difficulty defining the terms, as their meaning is debated among scholars, engineers, and researchers (Yung & Khoo-Lattimore, 2017). However, a relatively simple version defined by Intel Corporation states that augmented reality overlays simulated/digital information on the real world, while mixed reality combines real-world and simulated/digital elements, meaning one manipulates both the simulated elements and the real world together (Intel, 2019). Microsoft, viewed as a leader in the mixed-reality network, has been heading up its efforts with the creation of the HoloLens, a mixed-reality headset designed to create easy solutions for business training by projecting a 3D model onto real-world elements (Foundry4, 2019).

Both types of simulated reality, augmented (AR) and mixed (MR), are becoming more popular in the gaming industry as well. Google's spinoff company Niantic implemented AR into one of the most popular games on the market: Pokémon GO (Foundry4, 2019). Game players can download the app on their smartphones and spend countless hours walking around outside in the “real world” collecting various Pokémon and battling with trainers they find mixed into their surroundings. It is only a matter of time before children walking the streets equipped with augmented reality devices is the “norm.”


Altering what users see in real life does not come without risk, however. Many players of the aforementioned Pokémon GO, for example, were involved in accidents attributed to distractions while playing. The Journal of the American Medical Association conducted a study analyzing keywords in tweets and news articles from July 10 through 19, 2016 (Ayers et al., 2016). Taking a random sample of 4,000 tweets, the authors found that 33% indicated that a driver, passenger, or pedestrian was distracted by Pokémon GO, and around 18% suggested a person was playing while driving (Ayers et al., 2016). They also found 14 car crashes reported in the news as a result of the game during the same period (Ayers et al., 2016). VR sickness, disorientation, seizures, and a lack of direct transfer of learning can also be inherent problems when people use VR, MR, and AR devices in excess; however, with all disadvantages considered, the ease of learning, cost-effectiveness, decreased risk of harm in training, and faster training can make the use of this technology decidedly beneficial. While the risks should be carefully analyzed, they should not deter the use or implementation of such devices; rather, users should exercise caution and use them properly, in a manner that enhances training and accords with the manufacturer's suggestions.

THE GROWTH OF TECHNOLOGY IN AVIATION TRAINING

Overall, high demand for these systems and the need to innovate within certain industries have led many companies to devote much of their time and resources to developing fully functioning augmented, mixed, and virtual reality systems, which are now constantly being incorporated into various forms of training, sometimes before they are even fully developed. The aviation industry in particular is always looking for new strategies to modernize and restructure its practices and procedures to maintain the safest skies for everyone in the most efficient way possible. For years, airlines, flight schools, and various other training centers have used, and will continue to use, simulators, flight training devices (FTDs), and Aviation Training Devices (ATDs) in their airman certification training, testing, and checking tasks. As technology flourished, the Federal Aviation Administration (FAA) and Department of Transportation (DOT) adapted their regulations and standards to employ these now-certified methods of training. In the FAA's advisory circular AC 120-40B, they specifically state that “as technology progressed and the capabilities of flight simulation were recognized, FAR revisions were made to permit the increased use of simulators in approved training programs” (United States Department of Transportation, 1991, p. 2). On February 2, 1970, greater use of FAA-approved airplane simulators was permitted when training airline crews; and, to relieve congestion in the air, the FAA issued a final rule, effective August 1, 1996, permitting the use of flight simulators and flight training devices for most airman certification training, testing, and the checks themselves (Federal Aviation Administration, 1996; Federal Register, 1996). The FAA made these changes to be “consistent with a state-of-the-art training concept and recognizes industry recommendations for the expanded use of sophisticated flight simulation” (Federal Register, 1996, p. 1). They have also stated that
simulators provide more “in depth training than can be accomplished in airplanes and provide a very high transfer of learning and behavior from the simulator to the airplane” (United States Department of Transportation, 1991, p. 2). Given how rapidly the use of simulators in training has grown since their introduction in the 1950s, some experts believe that VR, as it develops further, will follow a similar if not faster track. Tom Hoffman, the managing editor of FAA Safety Briefing, indicated in an article that VR is an up-and-coming area for use with ATDs and FSTDs (flight simulation training devices), especially with its “broader visuals and 3-D imaging” (Federal Aviation Administration, 2017, p. 13). It has been shown off by trainers and researchers at several conferences such as NTAS and, as Hoffman mentions, at big events like FlightSimCon (Federal Aviation Administration, 2017). Sooner rather than later, aviation regulations regarding VR, AR, and MR are likely to be implemented as the FAA becomes more confident in the technology and its capabilities, given its lower cost and real-world feel.

USE OF AR, MR, AND VR IN THE AVIATION FIELD

As virtual reality headsets become more readily available for industry and general purposes alike, research on their effects has grown. There is an extensive assortment of research across many industries with respect to VR and performance, focusing mainly on utilizing VR to enhance training performance. Such cutting-edge technology is being swiftly implemented into areas of training and education for the aviation industry, and understanding the benefits and limitations of these training methods is crucial as the technology becomes more mainstream.

Utilizing VR devices to enhance visual inspection training is a prominent area of research for the aviation industry and one that has yielded some notable results. A study on this subject was conducted in 2002 by Vora et al. for the Department of Industrial Engineering at Clemson University. The aim of this study was to analyze and compare subjects' perceptions of two differing systems with respect to an actual aft-cargo bay environment, to identify which system had the most support for future training. The systems included ASSIST, a computer-based training program, and a VR system running on an SGI Onyx2 (Vora et al., 2002). A population of graduate and undergraduate students at Clemson University was utilized for this study, with 14 subjects randomly selected (Vora et al., 2002). All subjects used both systems, with the treatment factor being the type used; to cancel out any order effects, the order in which the subjects performed the task was counterbalanced (Vora et al., 2002). An immersive tendency questionnaire and a presence questionnaire were administered, respectively (Vora et al., 2002). The results indicated that the VR system was better for the inspection task and was preferred over the PC-based aircraft inspection simulator (Vora et al., 2002). Areas of the aviation industry are beginning to take this advanced technology more seriously and conduct increased research on the impact of implementing it into everyday training, especially with respect to maintenance and visual inspection.

While research with VR, AR, and MR continues to be conducted and methods improved upon, some organizations and training hubs are beginning to give
the current technology a trial run. Particularly at aviation-related universities with flight programs, such as the University of North Dakota (UND), the transition to VR and MR devices has begun in earnest. Following the example of commercial flight training devices, UND's John D. Odegard School of Aerospace Sciences recently incorporated the use of VR head-mounted devices (HMDs) in its all-new VR lab (Weirauch, 2020). Using the HTC VIVE VR headset along with yoke, throttle, and rudder pedal controls, students new to aviation can begin to develop situational awareness and basic piloting skills at the stations (Murphy, 2020). Far from being imaginary or fake, the simulator bay makes use of photorealistic images with a 360-degree full-cockpit view of one of the school's Piper Archer single-engine aircraft (Murphy, 2020). For the school, the lab not only serves as an extra tool for training when the simulators are being used for checks and more advanced training; because of the fun reputation and ease of use of VR headsets, it also helps to draw in students who may have an interest in becoming a commercial pilot, especially since each VR station is on wheels and can easily be transported around the school.

Schools like the University of North Dakota want to train their students in the most thorough yet cost-effective way possible while upholding their reputation as premier schools for future aviators. While higher-end VR headsets like the new HTC VIVE Cosmos Elite can cost around $900 in total, FRASCA flight training devices, some of the most common simulators used in training, start around $200,000 per unit before any upgrades (VIVE, 2021; Frasca Flight Simulation, 2021). Adding repair costs and labor on top of that, the cost of even the cheapest simulators adds up quickly. As VR develops and becomes more trusted by the FAA and by school flight training programs, it could potentially save thousands of dollars by requiring less actual time in the aircraft for pilots-in-training and less use of costly simulators. Beth Bjerke, Associate Dean of Aerospace Sciences at the university, indicated that UND plans to partner with other universities and organizations to gather data and build a case to present to the FAA on the legitimacy of virtual reality in flight training (Murphy, 2020).

MR and AR have also been an exciting advantage in the aviation industry, making flight simulators and other sources of training all the more realistic. Many flight schools and universities have implemented some form of AR or MR as a training medium. One of the many reasons it has proven so useful is the spatial memorability it allows students, trainees, and employees to perceive; that is, AR, VR, and MR all affect long-term memory by enhancing memory recall (Macchiarella et al., 2005). The aspect of muscle memory (e.g., head movements during a task) is a large factor in this enhanced recall (Macchiarella et al., 2005). Much as driving to the same destination repeatedly becomes so ingrained in our brains that on occasion we may forget driving there, repetitions of movement producing a positive end result when using these devices may also become encoded and automatically processed. When these head-mounted devices bring the brain to a different or enhanced reality, they can keep the focus on the task and eliminate distractions that may detract from learning and encoding.
These devices are not only being used on the civil side of the aviation industry; the military has been employing virtual reality in much of its training to speed up training time. Through their Pilot Training Next (PTN) program,
student pilots working toward their wings in the United States Air Force could put on the HTC VIVE Pro headset and enter the cockpit of their T-6 trainer, or any aircraft, with little to no effort, hearing the real-time sounds of the engine and the cockpit along with their instructor's voice, just as in the real aircraft (Losey, 2018). Instructors can start the student mid-air so as not to waste time, letting them practice loops, barrel rolls, and any other maneuvers needed for training that day without using up the aircraft's resources or taking on the danger of a real military jet (Losey, 2018). Using biometrics and artificial intelligence, the instructors can monitor their students, checking pulse rate, heartbeat, blood pressure, and stress levels to see how engaged they are and whether the exercise is too stressful, and adjust the lessons accordingly (Losey, 2018). Compared to the legacy simulators alone, which use a static screen, this technology provides an immersive experience, similar to an IMAX movie, where the virtual environment engulfs the students and tricks their brains into thinking they are actually in the cockpit (Losey, 2018). When they are dismissed back to their dorms for the day, students can practice or fine-tune their skills using their own setup, shared with a roommate, so they can progress faster (Losey, 2018). The time saved is exactly what the Air Force is looking for and is a huge reason to use this kind of technology: it usually takes a year to graduate from the pilot training program, but students of PTN were able to graduate in just four months (Losey, 2018). Another huge reason the Air Force developed this program is cost. Legacy simulators for class and training can cost $4.5 million each, while the headsets run for less than $1,000; with the controls and other equipment added, a complete VR setup comes to around $15,000 (Losey, 2018). As a result of the low cost and small footprint, the Air Force could build 20 such simulators (Losey, 2018), roughly $300,000 in total, still far less than a single legacy unit. With so many students, these devices keep trainees productive rather than waiting for the one legacy simulator to free up, for the weather to clear, or for the aircraft they need to become available.

THE SHORTAGE AND THE FUTURE

Boeing forecasts that “763,000 new civil aviation pilots, 739,000 new maintenance technicians and 903,000 new cabin crew members will be needed to fly and maintain the global fleet over the next 20 years,” and although the COVID-19 pandemic seems to have decreased commercial air travel dramatically, the industry plans to recover. The number of outgoing flights will soon rise, and a large portion of the workforce is predicted to reach mandatory retirement age simultaneously (Boeing, 2020, p. 2). However, because of this decrease in commercial travel and cuts in their budgets, airlines have had to furlough maintenance workers, cabin crew, and pilots despite still needing to maintain aircraft in storage. Because of this MRO shortage, which was already in effect, minimizing the time needed to complete training and keeping costs low will be crucial to keeping the industry stable. VR has shown repeatedly that because students can learn independently and on their own time, they can finish at their own pace and push themselves to completion while feeling fully comfortable. However, in VR there is a disconnect between a user's current reality and the simulated one, which can make interactions and the use of one's surroundings tricky.
user’s current reality and their simulated one, which can make interactions and using surroundings tricky. Because of the disconnect from reality that VR creates, certain tasks are better suited for AR or MR. In AR, the link between the real world and the simulated medium exists because it is simply a projection of symbols, words, and computer-aided design elements onto what the user sees presently (Ceruti et al., 2019). Having a virtual aide to help students or maintenance personnel is crucial for properly handling demanding and complex maintenance tasks where high risk is involved. An article in the Journal of Computational Design and Engineering thoroughly discusses the use of AR in aeronautical maintenance in an industry 4.0 context with that being the fourth industrial revolution (Ceruti et al., 2019). The integration of the next revolution encompasses smart factories, where data are shared through connected machines and devices in the Internet of Things (IoT), completely autonomous systems, without the need for human intervention on complex issues, and machine learning (Ceruti et al., 2019). AR systems could be as simple as seethrough glasses with a camera and small projector or even a smartphone or tablet where the camera would be used to capture the external environment and the screen is the output (Ceruti et al., 2019). In both situations when the user moves the camera, the symbols or virtual projections would not change in reference to the external world, but they would move with it or stay in place according to where they were calibrated; in other words, they change position according to the video output to align with the surroundings (Ceruti et al., 2019). In a study on wearable technology, 15 mechanics employed by GE Aviation were asked to perform a complex maintenance task and a simpler maintenance task, twice: once in the traditional manner with paper manual instructions and using normal tools and the second, using smart glasses and a Wi-Fi enabled torque wrench (Robertson et al., 2018). Over the course of three weeks, one maintenance professional each day would perform tasks, working on the Variable Geometry Actuator (VGA) as the simpler component and a Main Fuel Pump on the CF34-8C engine as the complex component (Robertson et al., 2018). The team chose to use Google Glass and were given a WiFi-enabled Atlas-Copco Saltus MWR85TA torque wrench to be able to connect with the technology and sense their movements during the second portion of the study (Robertson et al., 2018). At the conclusion of the study, it was found that the wearable technology with augmented reality-like components reduced the completion time for both tasks by 7.7% for the VGA and 11.6% on the main fuel pump (Robertson et al., 2018). In the AR simulation, the participants were able to complete the tasks in a quicker manner without constantly going back and forth, sometimes up and down a ladder, from where they were working to the manuals because with the glasses all the necessary information was overlaid. Being able to stay in one place saved a lot of time and many stated that once they are more familiar with the wearable technology, certain tasks may get completed even faster because they will know how to properly use it (Robertson et al., 2018). For the survey portion of the study, it was found that 60% of the participants preferred to use the wearable technology over the more traditional method (Robertson et al., 2018).

In a study on wearable technology, 15 mechanics employed by GE Aviation were asked to perform a complex maintenance task and a simpler maintenance task, twice: once in the traditional manner, with paper manual instructions and normal tools, and once using smart glasses and a Wi-Fi-enabled torque wrench (Robertson et al., 2018). Over the course of three weeks, one maintenance professional each day performed the tasks, working on the Variable Geometry Actuator (VGA) as the simpler component and a Main Fuel Pump on the CF34-8C engine as the complex component (Robertson et al., 2018). The team chose Google Glass, paired with a Wi-Fi-enabled Atlas Copco Saltus MWR85TA torque wrench that could connect with the technology and sense the mechanics' movements during the second portion of the study (Robertson et al., 2018). At the conclusion of the study, it was found that the wearable technology with augmented reality-like components reduced completion time by 7.7% for the VGA task and 11.6% for the main fuel pump (Robertson et al., 2018). In the AR condition, the participants were able to complete the tasks more quickly without constantly going back and forth, sometimes up and down a ladder, between where they were working and the manuals, because with the glasses all the necessary information was overlaid. Being able to stay in one place saved a lot of time, and many participants stated that once they were more familiar with the wearable technology, certain tasks might be completed even faster because they would know how to use it properly (Robertson et al., 2018). In the survey portion of the study, 60% of the participants said they preferred the wearable technology over the more traditional method (Robertson et al., 2018).

Another study, a collaborative effort by a design engineer at Boeing and Iowa State University, tested individuals who used three different methods to assemble the wings of an aircraft, a challenging process requiring over 50 steps with nearly 30 parts to assemble (Augmented Reality for Enterprise Alliance, 2015). The three groups used, respectively, a stationary desktop computer with the work instructions as a PDF file, a mobile tablet with the same PDF file, and a mobile tablet equipped with AR software that showed guided steps for task completion using graphical overlays (Augmented Reality for Enterprise Alliance, 2015). The study found that when users had the tablets in AR mode, they made, on average, zero errors, and the AR group also completed the tasks faster on the first attempt than the other groups (Augmented Reality for Enterprise Alliance, 2015). The results further indicated that using AR helped reduce wing assembly time by about 30% and yielded a 90% increase in the quality of the first build compared with using the desktop (Augmented Reality for Enterprise Alliance, 2015).

When it comes to aircraft and the safety of pilots and passengers, maintenance, repair, and overhaul is of the utmost importance. The FAA and other governing bodies recognize this fact, which is why mechanics must go through an arduous certification process and obtain years of experience and training. As technology grows and aircraft systems become more electronic and complex, maintenance has become more specialized, and the training of maintenance personnel must be technologically adapted and regulated to avoid accidents like those involving the MCAS technology on the Boeing 737 MAX 8. By incorporating AR into aircraft manuals, which can often cover hundreds of parts, the operator gains the ability to see the position of the part he or she is looking for on the actual aircraft itself (Ceruti et al., 2019). One of the main challenges facing the implementation of this technology is time: the time to input the manuals and add the animations and projections. However, if that work is done beforehand, at another location, by the aircraft manufacturer and added to the device to be used when needed, it could be a simple solution (Ceruti et al., 2019). On the training side, AR can make complex maintenance tasks and situations much easier by removing the need to find parts that facilities may not have or that are hard to come by. Using augmented reality can also reduce the time, cost, and resource waste of printing a new manual every time a maintenance update comes out, and it allows more than one aviation professional to utilize the manual when only a limited number of copies exist. AR permits the employee, trainee, or mechanic to access the required information on his or her device with just a few taps. The development and employment of AR in an industry context has had a long journey, the technology first being developed in 1992, and it has now reached a level of maturity where it could satisfactorily be made available to various factories and aviation facets, such as aircraft maintenance (Ceruti et al., 2019). Unfortunately, just as with virtual reality flight training at aviation-associated universities and the airlines, there is currently no legislation or certification process in place, and this lack of legislation limits the widespread application of AR technology.
However, if the market continues to adopt VR, AR, and MR capabilities at the current rapid rate, governing bodies may be pushed to develop proper rules, and this technology could be in place all around us within the next couple of years (Ceruti et al., 2019).
Beyond maintenance, many creators of VR and AR programs and trainers in different sectors of aviation are taking this to heart as they work toward more self-directed training methods, meaning that students can work on the parts they do not feel comfortable with or knowledgeable about and skip the parts in which they already feel competent. In this way, the training of employees can be faster and more streamlined, helping with the shortage. In a presentation at the World Aviation Training Summit in 2019, instructor Roger Lowe of American Airlines and David Jones, President and CEO of Quantified Design Solutions, described a case study they conducted from late 2017 to March 2018 to maximize the efficiency and level of training for the cabin crew of American Airlines using VR. American Airlines wanted a smoother way of teaching its aircraft, so the training could be processed and absorbed as effectively as possible, and a trainer that accurately represented its fleet, so it had Quantified Design develop a twelve-room VR training lab to introduce the crew to new aircraft and let them train before their certified check-out (Jones & Lowe, 2019). After a 20-minute familiarity briefing each Sunday before the start of each week, trainees or new hires wanting training on one of American Airlines' aircraft types (A321, 777, 787), a quick refresher before a work trip, or a brush-up prior to an evaluation could use one of the rooms to practice door operations, their knowledge of emergency equipment locations, and the various preflight checks required by the FAA (Jones & Lowe, 2019).

Many VR devices and programs can track performance and give instructors statistics on what students may need to work on and what comes naturally to them, and the VR trainer used in this study is a prime example. Monitored by a staff member in another room as they work during the week, students have their performance in each training section recorded under their accounts so they can keep moving forward and progress as they return (Jones & Lowe, 2019). This adaptive measure not only helps individuals home in on what they need to work on and advance to more complex procedures when they are ready; it also helps American Airlines' training development team identify errors made by many students and improve the focus on problem areas before students get to further VR training and the simulators. Of the 50 students in the February class, high self-efficacy scores for opening an A321 door rose from 20% of students before the virtual reality training was introduced to 68% afterward, meaning students felt more confident and comfortable after the addition of the training (Jones & Lowe, 2019). Adding VR also reduced the required simulator time, and thus operating costs, because the number of students required to repeat evaluations on a physical simulator went from 25% to 2% between January and March, while the percentage of error-free qualifying events moved from 34% to 82% over the same period (Jones & Lowe, 2019).

Considering that the cost of running a door trainer, including instruction time, maintenance, and operation of the simulator, runs upwards of $51 an hour, and multiplying that by all the repeats, the error evaluations, the required debriefs, and the time it takes to complete them, American Airlines, or any other airline that decides to make use of
this approach, could potentially save $207,809 over the three-month new-hire training (Jones & Lowe, 2019). Multiplying this by three for nine months of new-hire training, the savings could be upwards of $623,429, not to mention the cost of a flight attendant accidentally deploying a slide or forgetting material when training with a physical simulator alone (Jones & Lowe, 2019).
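As a quick arithmetic check, the nine-month figure is essentially the three-month figure scaled by three:

$$\$207{,}809 \times 3 = \$623{,}427 \approx \$623{,}429,$$

with the small difference presumably due to rounding in the underlying hourly cost figures.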

BENEFITS, DRAWBACKS, AND RESOLUTIONS

In this ever-changing world, the expansion of technology has pushed for more modern and relatable ways of learning, not only for students but for educators, mentors, parents, and military personnel alike who are tasked with teaching the next generation using tools they may be unfamiliar with. For example, during the COVID-19 pandemic, when in-person classes and gatherings were not possible, many universities had to train their students and staff to use Zoom or other programs that allow video communication between students and their teachers, video chatting, screen sharing, and collaborative learning; even job interviews have been conducted through applications such as Zoom. Using technology to advantage has become paramount for successful teaching and learning, so students can still attend school and learn no matter the circumstances. Advances in available resources have allowed classrooms to shift in ways past generations would never have dreamed of: from chalkboards to tablets and computers, from textbooks to e-books and podcasts, from analog projectors to state-of-the-art ceiling-mounted projectors incorporating Apple TVs, from speech class and speech therapy to VR-based exposure therapy.

Virtual reality has many important advantages that enhance the learning experience and make it easier, especially in the aviation field. It can be extremely interactive: users work through a given program and can practice the skills they are taught in a stress-free manner; any mistakes can quickly be reset with the press of a button, and students can learn what they did wrong and fix it. By being able to practice what they have learned repeatedly and quickly, without having to find new resources, users can build the confidence they need to feel proficient. While this can be extremely beneficial, one of the main concerns aviation experts have about implementing this technology is the difficulty of convincing the brain that it is in a real-life flight situation when the student subconsciously knows he or she can just tap the stop or pause button, reset the simulator, and have another crack at it if things go badly; there is no fear factor or pressure to get it right if there is no real feel (Ellis, 2019). NASA's solution to this lack of a completely real element is its “Fused Reality” technology (Conner, 2015). It combines computer-generated scenes and environments with real-world video through a flight helmet with a special optical system that can overlay the graphics of another plane, an airfield, or a potentially dangerous situation onto the outside camera view (Conner, 2015). This technique allows for an extremely immersive way of training to prepare pilots for all situations in a safe manner. Challenging tasks such as aerial refueling, formation flying, or high-speed landings can be practiced without the huge risk; and because the training is done in the aircraft, with the feel and sound of being in the air, it is as real as it gets.


The pilots are essentially able to take the simulator into the air with them, seeing real elements such as the mountains and clouds, but having the simulation right there with them as well to practice the harder tasks (Conner, 2015).

In the aforementioned American Airlines study, the virtual reality lab's training system offers many advantages; however, it can still only be used as a supplement to the simulators and to training on the aircraft itself. Virtual reality methods are not yet certified by the FAA, so students and trainees still have to qualify and take examinations on a physical simulator, and the transfer of training from the virtual environment to a simulator can differ markedly. Designers can make the simulated programs as sound and smooth as physically possible to instill good procedures, but subconsciously the student will not have the stress of a real situation, and they may not receive the same tactile and auditory feedback or stimulation as onboard the aircraft.

Health hazards also exist with using VR, MR, and AR. Prolonged exposure to the virtual reality world can cause extreme disorientation; the user may experience dizziness, nausea, or headaches (Chang et al., 2020). A study examining virtual reality sickness found that among the 858 survey participants involved, "48.6–52.8% of participants had experienced VR sickness regardless of age" (Lim et al., 2021, Section 3.1). In addition to the survey, when looking at brain waves, the study found that "across two repeated tests, all waves showed significant changes when EEG baseline and EEG sickness were compared" (Lim et al., 2021, Section 3.2). When exposed to virtual reality, many of the participants experienced sickness and hit the VR sickness button, and although many designers create elements and safeguards, including the use of higher-fidelity technology, it remains quite possible for users to get sick, especially from long-term exposure in training, where the brain is already being worked hard. Although extreme and unusual, some users (about 1 in 4,000) will experience dizziness, seizures, or blackouts as a result of VR, among other effects (Lim et al., 2021). These are disconcerting effects, especially if the technology is not used properly; however, for the majority of people this is not a concern if use is limited and conducted in an appropriate manner. The guidebooks for VR devices such as the Oculus Rift even suggest building breaks into the VR experience or reminding the user how long they have been using it (Chang et al., 2020).
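
As a concrete illustration of that last point, the following minimal sketch shows how an application might issue the elapsed-time reminders described above. The 30-minute interval is an assumption chosen for illustration, not a figure from any headset manual.

```python
import time

# Minimal sketch of the elapsed-time reminders described above. The
# 30-minute interval is an assumption for illustration; a real application
# would hook this check into its render loop rather than polling and
# sleeping in a blocking loop.
REMINDER_INTERVAL_S = 30 * 60  # hypothetical reminder interval

def run_session(max_minutes: float) -> None:
    start = time.monotonic()
    next_reminder = REMINDER_INTERVAL_S
    while (elapsed := time.monotonic() - start) < max_minutes * 60:
        if elapsed >= next_reminder:
            print(f"You have been in VR for {elapsed / 60:.0f} minutes; "
                  "consider taking a break.")
            next_reminder += REMINDER_INTERVAL_S
        time.sleep(1.0)  # stands in for one tick of the application loop

run_session(max_minutes=0.05)  # a three-second demo session
```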

CONCLUSION

The immeasurable growth in technology gives mankind the opportunity to use an extensive array of tools to make life easier and more enjoyable, both personally – a Roomba for cleaning floors, an Oculus Rift for playing virtual reality games, a smart TV that users can speak to in order to watch what they want, or smart lights that turn off simply by asking Alexa – and for industries as a whole. The use of VR and AR in the medical field, for example, has enabled substantial growth and opened new horizons for what can be accomplished: dentists can use AR projections of patient data to help them build more precisely calculated caps or crowns, nurses can work through many examples of different patients using an AR-equipped tablet, and VA-ST SmartSpecs can allow the legally blind or critically visually impaired to recognize faces, see their environment, and find lost items (Alliance of Advanced BioMedical Engineering, 2017).

Many enthusiasts and experts hope that aviation will be the next industry influenced and expanded by this revolutionary technology; for this to be possible, the FAA, ICAO, and other government bodies must adapt their regulations to accommodate, recognize, and implement these tools as acceptable aids and training methods. As the technology is further enhanced, improvements are made, and more use cases emerge, AR, MR, and VR will become a major part of the learning environment in the next several years.

REFERENCES

Alliance of Advanced BioMedical Engineering. (2017). Augmented Reality to Revolutionize the Health Care. The Alliance of Advanced BioMedical Engineering; Frost & Sullivan. https://aabme.asme.org/posts/novel-augmented-reality-technology-to-revolutionize-the-health-care-industry
Augmented Reality for Enterprise Alliance. (2015, August 20). Augmented Reality Can Increase Productivity. AREA. https://thearea.org/augmented-reality-can-increase-productivity/
Ayers, J. W., Leas, E. C., Dredze, M., Allem, J.-P., Grabowski, J. G., & Hill, L. (2016). Pokémon GO—A New Distraction for Drivers and Pedestrians. JAMA Internal Medicine, 176(12), 1865. https://doi.org/10.1001/jamainternmed.2016.6274
Boeing. (2020). Pilot and Technician Outlook. Boeing. https://www.boeing.com/resources/boeingdotcom/market/assets/downloads/2020_PTO_PDF_Download.pdf
Ceruti, A., Marzocca, P., Liverani, A., & Bil, C. (2019). Maintenance in aeronautics in an Industry 4.0 context: The role of Augmented Reality and Additive Manufacturing. Journal of Computational Design and Engineering, 6(4), 516–526. https://doi.org/10.1016/j.jcde.2019.02.001
Chang, E., Kim, H. T., & Yoo, B. (2020). Virtual Reality Sickness: A Review of Causes and Measurements. International Journal of Human–Computer Interaction, 36(17), 1658–1682. https://doi.org/10.1080/10447318.2020.1778351
Conner, M. (2015, September 29). Fused Reality. NASA. https://www.nasa.gov/centers/armstrong/features/fused_reality.html
Ellis, C. (2019, October 9). Are VR flight simulators the future of pilot training? Air Charter Service. https://www.aircharterservice.com/about-us/news-features/blog/are-vr-flight-simulators-the-future-of-pilot-training
Federal Aviation Administration. (1996). FAA Historical Chronology. https://www.faa.gov/about/history/chronolog_history/media/b-chron.pdf
Federal Aviation Administration. (2017). Sim City. Federal Aviation Administration Safety Briefing. https://www.faa.gov/news/safety_briefing/2017/media/novdec2017.pdf
Federal Register. (July, 1996). Aircraft flight simulator use in pilot training, testing, and checking at training centers; Final rule. Federal Register, 61(128), 34508–34568.
Foundry4. (2019, February 19). 7 Augmented Reality Companies to Watch. foundry4.com. https://foundry4.com/7-augmented-reality-companies-to-watch
Frasca Flight Simulation. (2021, February 20). How Much Does a Frasca Simulator Cost? Frasca Flight Simulation. https://www.frasca.com/how-much-does-a-frasca-simulator-cost/
Intel. (2019). Virtual Reality Vs. Augmented Reality Vs. Mixed Reality. Intel. https://www.intel.com/content/www/us/en/tech-tips-and-tricks/virtual-reality-vs-augmented-reality.html
Jones, D., & Lowe, R. (2019). Maximizing Virtual Reality Cabin Crew Training: A Case Study. WATS 2019. Quantified Design Solutions and American Airlines presentation. https://www.wats-event.com/wp-content/uploads/2019/05/Jones_Lowe.pdf
Lim, H. K., Ji, K., Woo, Y. S., Han, D., Lee, D.-H., Nam, S. G., & Jang, K.-M. (2021). Test-Retest Reliability of the Virtual Reality Sickness Evaluation Using Electroencephalography (EEG). Neuroscience Letters, 743, 135589. https://doi.org/10.1016/j.neulet.2020.135589
Losey, S. (2018, September 30). The Air Force Is Revolutionizing the Way Airmen Learn to Be Aviators. Air Force Times. https://www.airforcetimes.com/news/your-air-force/2018/09/30/the-air-force-is-revolutionizing-the-way-airmen-learn-to-be-aviators/
Macchiarella, N., Liu, D., & Gangadharan, S. (2005). Augmented Reality as a Training Medium for Aviation/Aerospace Application.
Murphy, C. (2020, January 30). Simulators on Campus: UND Aerospace Launches VR Flight Trainer. UND Today.
Robertson, T., Bischof, J., Geyman, M., & Lise, E. (2018). Reducing Maintenance Error with Wearable Technology. 2018 Annual Reliability and Maintainability Symposium (RAMS), pp. 1–6. https://doi.org/10.1109/ram.2018.8463068
U.S. Department of Transportation. (1991). Advisory Circular 120-40B. Federal Aviation Administration. https://www.faa.gov/documentLibrary/media/Advisory_Circular/120-40B1.pdf
VIVE. (2021, March 11). Find the right high-end VR system for you | VIVE United States. http://www.vive.com/us/product/
Vora, J., Nair, S., Gramopadhye, A. K., Duchowski, A. T., Melloy, B. J., & Kanki, B. (2002). Using Virtual Reality Technology for Aircraft Visual Inspection Training: Presence and Comparison Studies. Applied Ergonomics, 33(6), 559–570. https://doi.org/10.1016/s0003-6870(02)00039-x
Weirauch, C. (2020, April 27). Enhanced Pilot Training Via VR. The Journal for Civil Aviation Training, 31(2). Halldale Group.
Yung, R., & Khoo-Lattimore, C. (2017). New Realities: A Systematic Literature Review on Virtual Reality and Augmented Reality in Tourism Research. Current Issues in Tourism, 22(17), 2056–2081. https://doi.org/10.1080/13683500.2017.1417359

Chapter 13 Training, Stress, Time Pressure, and Surprise: An Accident Case Study

Julianne M. Fox and Mustapha Mouloua

CONTENTS

Introduction and Background
Training and System Design
An Accident Case Study: Colgan Air Flight 3407
Why Didn't Previous Training Prevent This Accident?
Opportunities for Better Outcomes
Training: A Countermeasure for Responding to Startle and Surprise
Automation Design: Keeping the Human in the Loop Where Feasible
Highlighting the Lessons Learned
References

INTRODUCTION AND BACKGROUND

In all activities in which we engage there is risk, and flying in an aircraft is no exception. However, perhaps contrary to public perception, flying an aircraft is safer than many everyday activities in which we regularly choose to engage, such as driving a car or traveling by bus, ferry, or rail (Savage, 2013). The aviation system is, by design, incredibly robust, and over the years safety in this sector has continued to improve (Savage, 2013). Catastrophic accidents have become increasingly rare. However, challenges in the system still exist.

Over the last couple of decades, advances in automation technologies have driven improvements to the aviation system by reducing the physical workload required to operate aircraft, facilitating all-weather flying, and improving fuel efficiency, system reliability, and flight safety (Wiener, 1988; Mouloua et al., 1997, 2010). However, such advances have also come at a cost. As flight operations have become more automated, human factors challenges have emerged for the flight crew, such as a deterioration of situation awareness (Endsley, 1999; Kaber & Endsley, 1997), automation-induced complacency (Wiener, 1977; Parasuraman et al., 1993; Mouloua et al., 1993a, 2010), and increased mental workload (Endsley, 1999), and these unintended consequences have resulted in loss-of-aircraft-control accidents following unexpected transitions to manual control (Parasuraman et al., 1992; Mouloua et al., 2010).


These challenges are further impacted by factors such as the effects of interruption and distraction (Ferraro et al., 2017, 2018; Stader, 2014; Dismukes et al., 1998; Latorella, 1996; Dismukes, 2006) and automation reliability (Wickens & Dixon, 2007; Oakley et al., 2003; Ferraro et al., 2017, 2018), as well as their impact upon operator trust in automation (Parasuraman, 1987; Parasuraman & Riley, 1997), which ultimately affects how automation is used.

Despite continuous improvements in the areas of system design and reliability, information availability, enhanced and evolving procedures and training, and the ability to practice in simulated environments, approximately 60–80% of accidents – including aviation accidents – continue to stem from human error (Lautman & Gallimore, 1987; Rasmussen et al., 1994). To offset this known fallible link in the system (i.e., human error), part of a flight crew's training is designed to present crews with unexpected, challenging scenarios that require peak human performance for a successful response. The problem is that human perception and performance can be significantly degraded when expectations are violated (Martin et al., 2016; Dewar & Olson, 2007; Olson & Farber, 2003; Hole & Tyrell, 1995; Foyle et al., 2002), and even more so when stress and time pressure exist or are perceived to exist (Landman et al., 2017a; Casner et al., 2013; Beringer & Harris, 1999; Wickens et al., 1993; Sheridan, 1981; Easterbrook, 1959). The best-known countermeasure is training – where possible, to a level of automaticity (Staal, 2004; Driskell et al., 1986; Holt & Rainey, 2002; Driskell et al., 1992; Dismukes, 2007).

Because it is impossible to train for all possible scenarios, it is crucial to develop targeted simulated training exercises. Thus, it is recommended that flight crews be exposed to a wide array of practical training and simulation exercises that target pilot skills with regard to unpredictable events, unforeseen situations, and emergencies. These scenarios should include framing mismatches following surprising and/or startling events (Landman et al., 2017a; Rankin et al., 2016; Kochan, 2005). It has long been established that the accumulation of knowledge and skill through practice and experience serves to offset pilot performance decrements in response to such surprising and/or startling events (Landman et al., 2017a), and flight simulators have long enabled training organizations to expose flight crews to such scenarios in a realistic but safe environment. In fact, as a result of a growing number of loss-of-control accidents with a startle/surprise element, regulatory agencies now recommend that scenarios incorporating startle and surprise be included in training programs (Federal Aviation Administration, 2015; European Aviation Safety Agency, 2015; International Civil Aviation Organisation, 2013). However, the key challenge continues to be capturing the right scenarios for the training syllabus (both initial and recurrent) and replicating the surprise and/or startle element that would likely exist outside of the simulator if such a situation were to unfold (Driskell et al., 1992; Gainer & Sullivan, 1976).
When an experienced flight crew is presented with a scenario within the same context and with the same attributes to which they were exposed in previous training, the likelihood of an effective response is higher than when the context and/or attributes are encountered for the first time (i.e., a novel presentation of the scenario, as likely occurred during the accident discussed in this case study).


We attribute this to the way in which expertise manifests itself over time. Compared with novices, experts draw on pattern recognition and intuition in formulating a response, which can either help or hinder both accuracy and response time. When accurate, expert responses can be faster and more effective; when inaccurate, they can be strong but wrong (Reason, 1990), as evidenced by an absence of the cognitive flexibility required to shift scripts when a response is not achieving the desired outcome. Many factors, such as experience, expertise, attention, motivation, heuristics, framing, and biases, are likely to influence the entire process of responding to surprise (Mauro et al., 2001; Kochan, 2005). As a result, it is recommended that a portion of flight crew training focus upon reframing skills in response to framing mismatches following surprising and/or startling events (Landman et al., 2018; Landman et al., 2017a, 2017b, 2017c; Rankin et al., 2016; Casner et al., 2013).

TRAINING AND SYSTEM DESIGN

As a part of the design and certification process of an aircraft, the consequences of a loss of function are identified, and based upon the consequence level (Advisory Circular 25.1309-1A, 1988), the system's allowable probability of failure is determined. For functions whose failure will have less of an impact on a given flight, a higher probability of failure is permissible. But for those where a loss of function is likely to result in a catastrophic aircraft accident (i.e., a hull loss), a more robust and/or redundant system design is mandated – one that by design should (statistically) never fail during the life of an aircraft.

For those failures that by design can occur – unexpectedly, and at a point in the flight where a near-immediate and accurate response is required, with the potential to result in surprise and/or startle for the flight crew – repetitive training has historically been provided as a countermeasure (e.g., engine failure just prior to takeoff – V1 engine cuts – and wing stall recovery). However, exposing flight crews to all possible perturbations (i.e., with the same attributes and within the same context) for each of these failures is not realistic.

According to the regulations governing transport category aircraft certification, 14 Code of Federal Regulations (CFR) 25.1309 (2007), each system must be designed to perform its intended function, and in the event of a system failure, the flight crew must receive a warning alerting them to the unsafe system operating condition. According to 14 CFR 25.1302 (2013), the operational behavior of the system must be predictable and unambiguous and must enable the flight crew to intervene in a manner appropriate to the task. When failures occur, the system must enable the flight crew to take corrective action, and as part of the certification process, the potential for the failure to go unnoticed must be taken into account.
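
The inverse relationship between a failure condition's severity and its allowable probability can be made concrete with a small sketch. The per-flight-hour thresholds below are the quantitative guidelines commonly associated with AC 25.1309-1A; treat them as illustrative values rather than a quotation of the certification text.

```python
# Hedged sketch of the severity-versus-allowable-probability relationship
# described above. The per-flight-hour thresholds are the figures commonly
# associated with AC 25.1309-1A guidance and are used here for illustration
# only, not as a statement of the current regulatory text.
ALLOWABLE_PROBABILITY_PER_FH = {
    "minor":        1e-3,   # probable
    "major":        1e-5,   # remote
    "hazardous":    1e-7,   # extremely remote
    "catastrophic": 1e-9,   # extremely improbable
}

def design_is_acceptable(severity: str, failure_rate_per_fh: float) -> bool:
    """True if the predicted failure rate meets the severity's threshold."""
    return failure_rate_per_fh <= ALLOWABLE_PROBABILITY_PER_FH[severity]

# A function whose loss would be catastrophic must be robust/redundant
# enough that its combined failure rate falls below 1e-9 per flight hour.
print(design_is_acceptable("catastrophic", 2.5e-10))  # True
print(design_is_acceptable("catastrophic", 3.0e-8))   # False
```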

AN ACCIDENT CASE STUDY: COLGAN AIR FLIGHT 3407

To further the understanding of these challenges, we present a case study of an aviation accident that occurred in Buffalo, New York, on February 12, 2009.


In this chapter, we focus upon how, on a particular day, a flight crew's perception and performance were impacted by system design, automation use, distraction, loss of situational awareness, weather, and the flight crew's training. Additionally, we explore why repetitive prior training did not serve as an adequate countermeasure on this day. We describe the events that led to this accident and discuss the key human factors and system design challenges that contributed to this outcome. Finally, we propose some guidelines for the design of system interfaces and training, and highlight the lessons learned from the presented case study. This case study and the guidelines presented are applicable not only to the aviation domain but also to any domain where similar challenges exist (e.g., autonomous vehicles, spacecraft, maritime vessels, etc.).

On Thursday, February 12, 2009, at approximately 10:17 pm, a Colgan Air, Inc., Bombardier Dash 8-Q400 (N200WQ), operating as Continental Connection Flight 3407, crashed while on approach into Buffalo Niagara International Airport. The four flight crew members, including the captain and first officer, and 45 passengers were fatally injured, and the aircraft was destroyed. One person on the ground was fatally injured as well. The flight was a Title 14 Code of Federal Regulations (CFR) Part 121 scheduled passenger flight (i.e., a transport category flight flown by a commuter airline on behalf of Continental Airlines) from Newark Liberty International Airport (Newark) to Buffalo Niagara International Airport (Buffalo) and occurred in night instrument meteorological conditions (IMC).

During the approach into Buffalo, the aircraft flew primarily in the clouds (i.e., in IMC) and in and out of icing conditions. Night IMC prevented the flight crew from visually seeing the horizon outside of the aircraft, requiring them to monitor the aircraft's instrumentation for this information. A visible horizon would have enabled them to more easily maintain a sense of their geographical orientation during the approach into a major metropolitan area merely by looking outside the aircraft. Simply put, flying in IMC is a higher-workload task, especially in the area of maintaining situational awareness, compared with flying in visual conditions (Rousseau et al., 2004; Endsley, 2000; Klein, 2000; Shebilske et al., 2000; Endsley, 1999; Durso & Alexander, 2010; Wickens, 2002; Kaber & Endsley, 1997). In accordance with this flight crew's training and procedures, at the time the accident sequence began they were utilizing the autopilot (as opposed to flying manually), which in turn required them to focus upon and monitor what the autopilot was commanding the aircraft to do as well.

As discussed in the introductory chapter, the use of automation has many advantages; however, it also ushers in the opportunity for flight crews to lose track of what the automation is commanding the aircraft to do (i.e., to experience a loss of situational awareness), setting up the potential for what is referred to as automation surprise, which stems from automation misuse (i.e., a flight crew becoming surprised as a result of not monitoring the automation effectively) (Parasuraman & Riley, 1997). Additionally, at the time the accident sequence began to unfold, the first officer was likely focused upon requesting their landing data.
At that time, she was also in the midst of determining the weather at their arrival destination, selecting and inputting the arrival runway, communicating with the captain about the weather and runway, and checking with the cabin crew to see if there were any "specials" – all potentially required activities that likely served to distract from the monitoring of the automation (Simons & Chabris, 1999; Dismukes, 2006, 2007, 2010; Dismukes et al., 1998).


During the approach into Buffalo, given the potential for icing in the prevailing weather conditions, the flight crew turned on the aircraft's deicing system by selecting the Reference Speed (REF SPD) switch to ON. Once this decision was made and the system was turned on, the flight crew also needed to request landing speeds that factored in the potential for ice to accumulate on the surface of the wing during the final approach. Landing speeds for icing conditions are notably faster than those used when icing is unlikely. If ice were to begin to adhere to the aircraft's wing during the approach (although there is no evidence to suggest that this happened on this night), higher airspeeds would theoretically enable the aircraft to maintain greater lift than the lower non-icing landing airspeeds would, offsetting the decrement in lift caused by ice obstructing the smooth laminar airflow across the wing.

To request landing airspeeds (either icing or non-icing), the pilot monitoring and not flying (in this case, the first officer) was required to make the request through an onboard system referred to as the Aircraft Communications Addressing and Reporting System (ACARS). To do so, the first officer had to input the request into the Flight Management Computer (FMC) by manually typing it on a keypad. To request icing airspeeds, the first officer had to first remember to enter this information into the FMC, then select the correct page within the FMC, type "ICING" in the correct location using the correct spelling, and submit the request. Unfortunately, either a misspelling such as "ICEING" (versus ICING) or forgetting to go to the correct page and enter the information altogether would result in the default non-icing airspeeds being provided to the flight crew. Although we do not know whether the first officer forgot to make the request (a slip) or misspelled the input (an error of commission), we do know that on this night, 24 minutes before the aircraft crashed, non-icing airspeeds were received by the flight crew that were incongruent with the REF SPD switch position, and neither crew member detected that this discrepancy existed. Slips and errors of commission are both types of human error (Norman, 1981; Reason, 1990). Regardless of which type occurred, the system interface design did not enable the error to be easily or readily noticed. This mismatch set the accident sequence in motion.

As discussed, the REF SPD switch set to the ON position activates the aircraft's deicing system; however, it also increases the airspeed at which stall warning information is conveyed to the flight crew. When this aircraft approaches an airspeed at which the stall warning system determines the wing will stall – leaving the aircraft unable to sustain adequate lift to fly – a system called the stick shaker automatically activates. The stall warning is very salient in that it provides attention-getting tactile, visual, and auditory feedback to the flight crew: the yoke of the aircraft vibrates and a loud vibrating sound is emitted (the stick shaker), and visual feedback is provided via the flight displays. This feedback is designed to capture human attention and focus it upon these warnings.
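
The speed-request interface flaw described above can be illustrated with a hypothetical sketch (this is not the actual ACARS/FMC logic): an exact-match lookup that silently falls back to the non-icing default, contrasted with a version that cross-checks the request against the REF SPD switch position and surfaces the mismatch.

```python
# Hypothetical sketch (not the actual ACARS/FMC logic) of the failure mode
# described above. The 138/118-knot VREF values come from the accident
# narrative; everything else is an expository assumption.
LANDING_SPEEDS_KT = {"ICING": 138, "NON-ICING": 118}

def brittle_lookup(entry: str) -> int:
    # Mirrors the design flaw: "ICEING" or a forgotten entry silently
    # yields the non-icing default with no warning to the crew.
    return LANDING_SPEEDS_KT.get(entry.strip().upper(),
                                 LANDING_SPEEDS_KT["NON-ICING"])

def checked_lookup(entry: str, ref_spd_switch_on: bool) -> int:
    # A more error-tolerant design: reject unrecognized input and flag
    # requests that are incongruent with the deicing switch position.
    condition = entry.strip().upper()
    if condition not in LANDING_SPEEDS_KT:
        raise ValueError(f"Unrecognized speed request: {entry!r}")
    if ref_spd_switch_on and condition != "ICING":
        raise ValueError("REF SPD is ON but non-icing speeds were requested")
    return LANDING_SPEEDS_KT[condition]

print(brittle_lookup("ICEING"))        # 118 -- the silent mismatch
# checked_lookup("ICEING", True)       # would raise instead of defaulting
```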


On this night, had icing speeds been requested, the VREF airspeed supplied to the flight crew would have been 138 knots; since this did not occur, a non-icing target airspeed of 118 knots was delivered to the first officer, 20 knots slower than it should have been given the aircraft's configuration. As a result, the airspeed at which the stall warning system would activate was 20 knots faster than the flight crew could reasonably have anticipated, and this framing mismatch, as evidenced by the flight crew's reaction, both surprised and startled them.

At the time the VREF airspeed target was received, the first officer set the speed bug, which depicts a fly-to target on the airspeed indicator (i.e., the airspeed tape). Because this 118-knot target on the airspeed tape was out of view at the time it was being set (given the depicted airspeed resolution on the tape) and only the numerical presentation was visible, neither flight crew member had the opportunity to readily detect that a configuration error was occurring. Had the resolution of the airspeed tape made the 118-knot position visible at the time the bug was being set (or had feedback been provided in another form), the flight crew would have had the opportunity to see that the target airspeed they were attempting to set was too slow (i.e., within the low-speed cue) for their configuration. Such feedback would likely have abruptly stopped this accident sequence. Along the edge of the airspeed tape, a low-speed cue, which appears as a red and black barber pole, clearly identifies the airspeed region in which the stall warning system will activate, and it is reasonable to expect that a flight crew would not intentionally set an airspeed bug within the low-speed cue.

As the flight crew configured the aircraft for landing, the autothrottles were commanding the aircraft to decelerate and the autopilot was commanding an increase in the pitch attitude of the aircraft, as required during any final approach phase. In the final sequence of events, the first officer lowered the landing gear as required and shortly thereafter extended the flaps to 10⁰, another required step in the process. As the aircraft decelerated below 131 knots (on its way to the 118-knot target airspeed that had already been selected), the stall warning system activated and the autopilot disconnected, instantaneously transitioning control of the aircraft from the autopilot and autothrottles to the captain.

This abrupt, unexpected transition of control from the autopilot to the captain required near-immediate manual intervention to enact the appropriate recovery procedure for an approaching wing stall. It likely came as quite a surprise to the flight crew because, given the target airspeed they had received, it occurred at a significantly faster airspeed than they would reasonably have expected to trigger a stall warning. Their expectation was likely significantly violated by this automation surprise. Additionally, the stick shaker, given its attention-getting properties and its occurrence at an unexpected time, likely served to startle the flight crew as well. A perception of significant time pressure also likely existed, as a near-immediate response is usually required for such a recovery maneuver. Given the communication between the crew retrieved from the Cockpit Voice Recorder (CVR) and their performance recorded by the Digital Flight Data Recorder (DFDR), stress and perceived time pressure existed.
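
A minimal, hypothetical sketch of the cross-check the text argues was missing follows: before accepting a speed-bug setting, compare it with the top of the low-speed cue (the airspeed at which the stall warning will activate for the current configuration). The 131-knot and 118-knot values come from the accident narrative; the function itself is an expository device, not an avionics specification.

```python
# Hypothetical validation sketch: reject a speed-bug setting that falls
# within the low-speed cue for the current configuration. Values from the
# accident narrative: stall warning onset at 131 kt with REF SPD ON; bug
# set to the mismatched 118-kt non-icing VREF.
def validate_speed_bug(bug_kt: float, stall_warning_onset_kt: float) -> str:
    if bug_kt <= stall_warning_onset_kt:
        return (f"REJECT: {bug_kt:.0f} kt lies within the low-speed cue "
                f"(stall warning at {stall_warning_onset_kt:.0f} kt)")
    return f"OK: {bug_kt:.0f} kt"

print(validate_speed_bug(118, 131))  # the accident setting would be flagged
print(validate_speed_bug(138, 131))  # the correct icing VREF passes
```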


crew’s response was opposite that of what was required to recover from an approaching wing stall. The captain should have decreased the pitch attitude of the aircraft (i.e., push the nose of the aircraft down), enabling an increase in lift, but instead, he manually increased the pitch attitude up to over 30⁰ which in turn then induced a wing stall. In approximately 3.7 seconds after the stick shaker activated, the captain increased power (albeit to a setting less than appropriate) which was another error committed in this chain of events. As the captain increased the aircraft’s pitch to over 30⁰, the aircraft rolled into a left 50⁰ bank (i.e., significantly greater than the 30⁰ maximum bank angle which is typically flown during normal operations) until the stick pusher (another automated system which automatically pushes the nose of the aircraft down in response to an approaching wing stall to reduce the angle of attack to enable the wing to produce lift) activated for the first time and the master caution warning (another attention-getting flight deck effect) also activated. At this point, the aircraft’s pitch attitude was decreased to an approximate 5⁰ pitch down attitude and the aircraft rolled into a 100⁰ right bank (resulting in the aircraft becoming inverted). At an airspeed of less than 110 knots (which would be considered quite slow for this aircraft in this configuration), the first officer inappropriately began to retract the flaps and then the landing gear, which were the final erroneous actions committed by this flight crew prior to the complete loss of control of this aircraft. Flaps enable the aircraft to fly at a slower airspeed, so retracting them would have resulted in the aircraft stalling sooner (i.e., at a faster speed). Within 26 seconds of the stick shaker first activating, this aircraft had crashed.

WHY DIDN'T PREVIOUS TRAINING PREVENT THIS ACCIDENT?

Given that all pilots receive extensive training in recovery from an approaching wing stall from the earliest stages of flight training, and that they continue to receive such training throughout their entire career (including as transport category airline pilots), we are interested in exploring, from a human factors standpoint, the underpinnings of this tragic accident. First, during our investigation, we noticed that the attributes of the approach to stall which existed on this night did not align with the typical scenarios used in most, if not all, simulated stall training, so it is very likely that this flight crew had not previously experienced an approach to stall presented in this insidious manner. That is, the stall warning arrived while the autopilot was automatically trimming a slowly increasing pitch-up attitude; while the engaged autothrottle was automatically decelerating the aircraft by retarding the throttles toward a purposefully selected (albeit mismatched) airspeed that would initiate the stall warning earlier than anticipated given the misconfiguration; at an altitude in close proximity to the ground; at night without a visual horizon; and in conditions susceptible to icing. Added to this likely novel set of attributes, a conversation was taking place between the flight crew, which likely served to further distract them from what the autopilot was commanding the aircraft to do (resulting in poorer situational awareness and ultimately in an automation surprise). Thus, when the autopilot disconnected, the captain was ill prepared to take over and


manually control the aircraft. The startle/surprise likely experienced by the flight crew, while in close proximity to the ground, could in and of itself explain the captain erroneously increasing back pressure in response to this situation. However, the first officer's decision to retract the flaps during the attempted recovery sequence begs the further question of why this may have occurred. Consequently, an additional factor needs to be taken into consideration. This flight crew had received tailplane stall recovery training prior to the accident. However, that training was administered in the form of a video produced by NASA; no hands-on training in the airplane or simulator was made available to this flight crew. Given that this flight crew had been required to watch this video and that their response to the stall was more aligned with a tailplane stall recovery, we are left to ponder whether they were confused as to which type of stall warning they were encountering.

The NASA tailplane icing video (produced by Glenn Research Center) indicates that tailplane stalls are more likely to occur in icing conditions and during the approach phase as the aircraft decelerates. Crews are warned that some of the cues foreshadowing the conditions for this type of stall can be missed when the autopilot is in use. Flight crews are also warned that the differences between tailplane and wing stall are subtle; however, the recovery procedures are exact opposites. For an approaching wing stall, flight crews are instructed to add power and relax back pressure or push forward on the yoke or joystick depending upon the trim setting (i.e., decrease pitch attitude); for a tailplane stall, pilots are instructed to do the very opposite – pull back on the yoke/joystick (i.e., increase pitch attitude), reduce flaps (to the last position), and reduce power (on some aircraft), as power aggravates the tail stall condition. Additionally, during a tailplane stall, the nose of the aircraft will pitch down, subsequently requiring significant back pressure to counteract it. In this accident, the activation of the stick pusher (the automated system installed on the accident aircraft which automatically pitches the nose of the aircraft down) may have served to confirm a bias that the aircraft was experiencing a tailplane stall (confirmation bias). It is important to note that flight crews received very little exposure to stick pusher activation during their training. Also, given the lack of effective crew resource management during the recovery sequence on this accident flight (e.g., verbal communication between the two pilots), we are uncertain whether a negative transfer of training (related to wing versus tailplane stall recovery procedures) had any bearing on this flight crew's procedural steps, but we suggest that it should be considered as a possibility.
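
Because the two procedures are exact opposites, a simple side-by-side summary makes the negative-transfer hazard explicit. The actions below are taken from the NASA guidance as recounted in this chapter; the data structure is, of course, only an expository device.

```python
# Side-by-side summary of the opposing recovery actions described above,
# per the NASA tailplane icing guidance as recounted in this chapter.
RECOVERY_ACTIONS = {
    "wing_stall": [
        "decrease pitch attitude (relax back pressure / push forward)",
        "add power",
    ],
    "tailplane_stall": [
        "increase pitch attitude (pull back on the yoke/joystick)",
        "reduce flaps to the last position",
        "reduce power (on some aircraft)",
    ],
}

for stall_type, actions in RECOVERY_ACTIONS.items():
    print(f"{stall_type}: " + "; ".join(actions))
```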

OPPORTUNITIES FOR BETTER OUTCOMES

Training: A Countermeasure for Responding to Startle and Surprise

We have briefly discussed the historic training approach for emergency and non-normal procedures where time pressure and stress are likely to exist at the time an accurate crew response is required. Historically, certain scenarios (such as recovery from an approaching wing stall) have been repeatedly practiced by flight crews such


that they are able to develop a level of automaticity in their response. This approach has served as the best countermeasure to prevent an ineffective, incorrect, or delayed response which could otherwise occur due to the normal, natural effects of stress and time pressure on human performance (e.g., hypervigilance, panic, etc.). However, this approach is effective only so long as all of the attributes of the practiced scenario exist if and when the scenario presents itself during flight. Given the likelihood that a flight crew will not face a "textbook" scenario (i.e., at least some of the attributes are likely to differ), we have learned that the rigidity of this training approach needs to be addressed (Burratto & Graef, 2022; Landman et al., 2018). Flight crews are best served by experiencing variety in the setup of the scenarios presented during their training. Through experiencing a range of scenarios (e.g., in the attributes leading up to the stall warning), flight crews can train their ability to accommodate the differing attributes (i.e., make sense of them), reframe the situation, and still apply an appropriate and timely response (Burratto & Graef, 2022; Landman et al., 2018).

Over the last 20 years, the approach to flight crew training has been transitioning from a historically task-based approach, which focused upon providing repetitive training for predicted situations, to a competency-based approach, which places greater emphasis on training the skills that support enhanced performance when flight crews face startling, unanticipated situations (e.g., resilience resulting from demonstrated competence and confidence across a wide variety of scenarios). According to Burratto and Graef (2022), the Competency-Based Training and Assessment (CBTA) approach seeks to prepare flight crews for an infinite number of situations by developing a finite number of competencies that enable crews to successfully manage unexpected situations. By being exposed to a variety of situations, as opposed to the highly anticipated training scenarios that were historically presented, flight crews have demonstrated an improved ability to adapt to novel situations. Not only will flight crews benefit from this type of training; research has shown that there is also opportunity for improvement in human performance when interfacing with automated systems (Stader et al., 2013).
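
A minimal sketch of the scenario-variation idea follows: rather than rehearsing one "textbook" approach-to-stall setup, the attributes leading up to the stall warning are drawn from a pool, so crews must make sense of the situation rather than pattern-match. The attribute values are illustrative, not drawn from any published syllabus.

```python
import random

# Hypothetical sketch of CBTA-style scenario variation: draw the attributes
# leading up to a stall-warning event from a pool so each training run
# presents a somewhat novel framing. Attribute values are illustrative.
ATTRIBUTES = {
    "autopilot_engaged": [True, False],
    "autothrottle_engaged": [True, False],
    "altitude_ft": [2_000, 5_000, 10_000],
    "visual_horizon": [True, False],
    "icing_conditions": [True, False],
    "concurrent_task": ["none", "ATC call", "cabin interphone", "checklist"],
}

def draw_scenario(rng: random.Random) -> dict:
    """One randomized setup for an approach-to-stall training event."""
    return {name: rng.choice(values) for name, values in ATTRIBUTES.items()}

rng = random.Random(42)  # seeded so a syllabus can be reproduced and audited
for _ in range(3):
    print(draw_scenario(rng))
```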

Automation Design: Keeping the Human in the Loop Where Feasible

As previously described, enabling the transition of system control from the human to the machine has no doubt brought an added layer of safety and efficiency to flight operations, as it allows the offloading of a demanding task during high-workload periods and enables greater precision of flight path control. In essence, it provides an additional resource to the flight crew at a time when they need and/or want it. Today's modern aircraft allow a pilot to select the level of automation desired. Unfortunately, however, there are also downsides to this added resource. As standard operating practice has allowed (or even dictated) flight crews to use automation during a significant portion of their operations, the flight crew's situational awareness and manual proficiency have been negatively impacted,


and this trend in reliance is likely only to be exacerbated by today's worldwide shortage of expert pilots.

Research into alternative designs (Stader et al., 2013), in which either the machine or the human can determine when best to transition control, has shown great promise for leveraging the benefits of automation while limiting the associated decrements. Unlike the fixed "static" automation previously reviewed, the design of adaptive automation (AA) is more human-centered and dynamic, allowing users and machines to mutually exchange tasks and roles in the allocation of function between a user (e.g., an operator such as a pilot or driver) and a machine (computer, autopilot, intelligent tutoring system, etc.). This dynamic exchange, also referred to as adaptive function or task allocation, can be initiated by either a computer or a human operator, as prescribed by adaptive automation philosophy (Scerbo, 1996, 2007). The process of invoking adaptive automation relies on various task parameters, such as performance criteria and theoretically based models of adaptation (Mouloua et al., 1993b; Parasuraman et al., 1996; Mouloua et al., 2002; Scallen et al., 1995; Harris et al., 1995), as well as subjective, physiological, and secondary-task measures of workload states (Morrison & Gluckman, 1994; Hilburn et al., 1997; Byrne & Parasuraman, 1996; Kaber & Riley, 1999). The mechanism of invoking AA is mutually assumed and can be initiated by either the machine (adaptive) or the human (adaptable).

An example of such a system currently in use is the combination of the Automatic Ground Collision Avoidance System (Auto-GCAS) and the Pilot Activated Recovery System (PARS) installed on the F-16. This system is designed to transition control away from the pilot when Controlled Flight Into Terrain (CFIT) is imminent, as is seen in military operations as a result of spatial disorientation and/or G-induced Loss of Consciousness (G-LOC). When pilots realize they need assistance and are capable of soliciting it, the adaptable automation system PARS is available to them. However, when the pilot is unable to solicit the automated assistance due to, for example, a loss of consciousness, the Auto-GCAS system automatically takes control of the aircraft (adaptive automation).

By design, rules for invoking AA pre-exist the scenario and can be triggered through algorithms using various performance metrics (such as detection accuracy, reaction time, flight path deviations, etc.), physiological responses (heart rate, HRV, EEG/ERP waveforms, fNIRS, etc.), workload-based measures (Byrne & Parasuraman, 1996; Kaber et al., 2001; Bailey et al., 2006), and/or modeling and performance-based adaptation methods (Parasuraman et al., 1996; Mouloua et al., 1993b; Kaber et al., 2005). These studies demonstrated that when users revert to manual control, even for a short period of time, and then switch back to automation control, their performance is markedly improved over the course of subsequent automation cycles. Additionally, operator workload could be regulated, and situation awareness maintained or improved, under automation control due to the benefits of AA (Kaber et al., 2001, 2005, 2007; Kaber & Endsley, 2004). This is mainly due to enhanced operator engagement in the task, as well as their active control in the supervisory monitoring task (Parasuraman et al., 1996; Mouloua et al., 2019).
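
A minimal sketch of a performance-triggered allocation rule of the kind described above follows. The thresholds and the single error metric are placeholders; real systems fuse performance, physiological, and workload measures, as the cited studies discuss.

```python
# Hedged sketch of an adaptive/adaptable allocation rule. The thresholds
# and the single RMS-error metric are placeholder assumptions; fielded
# systems fuse multiple performance, physiological, and workload measures.
TAKE_OVER_ERROR = 2.0   # hypothetical flight-path RMS error invoking AA
HAND_BACK_ERROR = 0.5   # hypothetical error below which control returns

def allocate(current_mode: str, rms_error: float,
             pilot_requests_help: bool) -> str:
    """Return 'manual' or 'automated' for the next control cycle."""
    if pilot_requests_help:                      # adaptable: human-invoked (cf. PARS)
        return "automated"
    if current_mode == "manual" and rms_error > TAKE_OVER_ERROR:
        return "automated"                       # adaptive: machine-invoked
    if current_mode == "automated" and rms_error < HAND_BACK_ERROR:
        return "manual"                          # periodic manual control keeps skills fresh
    return current_mode

print(allocate("manual", rms_error=2.4, pilot_requests_help=False))   # automated
print(allocate("automated", rms_error=0.3, pilot_requests_help=False))  # manual
```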


Moray et al. (2000) also examined the benefits of AA in a fault management task, varying the level of automation reliability across both manual and automated control, using performance measures such as root mean square error, accident avoidance, false shutdowns, subjective trust in the system, and operator self-confidence. Their findings indicated that trust in automation, but not self-confidence, was strongly affected by automation reliability.

HIGHLIGHTING THE LESSONS LEARNED

In this chapter, we focused, from a training standpoint, upon why the training these pilots received (both in and out of simulators) did not serve as a meaningful countermeasure to the normal and natural hypervigilant or panic reaction that can be elicited by an occurrence requiring an accurate, near-immediate response in a situation where a failed response is likely to cause a fatal outcome. We also highlighted the failures that occurred and discussed the related human factors literature. This chapter has been written to serve as a tool to better explain these concepts using a real-world example and, ideally, to provide information that helps prevent accidents with similar causal and contributing factors (both within and outside the aviation domain) in the future. To that end, we offer the following guidance for the design of system interfaces and training, as highlighted by this case study.

• Since we cannot train responses to all possible failures, we need to train operators to retain some cognitive flexibility so that they are able to reframe a scenario when required. To do so, operators should be exposed to a wide variety of unpredictable yet practical training to exercise reframing skills in response to framing mismatches following surprising and/or startling events.
• Systems fail; however, they need to remain tolerant of human error. It is imperative that human error can be detected and remedied and that operators remain capable of doing so. Through sound design and interface testing, it is imperative to eliminate the potential for undetectable framing mismatches to develop where feasible.
• Enable an automation design philosophy that is more adaptive than static, resulting in greater operator awareness and improved responses.
• Distraction comes in many forms. The potential for the human performance decrements that accompany distraction and inattention needs to be emphasized during training (e.g., inattentional blindness, change blindness, prospective memory error).
• Automation can be incredibly effective at reducing operator workload, but the downside is that, through increased reliance upon automation, the operator will likely become less capable of successfully handling a transition back to manual control, especially when it occurs unexpectedly and under time pressure. Operational procedures need to take situational awareness, the potential for over-reliance and complacency, and the need for skill retention into account.


• Operators require a thorough understanding of the automation and must be adept in its use, including how to effectively handle failures and unexpected disconnections.
• Negative transfer of training must be a consideration during training development. Simulator fidelity and the accuracy of the system behavior to which operators are exposed during training are imperative considerations.
• Specific to this accident and the aviation domain: both stick pusher and tailplane stall training need to be incorporated into flight crew training, and simulators need to be able to accommodate this training in a realistic way.

REFERENCES

Advisory Circular 25.1309-1A. (1988). System design and analysis. June 21.
Bailey, N. R., Scerbo, M. W., Freeman, F. G., Mikulka, P. J., & Scott, L. A. (2006). Comparison of a brain-based adaptive system and a manual adaptable system for invoking automation. Human Factors, 48(4), 693–709.
Beringer, D. B., & Harris, H. C. Jr. (1999). Automation in general aviation: Two studies of pilot responses to autopilot malfunctions. International Journal of Aviation Psychology, 9, 155–174.
Burratto, F., & Graef, R. (2022, January). Training pilots for resilience. Safety First: The Airbus Safety Magazine, 33, 18–27.
Byrne, E. A., & Parasuraman, R. (1996). Psychophysiology and adaptive automation. Biological Psychology, 42(3), 249–268.
Casner, S. M., Geven, R. W., & Williams, K. T. (2013). The effectiveness of airline pilot training for abnormal events. Human Factors, 55, 477–485.
Dewar, R., & Olson, P. (2007). Human Factors in Traffic Safety (2nd ed., pp. 11–32). Tucson, AZ: Lawyers and Judges Publishing Company.
Dismukes, K. (2006). Concurrent task management and prospective memory: Pilot error as a model for the vulnerability of experts. Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting, 909–913.
Dismukes, K. (2007). Prospective memory in aviation and everyday settings. In M. Kliegel, M. A. McDaniel, & G. O. Einstein (Eds.), Prospective Memory: Cognitive, Neuroscience, Developmental, and Applied Perspectives (pp. 411–431). Mahwah, NJ: Erlbaum.
Dismukes, R. K. (2010). Remembrance of things future: Prospective memory in the laboratory, workplace and everyday settings. In D. Harris (Ed.), Reviews of Human Factors and Ergonomics (Vol. 6, pp. 79–122). Santa Monica, CA: Human Factors and Ergonomics Society.
Dismukes, K., Young, G., & Sumwalt, R. (1998). Cockpit interruptions and distractions: Effective management requires a careful balancing act. ASRS Directline, 10, 1–26.
Driskell, J. E., Carson, R., & Moskal, P. J. (1986). Stress and Human Performance. Final report. Orlando, FL: Naval Training Systems Center.
Driskell, J. E., Willis, R. P., & Copper, C. (1992). Effect of overlearning on retention. Journal of Applied Psychology, 77, 615–622.
Durso, F. T., & Alexander, A. L. (2010). Managing workload, performance, and situation awareness in aviation systems. In E. Salas & D. Maurino (Eds.), Human Factors in Aviation (pp. 217–247). Burlington, MA: Elsevier.
Easterbrook, J. A. (1959). The effect of emotion on cue utilization and the organization of behavior. Psychological Review, 66, 183–201.


Endsley, M. (1999). Situation awareness in aviation systems. In D. Garland, J. Wise, & V. D. Hopkin (Eds.), Handbook of Aviation Human Factors (pp. 257–276). Mahwah, NJ: Erlbaum.
Endsley, M. R. (2000). Situation models: An avenue to the modeling of mental models. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 44(1), 61–64.
European Aviation Safety Agency. (2015). Loss of control prevention and recovery training: Notice of Proposed Amendment 2015-13. Cologne, Germany.
FAR 25.1309, Equipment, systems, and installations. November 8, 2007.
FAR 25.1302, Installed systems and equipment for use by the flightcrew. May 3, 2013.
Federal Aviation Administration. (2015). Advisory Circular 120-111. Washington, DC.
Ferraro, J., Christy, N., & Mouloua, M. (2017). Impact of auditory interference on automated task monitoring and workload. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 61(1), 1136–1140.
Ferraro, J., Clark, L., Christy, N., & Mouloua, M. (2018). Effects of automation reliability and trust on system monitoring performance in simulated flight tasks. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 62(1), 1232–1236.
Foyle, D., Hooey, B., Wilson, J., & Johnson, W. (2002). HUD symbology for surface operations: Command guidance vs. situation guidance formats. SAE Transactions: Journal of Aerospace, 111, 647–658.
Gainer, C. A., & Sullivan, D. J. (1976). Aircrew Training Requirements for Nap-of-the-Earth Flight. Final Report 203-1. Santa Barbara, CA: Anacapa Sciences, Inc.; Army Research Institute for the Behavioral and Social Sciences.
Harris, W. C., Hancock, P. A., Arthur, E. J., & Caird, J. K. (1995). Performance, workload, and fatigue changes associated with automation. The International Journal of Aviation Psychology, 5(2), 169–185.
Hilburn, B., Jorna, P. G., Byrne, E. A., & Parasuraman, R. (1997). The effect of adaptive air traffic control (ATC) decision aiding on controller mental workload. In M. Mouloua & J. Koonce (Eds.), Human-Automation Interaction: Research and Practice (pp. 84–91). Mahwah, NJ: Erlbaum.
Hole, G. J., & Tyrell, L. (1995). The influence of perceptual 'set' on the detection of motorcyclists using daytime headlights. Ergonomics, 38(7), 1326–1341.
Holt, B. J., & Rainey, S. J. (2002). An overview of automaticity and implications for training and thinking process. Research Report 1790. Alexandria, VA: U.S. Army Research Institute.
International Civil Aviation Organisation. (2013). Manual of Evidence-Based Training (No. 9995). Montreal, Canada: Author.
Kaber, D. B., & Endsley, M. R. (1997). Out-of-the-loop performance problems and the use of intermediate levels of automation for improved control system functioning and safety. Process Safety Progress, 16(3), 126–131.
Kaber, D. B., & Riley, J. M. (1999). Adaptive automation of a dynamic control task based on secondary task workload measurement. International Journal of Cognitive Ergonomics, 3(3), 169–187.
Kaber, D. B., Riley, J. M., Tan, K. W., & Endsley, M. R. (2001). On the design of adaptive automation for complex systems. International Journal of Cognitive Ergonomics, 5(1), 37–57.
Kaber, D. B., & Endsley, M. R. (2004). The effects of level of automation and adaptive automation on human performance, situation awareness and workload in a dynamic control task. Theoretical Issues in Ergonomics Science, 5(2), 113–153.
Kaber, D. B., Wright, M. C., Prinzel III, L. J., & Clamann, M. P. (2005). Adaptive automation of human-machine system information-processing functions. Human Factors, 47(4), 730–741.


Kaber, D. B., Perry, C. M., Segall, N., & Sheik-Nainar, M. A. (2007). Workload state classification with automation during simulated air traffic control. The International Journal of Aviation Psychology, 17(4), 371–390.
Klein, G. (2000). Analysis of situation awareness from critical incident reports. In M. Endsley & D. J. Garland (Eds.), Situation Awareness Analysis and Measurement (pp. 51–71). Mahwah, NJ: Erlbaum.
Kochan, J. A. (2005). The role of domain expertise and judgment in dealing with unexpected events (PhD thesis). University of Central Florida, Orlando. [Use of frames to explain performance during surprise events.]
Landman, A., van Oorschot, P., van Paassen, M. M. R., Groen, E., Bronkhorst, A., & Mulder, M. (2018). Training pilots for unexpected events: A simulator study on the advantage of unpredictable and variable scenarios. Human Factors, 60(6), 793–805.
Landman, A., Groen, E., van Paassen, M., Bronkhorst, A., & Mulder, M. (2017a). Dealing with unexpected events on the flight deck: A conceptual model of startle and surprise. Human Factors, 59(8), 1161–1172.
Landman, A., Groen, E., van Paassen, M., Bronkhorst, A., & Mulder, M. (2017b). The influence of surprise on upset recovery performance in airline pilots. The International Journal of Aerospace Psychology, 27(1–2), 2–14.
Landman, A., Groen, E. L., van Paassen, M., Bronkhorst, A. W., & Mulder, M. (2017c). The effect of surprise on upset recovery performance. 19th International Symposium on Aviation Psychology, 37–42.
Latorella, K. A. (1996). Investigating interruptions: An example from the flight deck. Proceedings of the Human Factors and Ergonomics Society 40th Annual Meeting, 249–253.
Lautman, L. G., & Gallimore, P. L. (1987). Control of the crew caused accident: Results of a 12-operator study. Airliner, 56(10), 1–6.
Martin, W. L., Murray, P. S., Bates, P. R., & Lee, P. S. Y. (2016). A flight simulator study of the impairment effects of startle on pilots during unexpected critical events. Aviation Psychology and Applied Human Factors, 6(1), 24–32.
Mauro, R., Barshi, I., Pederson, S., & Bruininks, P. (2001). Affect, experience, and aeronautical decision-making. Proceedings of the 11th International Symposium on Aviation Psychology. Columbus, OH: The Ohio State University.
Moray, N., Inagaki, T., & Itoh, M. (2000). Adaptive automation, trust, and self-confidence in fault management of time-critical tasks. Journal of Experimental Psychology: Applied, 6(1), 44.
Morrison, J. G., & Gluckman, J. P. (1994). Definitions and prospective guidelines for the application of adaptive automation. Human Performance in Automated Systems: Current Research and Trends, 256–263.
Mouloua, M., Ferraro, J., Parasuraman, R., Molloy, R., & Hilburn, B. (2019). Human monitoring of automated systems. In M. Mouloua & P. A. Hancock (Eds.), Human Performance in Automated and Autonomous Systems: Current Theory and Methods (pp. 1–26). Boca Raton, FL: CRC Press (Taylor & Francis Group).
Mouloua, M., Gilson, R., & Koonce, J. (1997). Automation, flight management and pilot training: Issues and considerations. In R. A. Telfer & P. J. Moore (Eds.), Aviation Training: Learners, Instruction and Organization (pp. 78–86). Aldershot: Avebury Aviation.
Mouloua, M., Hancock, P., Jones, L., & Vincenzi, D. (2010). Automation in aviation systems: Issues and considerations. In J. Wise, D. Garland, & D. V. Hopkin (Eds.), Handbook of Aviation Human Factors (pp. 8-1–8-11). Boca Raton, FL: CRC Press (Taylor & Francis Group).

Training, Stress, Time Pressure, and Surprise

345

Mouloua, M., Parasuraman, R., & Molloy, R. (1993a). Monitoring automation failures: Effects of single and multi-adaptive function allocation. Proceedings of the 37th Annual Meeting of the Human Factors Society. Santa Monica, CA: Human Factors and Ergonomics Society, 1–5. Mouloua, M., Parasuraman, R., & Molloy, R. (1993b). Monitoring automation failures: Effects of task type on performance and subjective workload. Proceedings of the First Mid-Atlantic Human Factors Conference, 155–161. Mouloua, M., Smither, J.A., Vincenzi, D.A., & Smith, L. (2002). Automation and aging: Issues and considerations. In E. Salas (Ed.), Advances in Human Performance and Cognitive Engineering Research: Automation (pp. 213–237). Oxford: Elsevier. Norman, D. A. (1981). Categorization of action slips. Psychological Review, 88(1), 1–15. Oakley, B., Mouloua, M., & Hancock P. (2003). Effects of automation reliability on human monitoring performance. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 47(1), 188–190. Olson, P., & Farber, E. (2003). Forensic Aspects of Driver Perception and Response (2nd Ed.). Tucson, AZP: Lawyers and Judges Publishing Company. Parasuraman, R. (1987). Human-computer monitoring. Human Factors, 29(6), 695–706. Parasuraman, R., Bahri, T., Deaton, J. E., Morrison, J. G., & Barnes, M. (1992). Theory and Design of Adaptive Automation in Aviation Systems. Washington, DC: Catholic University of America cognitive science lab. Parasuraman, R., Mouloua, M., & Molloy, R. (1996). Effects of adaptive task allocation on monitoring of automated systems. Human Factors, 38(4), 665–679. Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253. Parasuraman, R., Molloy, R., & Singh, L. (1993). Performance consequences of automationinduced “complacency”. The International Journal of Aviation Psychology, 3(1), 1–23. Rankin, A., Woltjer, R., & Field, J. (2016). Sensemaking following surprise in the cockpit: A re-framing problem. Cognition, Technology & Work, 18, 623–642. Use of Frames to explain performance during surprise events. Rasmussen, J., Pejtersen, A. M., & Goodstein, L. P. (1994). Cognitive Systems Engineering. New York: John Wiley & Sons, Inc. 135, 144–146. Reason, J. T. (1990). Human Error (pp. 1–18, 53–96). Cambridge: Cambridge University Press. Rousseau, R., Tremblay, S., & Breton, R. (2004). Defining and modeling situation awareness: A critical review. In S. Banbury & S. Tremblay (Eds.), A Cognitive Approach to Situation Awareness: Theory and Application (pp. 3–21). Hampshire: Ashgate. Savage, I. (2013). Reflections on the economics of transportation safety. Research in Transportation Economics, 43(1), 1–8. Scallen, S. F., Hancock, P. A., & Duley, J. A. (1995). Pilot performance and preference for short cycles of automation in adaptive function allocation. Applied Ergonomics, 26(6), 397–403. Scerbo, M. W. (1996). Theoretical perspectives on adaptive automation. In R. Parasuraman & M. Mouloua (Eds.), Automation and Human Performance: Theory and Applications (pp. 37–63). Hillsdale, NJ: Lawrence Erlbaum. Scerbo, M. (2007). Adaptive automation. In R. Parasuraman & M. Rizzo (Eds.), Neuroergonomics: The Brain at Work (pp. 238–252). New York: Oxford University Press. Shebilske, W. L., Goettl, B. P., & Garland, D. J. (2000). Situation awareness, automaticity, and training In M. Endsley & D. Garland (Eds.), Situation Awareness Analysis and Measurement (pp. 303–323). Boca Raton, FL: CRC Press.

346

Human Factors in Simulation and Training

Sheridan, T. B. (1981). Understanding human error and aiding human diagnostic behavior in nuclear power plants. In J. Rasmussen & W. B. Rouse (Eds.), Human Detection and Diagnosis of System Failures (pp. 19–35). New York, NY: Plenum. Simons, D. J. & Chabris, C. F. (1999). Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception, 28, 1059–1074. Staal, M. A. (2004). Stress, cognition, and human performance: A literature review and conceptual framework. NASA/TM-2004-212824, Moffitt Field, CA: Ames Research Center. National Aeronautics and Space Administration. Stader, S., (2014). Impacts of complexity and timing of communication interruptions on visual detection tasks. Unpublished Doctoral Dissertation, UCF Stars Library: Electronic Theses and Dissertations. 4571. Retrieved from https://stars​.library​.ucf​.edu​/etd​/4571. Stader, S., Leavens, J., Gonzalez, B., Fontaine, V., Mouloua, M., & Alberti, P. (2013). Effects of display and task features on system monitoring performance in the original multi-attribute task battery and MATB-II. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 57(1), 1435–1439. Wickens, C. D. (2002). Situation awareness and workload in aviation. Current Directions in Psychological Science, 11(4), 128–133. Wickens, C. D., Stokes, A., Barnett, B., & Hyman, F. (1993) The Effects of Stress on Pilot Judgment in a MIDIS Simulator. In O. Svenson & A. J. Maule (Eds.), Time Pressure and Stress in Human Judgment and Decision Making (pp. 271–292). Boston, MA: Springer. Wickens, C. D., & Dixon, S. R. (2007). The benefits of imperfect diagnostic automation: A synthesis of the literature. Theoretical Issues in Ergonomics Science, 8(3), 201–212. Wiener, E. L. (1977). Controlled flight into terrain accidents: System-induced errors. Human Factors, 19(2), 171–181. Wiener, E. L. (1988). Cockpit automation. In E. L. Wiener & D. C. Nagel (Eds.), Human Factors in Aviation (pp. 433–461). San Diego, CA: Academic Press.

Index

A
ACLS, 261, 263, 271
ATLS, 263–264, 271
ATOM, 260, 264, 270

B
Biases, 333
BLS, 261, 263
Boot camp, 291

C
Cognitive processes, 79, 212

D
Debriefing, 104, 227, 256, 275–276, 278–280, 282–283, 286–288, 290, 292
Decision-making, 32, 54, 79, 87, 89, 94, 103–104, 108–111, 122, 127, 209–217, 219–224, 232, 237, 261, 267, 344

F
Fallacies, 122
FES, 256, 263, 295
FLS, 256, 263, 273
FRS, 263

H
Hand gestures, 298–299, 301, 307, 309–311
Healthcare simulation, 151, 160–162, 225, 227, 229, 231, 233–235, 237, 239, 241–243, 245, 247–251, 253, 272, 275, 277–283, 285, 287, 289, 291, 293–295
High-fidelity, 27, 56, 139, 149, 227, 243, 248, 267, 277, 291, 293

L
Latent safety threats, 243, 281
Learning, 54, 67, 70, 72–80, 82–86, 90, 92–94, 97, 103, 130, 133–135, 140, 142, 145–146, 153, 163, 175, 179, 184, 186, 188, 190–191, 198, 202–203, 215–216, 220–224, 226, 230–232, 234–236, 240, 245–246, 249–252, 256–257, 259, 261, 267, 269, 272–273, 275–280, 282–288, 290–296, 302, 319–321, 323, 326, 328
Low-fidelity, 154, 227, 257, 279, 281

M
Mannequin, 228–229, 232, 234, 242, 247–248, 277, 280–282, 291–292, 296
Mastery training, 240–241, 245
Medical devices, 244, 265
Medical education, 226, 228, 231, 235, 245–253, 255–256, 261–262, 271–273, 286–287, 291, 294–295
Medical students, 153, 236, 239, 243, 261–262, 270, 293

Q
Quality improvement, 269

R
Rapid cycle deliberate practice, 275, 282, 284, 291, 293
Residents, 236, 239–241, 245–246, 249, 256, 261–264, 267, 270–272, 290, 296

S
Semi-autonomous systems, 301
Simulation, 15–63, 65–66, 68, 70, 72, 74, 76, 78–80, 82, 84, 86–88, 90, 92–98, 100, 102–104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128–132, 134–138, 140–144, 146–149, 151–154, 156, 158–168, 170, 172–174, 176–180, 182, 184, 186, 188, 190, 192–196, 198–200, 202, 204, 206, 209–210, 212, 214–220, 222, 224–253, 255–273, 275–298, 300–306, 308–324, 326–328, 331–332, 334, 336, 338, 340, 342, 344, 346
Standardized patients, 225, 228, 233–234, 246, 250, 253, 260, 264, 281
Surgery, 153, 176, 178, 227, 230–231, 235, 238, 243, 245–246, 248, 250, 252, 255–257, 262–273, 285, 288, 292–293, 295
Surgical simulation, 230–231, 251, 255–257, 259–261, 263, 265, 267, 269, 271–273

T
Team training, 149, 160–161, 164, 221, 225, 236–238, 240–241, 246, 248–249, 251–253, 261, 268, 271–272
Training transfer, 65, 78, 94, 131, 218, 225, 238, 245

U
Unity, 306, 312–313

V
Virtual reality, 68, 78, 84–86, 152, 155–157, 167–168, 176–178, 181, 191–193, 207, 225, 228–229, 245–246, 249, 251–253, 256, 259–260, 263, 272, 281–282, 291, 299, 301, 318–321, 324–329